Recommendable stuff...

Wednesday, March 28, 2007

Dynamically modifying an SSIS 2005 package

There may be a time when you need to create a package with a data flow that needs to be modified at run-time. This means that at design time, you don't create any column mappings, or maybe a few standard columns like a 'CreateDate' or 'UpdateDate' columns, but the actual data columns are not known at design time.

This requirement creates some problems like:
1. SQL Server doesn't have the necessary columns to store the data.
2. To create columns in SQL, we need the data type and length (if applies) for each column.
3. SSIS 2005 user scripts can't be used to modify the SSIS in which it is contained.
and a couple of more issues that i can't remember anymore... haha...

Well one approach will be to write an application/service that will:
1. Update SQL table with new/modified tables
2. Load the package and create sql table mappings.
3. Execute the package, if required.

We'll take a look at each of these steps in more detail.
1. Update SQL table with new/modified tables
1.1
To update SQL table, we need a the column definitions ie column name, datatype, length. This can be obtained from the source table and written to a csv file or read directly from the source database's INFORMATION_SCHEMA.COLUMNS view. Here's a sample code:
select COLUMN_NAME, DATA_TYPE, NUMERIC_PRECISION, NUMERIC_SCALE from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = 'Client'
Note: if you're integrating data between different databases eg. UniVerse -> SQL, Oracle -> SQL, SQL -> Oracle, etc..., you may need a function that will 'map' the different datatype between the database types.


1.2 Once you have the column definitions, you can make the connection to the SQL database (I'm assuming its sql, you can do it with other databases as well) and create your database table's columns.
Note: You need to import the following namespace in your application to be able to do the database manipulation:
Microsoft.SqlServer.Management.Smo()
Microsoft.SqlServer.Management.Common()
Code example to acquire a connection to the database in your program:


Dim db As Microsoft.SqlServer.Management.Smo.Database
'Get connection settings
Dim DBServerName, DBUser, DBPassword, DBName As String
DBServerName = "MyServer"
DBUser = "sa"
DBPassword = "sa_password"
DBName = "MyDatabase"
Dim SrvConn As New Microsoft.SqlServer.Management.Common.ServerConnection(DBServerName)
SrvConn.LoginSecure = False
SrvConn.Login = DBUser
SrvConn.Password = DBPassword
Dim srv As New Server(SrvConn)
'Open Required Database
db = srv.Databases(MyDatabase)


To get to the table you want to edit, use the db.Tables(tableName as string) function


1.3 Write code to add the columns
Dim tb As Table = db.Tables(tableName)
Dim newDataType As SqlDataType
newDataType = DataType.VarChar(intDataTypeLength)
Dim newColumn As New Column(tableName, columnName, newDataType)
tb.Columns.Add(newColumn)
Your SQL table is now ready for mapping.


2. Now the more difficult part.
Import the following namespaces:
Microsoft.SqlServer.Dts.Runtime.Wrapper
Microsoft.SqlServer.Dts.Runtime
Microsoft.SqlServer.Dts.Pipeline.Wrapper


2.1 Load the package (i'm assuming its on filesystem)
Dim app As New Microsoft.SqlServer.Dts.Runtime.Application
Dim pkg As Microsoft.SqlServer.Dts.Runtime.Package
pkg = app.LoadPackage("C:\PackageToEdit.dtsx", Nothing)


2.2 Now that you have loaded the package, we now need to 'navigate' to the component that is the dataflow component for the table you're going to modify.
To modify a dataflow, we have to start updating the names from the 'top' of the flow and end at the 'bottom' of the flow. This is what's called as the 'pipeline' by Microsoft.
Here's a basic sequence of modification we'll make.
a. Update the source connection manager with column names
b. Update the data flow source component
c. Update the data flow destination component
We'll go into detail now...


2.2.a Load the source connection manager.
You should know the name of your connection manager during design time. Its always good to use a naming convention for your components so they can be parameter driven if necessary.
What you need at this stage:
i) a variable / array containing the column names to be updated
ii) the source connection manager name as used in the SSIS Designer
iii) the package variable (we already loaded earlier)
Steps:
i. Get the connection manager object from the list of package's connection managers. I'm using an example of a flat file connection manager.
Dim SrcConn As ConnectionManager = Nothing
'Get specific connection
SrcConn = pkg.Connections.Item(ConnectionName)
'Get the underlying connection object
Dim FlatConnMgr As Microsoft.SqlServer.Dts.Runtime.Wrapper.IDTSConnectionManagerFlatFile90
FlatConnMgr = CType(SrcConn.InnerObject, Microsoft.SqlServer.Dts.Runtime.Wrapper.IDTSConnectionManagerFlatFile90)
ii. Now we create a new column and add to the connection manager's columns collection
Dim NewCol As RTW.IDTSConnectionManagerFlatFileColumn90 = nothing
FlatConnMgr.Columns.Add(NewCol)
'Assign the new column's properties
NewCol.ColumnType = "Delimited"
NewCol.ColumnDelimiter = "~"
NewCol.ColumnWidth = 1
NewCol.DataType = Wrapper.DataType.DT_STR
NewCol.MaximumWidth = 1
NewCol.DataScale = 0
NewCol.DataPrecision = 0
name = CType(NewCol, RTW.IDTSName90)
name.Name =
name.Description =
NOTE : Data type incompatibility. Looks easy? Here's a problem.

SSIS datatypes and SQL data types are NOT the same. You need to do some 'mapping' between the SSIS data type and SQL datatype. We already know what the column's SQL data type is, because we used it to create the SQL column, remember? So we just need to have a function that can do some 'mapping' for us. For your convenience, i'll include what i use, and you can modify if necessary.
Public Function MapSQLtoSSISDataType(ByVal SQLDT As SqlDataType) As Dts.Runtime.DataType
Select Case SQLDT.SqlDataType
Case SqlDataType.BigInt
Return Wrapper.DataType.DT_I8
Case SqlDataType.Binary
Return Wrapper.DataType.DT_BYTES
Case SqlDataType.Bit
Return Wrapper.DataType.DT_BOOL
Case SqlDataType.Char
Return Wrapper.DataType.DT_STR
Case SqlDataType.DateTime
Return Wrapper.DataType.DT_DBTIMESTAMP
Case SqlDataType.Decimal
Return Wrapper.DataType.DT_DECIMAL
Case SqlDataType.Float
Return Wrapper.DataType.DT_NUMERIC
Case SqlDataType.Image
Return Wrapper.DataType.DT_BYTES
Case SqlDataType.Int
Return Wrapper.DataType.DT_I4
Case SqlDataType.Money
Return Wrapper.DataType.DT_CY
Case SqlDataType.NChar
Return Wrapper.DataType.DT_WSTR
Case SqlDataType.None
Return Wrapper.DataType.DT_STR
Case SqlDataType.NText
Return Wrapper.DataType.DT_NTEXT
Case SqlDataType.NVarChar
Return Wrapper.DataType.DT_STR
Case SqlDataType.NVarCharMax
Return Wrapper.DataType.DT_STR
Case SqlDataType.Real
Return Wrapper.DataType.DT_DECIMAL
Case SqlDataType.SmallDateTime
Return Wrapper.DataType.DT_DBTIMESTAMP
Case SqlDataType.SmallInt
Return Wrapper.DataType.DT_I2
Case SqlDataType.SmallMoney
Return Wrapper.DataType.DT_CY
Case SqlDataType.SysName
Return Wrapper.DataType.DT_STR
Case SqlDataType.Text
Return Wrapper.DataType.DT_TEXT
Case SqlDataType.Timestamp
Return Wrapper.DataType.DT_DBTIME
Case SqlDataType.UniqueIdentifier
Return Wrapper.DataType.DT_GUID
Case SqlDataType.VarChar
Return Wrapper.DataType.DT_STR
Case SqlDataType.VarCharMax
Return Wrapper.DataType.DT_STR
Case SqlDataType.Xml
Return Wrapper.DataType.DT_STR
Case Else
Return Wrapper.DataType.DT_STR
End Select
End Function
You can use this function to return the appropriate SSIS data type from the SQL data type you have on the database. Not perfect but you can use it as a base.


iii. Once you've done the above for all your new columns, your Source Connection Manager will now have the necessary columns to be accessed downstream in the data flow.

2.2.b Update the Data Flow Components
Information you need at this point:
i) The name of the data flow component to update
ii) All the names of the data flow components you want to update as used in the SSIS Designer.
iii) list of new columns to add.
Steps:
i) Get the data flow component object.
Dim DataFlowTaskHost As TaskHost
Dim DataFlowMainPipe As MainPipe
DataFlowTaskHost = CType(pkg.Executables(), TaskHost)
DataFlowMainPipe = CType(DataFlowTaskHost.InnerObject, MainPipe)
We now have the main pipeline of the data flow (whatever you see in the 'Data Flow' in the SSIS designer)


ii) Now get the data SOURCE object (top of the data flow). I'm assuming this data source component is set up to use the data source connection manager we already updated.
Dim DFSource As IDTSComponentMetaData90
DFSource = DataFlowMainPipe.ComponentMetaDataCollection.Item(SourceComponentName)
'Create an Instance of the source component
Dim instDfSource As CManagedComponentWrapper = DFSource.Instantiate
Now we 're-assign' the connection manager for this component so that it retrieves the new values that was updated on the connection manager earlier.
DFSource.RuntimeConnectionCollection(0).ConnectionManagerID = pkg.Connections.Item().ID
DFSource.RuntimeConnectionCollection(0).ConnectionManager = DtsConvert.ToConnectionManager90(pkg.Connections(SourceComponentConnectionMgrName))
instDfSource.AcquireConnections(Nothing)
instDfSource.ReinitializeMetaData()
instDfSource.ReleaseConnections()
Note: the reinitalizeMetaData step is very important as it updates the internal/external metadata to be seen by other components.


iii) Now we need to update the data DESTINATION object. In this example, i'm assuming its an OLEDB Insert object.
As for the source component, we need to get hold of the component's object
Dim cmDestination As IDTSComponentMetaData90
cmDestination = DataFlowMainPipe.ComponentMetaDataCollection.Item(DataFlow Insert Dest. Name)
Dim instDFDest As CManagedComponentWrapper = cmDestination.Instantiate

Reassign the connection manager
cmDestination.RuntimeConnectionCollection(0).ConnectionManagerID = pkg.Connections.Item(DestinationComponent ConnectionManager Name).ID
cmDestination.RuntimeConnectionCollection(0).ConnectionManager = DtsConvert.ToConnectionManager90(pkg.Connections(DestinationComponent ConnectionManager Name))
instDFDest.AcquireConnections(Nothing)
instDFDest.ReinitializeMetaData()
Get the input object of the component so that we can add the new columns.
Dim input As IDTSInput90 = cmDestination.InputCollection(0)
Dim vInput As IDTSVirtualInput90 = input.GetVirtualInput()
We need to map each column of the input object's to the external metadata columns of the 'upstream' component, which is our source. This will make the Insert component aware of the new columns.
For Each vColumn As IDTSVirtualInputColumn90 In vInput.VirtualInputColumnCollection
Dim vCol As IDTSInputColumn90 = instDFDest.SetUsageType(input.ID, vInput, vColumn.LineageID, DTSUsageType.UT_READWRITE)
instDFDest.MapInputColumn(input.ID, vCol.ID, input.ExternalMetadataColumnCollection(vColumn.Name).ID)
Next
to finalise the changes we need to reinitialise the metadata.
instDFDest.ReinitializeMetaData()
instDFDest.ReleaseConnections()

The 'ReinitialiseMetaData' will actually check the columns against the SQL table (using its assigned connection manager) and report any errors here.
Done! you've got your components updated with the custom data.
If you'd like to see the changes, save your package using the app.SaveToXml function and you can open it in the editor.
This is a 'rough' guide and you'll need to add and fine tune it to your purposes.
Let me know if you found it helpful.

5 ppl said stuff:

A very helpful guide. Thanks for posting.

is there a chance to upload your fully working example to here.

thanks

Ergin

I was looking for a method how to override column delimiter in flat file dynamically. I was digging an internet couple of days and found nothing. This really helped me lot. Thanks!!!

Hi,

Could you please uplaod your solution?

Thanks

Dos

thanks. very helpful..
sankalp

Get the new version of Firefox!