
SSIS typical task primer

SSIS (SQL Server Integration Services) is a new feature of SQL Server 2005.

It is the next generation of the former DTS packages; it has many similarities to DTS but also a lot of differences. In general, SSIS is more flexible, efficient, and intuitive.

Although Microsoft provides many good examples of SSIS packages, it can be difficult for a first-time SSIS user to figure out how to implement some typical techniques.

A data transformation task is usually the starting point of a big project. The same task can be implemented in many different ways, and typically a developer doesn't have a lot of time for it. At the same time, the task is not considered complex or challenging, so there is always a temptation to do it using less advanced but more familiar means. For example, one can write SQL queries or a stored procedure to import the data, or even develop a small custom application to do it. I would advise against this; it is always better to spend some time learning a new technology in order to use it in the future.

Here is a very typical set of requirements for populating a normalized database with data; it should give you some idea of how to use SSIS.

Below is a short description of the source and target databases, the task, and the requirements.

1. The target database has some primary/foreign key relations; this means that before inserting data into a table that references another table (a lookup table), the lookup table has to be populated. (A simplified DDL sketch of such constraints follows Figure 2.)

2. Some of the data may already exist in the target database, so you always have to check whether an item already exists in the table; if it does, don't insert a new row, just update the fields of the existing row. This means you will have to do upserts (update an existing item, or insert a new item if it doesn't exist yet). To read more about upserts in SSIS, see http://sqljunkies.com/WebLog/ashvinis/archive/2005/06/15/15829.aspx.

3. In case of an exception, log as much information as possible in the event log and proceed with the next row of data.

4. To keep things simple, the source database has the same structure as the target database but doesn't have any constraints, so data integrity is not guaranteed.

5. The target database has all the necessary constraints, which means that the inserted or updated data has to comply with the rules defined by those constraints.

6. The source and target database schemas are shown in Figure 1 and Figure 2.

Figure 1. The source database.

Figure 2. The target database.
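To make requirements 1 and 5 concrete, here is a simplified, hypothetical DDL sketch of what the target tables and their constraints could look like. The actual schemas in Figures 1 and 2 may use different table and column names and types; everything below (CustomerName, Phone, Title, OrderDate, and so on) is a placeholder, not the real schema.

-- Hypothetical, simplified target schema; the real tables in Figure 2 may differ.
CREATE TABLE dbo.Customers
(
    CustomerID   int IDENTITY(1,1) NOT NULL,
    CustomerName nvarchar(100)     NOT NULL,
    Phone        nvarchar(24)      NULL,
    CONSTRAINT PK_Customers PRIMARY KEY (CustomerID),
    CONSTRAINT UQ_Customers_CustomerName UNIQUE (CustomerName)
);

CREATE TABLE dbo.Employees
(
    EmployeeID   int IDENTITY(1,1) NOT NULL,
    EmployeeName nvarchar(100)     NOT NULL,
    Title        nvarchar(50)      NULL,
    CONSTRAINT PK_Employees PRIMARY KEY (EmployeeID),
    CONSTRAINT UQ_Employees_EmployeeName UNIQUE (EmployeeName)
);

CREATE TABLE dbo.Orders
(
    OrderID    int IDENTITY(1,1) NOT NULL,
    CustomerID int               NOT NULL,
    EmployeeID int               NOT NULL,
    OrderDate  datetime          NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (OrderID),
    CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerID)
        REFERENCES dbo.Customers (CustomerID),
    CONSTRAINT FK_Orders_Employees FOREIGN KEY (EmployeeID)
        REFERENCES dbo.Employees (EmployeeID)
);

Because of the two foreign keys on Orders, rows for Customers and Employees have to be in place before any Orders row that references them can be inserted.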

Typical steps.

1. Start by creating a control flow. A control flow can be very complicated and can include loops and other logic, but in this case it can be as simple as the one shown in Figure 3: just a sequence of data flows, each of which populates one table. The sequence is important because you have to populate the lookup tables first. The target database has primary/foreign key constraints, so the lookup tables (Employees and Customers) have to be populated before the Orders table. Most probably you will want to name the data flows after the table names.

Figure 3. The control flow.

2. If you double-click any of the boxes that represent a data flow on the control flow diagram, you will be taken to the data flow screen.

3. The data flow is similar for Employees and Customers. For each data row from the source we check whether the item (a customer for the Customers table, an employee for the Employees table) already exists in the target database. If it does exist, we update the fields of the existing item; if it doesn't, we insert a new row into the table. As I mentioned before, this technique is called an upsert. The data flows for the Customers and Employees tables are shown in Figure 4 and Figure 5, and a rough set-based T-SQL equivalent of the logic follows Figure 5.

Figure 4. The Customers table data flow.

Figure 5. The Employees table data flow.
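The check-then-update-or-insert logic that each of these data flows performs row by row can also be sketched as set-based T-SQL. This is only a rough equivalent written against the hypothetical schema from the DDL sketch above; SourceDB and TargetDB are placeholder database names, and the natural key used for matching (CustomerName) is an assumption.

-- Rough set-based equivalent of the Customers upsert (hypothetical names).
-- 1. Update the rows that already exist in the target, matched on the natural key.
UPDATE tgt
SET    tgt.Phone = src.Phone
FROM   TargetDB.dbo.Customers AS tgt
       INNER JOIN SourceDB.dbo.Customers AS src
               ON src.CustomerName = tgt.CustomerName;

-- 2. Insert the rows that do not exist in the target yet.
--    DISTINCT because the source has no constraints and may contain duplicates.
INSERT INTO TargetDB.dbo.Customers (CustomerName, Phone)
SELECT DISTINCT src.CustomerName, src.Phone
FROM   SourceDB.dbo.Customers AS src
WHERE  NOT EXISTS (SELECT 1
                   FROM   TargetDB.dbo.Customers AS tgt
                   WHERE  tgt.CustomerName = src.CustomerName);

The Employees data flow follows the same pattern, with EmployeeName as the matching key.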

4. For the Orders table the data flow is a little more complex, since the Orders table references the Employees and Customers tables. To insert a record into the Orders table we first have to find the primary keys of the already populated records in the Customers and Employees tables; otherwise the referential integrity rules enforced at the database level won't allow us to insert the record. The data flow for the Orders table is shown in Figure 6, and a rough T-SQL equivalent of the key-resolution step follows it.

Figure 6. The Orders table data flow.
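Again, only as a rough illustration against the hypothetical schema sketched earlier: each source order is joined back to its source customer and employee, and their natural keys are used to pick up the surrogate keys already generated in the target lookup tables, so the foreign key constraints are satisfied.

-- Rough equivalent of the Orders data flow (hypothetical names throughout).
INSERT INTO TargetDB.dbo.Orders (CustomerID, EmployeeID, OrderDate)
SELECT c.CustomerID,   -- surrogate key from the target Customers table
       e.EmployeeID,   -- surrogate key from the target Employees table
       src.OrderDate
FROM   SourceDB.dbo.Orders AS src
       INNER JOIN SourceDB.dbo.Customers AS sc ON sc.CustomerID = src.CustomerID
       INNER JOIN SourceDB.dbo.Employees AS se ON se.EmployeeID = src.EmployeeID
       INNER JOIN TargetDB.dbo.Customers AS c  ON c.CustomerName = sc.CustomerName
       INNER JOIN TargetDB.dbo.Employees AS e  ON e.EmployeeName = se.EmployeeName;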
