Skip to main content

Cloud

Default Dimension Members and Identity Columns

When populating dimension table records, I use Integer Identity columns to generate meaningless surrogate primary keys. I usually also create default “catch-all” records for “No Value Specified” or “Invalid Value Specified”


In other words let’s say my source data for a Gender Dimension looks like this:

Male

Female

<null>

Male

3


I would want to have a dimension table that looks like this:

-1

Invalid

0

No Gender

1

Male

2

Female


The way I usually accomplish this with SSIS is with a Execute SQL Task immediately before the Data Flow task:

The Execute SQL Task will have code like this:

SET IDENTITY_INSERT dbo.dim_Gender ON

INSERT dbo.dim_Gender (GenderID, GenderDescription) VALUES (0, ‘No Gender’)

SET IDENTITY_INSERT dbo.dim_Gender OFF

which injects the default record.

Then we pull directly from the source column, usually with a SELECT DISTINCT, and whatever translation or cleansing code is necessary in the Data Flow task.

An interesting thing was happening in a recent test: The results we were getting were:

0

No Gender

2

Male

3

Female

In this case we were only inserting the one default record; we were not inserting a default “Invalid” dimension record, so I would have expected to see:

0

No Gender

1

Male

2

Female


Two points:

1. These are meaningless surrogate keys. We never refer to these values anywhere, so technically it shouldn’t matter what the actual Primary Key values are

2. There are other ways to accomplish this type of operation, for example with a Derived Column or with a Union All SSIS DataFlow object

Nevertheless I decided I wanted to make the dimension record key values populate as I expected them to. The issue is with how SQL Server manages Identity values.

There are some good blog entries that cover this in greater detail here and here:

This is the code I ultimately wound up using was:

SET IDENTITY_INSERT dbo.dim_Gender ON

INSERT dbo.dim_Gender (GenderID, GenderDescription) VALUES (0, ‘No Gender’)

SET IDENTITY_INSERT dbo.dim_Gender OFF

GO

DECLARE @nextident INT

SELECT @nextident=MAX(IDENTITYCOL) FROM dim_Gender

DBCC CHECKIDENT (dim_Gender, RESEED, @nextident )

This last 3 lines force SQL to reseed the Identity Value. This code makes several assumptions:

1. That I have exclusive use of the table – there are not other people inserting/deleting simultaneously as I am running

2. That my incremental value is 1, and we are incrementing, not decrementing (Did you know that you can decrement an Identity column by specifying a negative incremental value?)

This accomplishes what I want, but strikes me as a bit clunky. Have you done this or even better, have you come up with a cleaner approach?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

PointBridge Blogs

More from this Author

Follow Us