When SQL Server 2008 was released, two of the features added were Change Data Capture and Change Tracking. Both features are essentially designed to allow users to query a database and determine what data has changed. However, they go about it in two very different ways. Some of the differences are obvious while others are much more subtle.
The purpose of this article is to compare and contrast the two features and help the reader understand the pros and cons of each approach so that they may make an informed decision. This article isn’t meant to replace the MSDN section on these features. I highly recommend reading about these two features on MSDN. It is great for understanding how to implement each feature and how to take advantage of it, and while they do have a page devoted to explaining the differences, I feel like they do not emphasize some of the differences as much as they should.
Tip: I will call out information that I feel is important like this. This means that the information is not found in the section of MSDN that I linked previously or that I feel it was not emphasized enough by MSDN.
Change Tracking (CT)
Change Tracking is described by MSDN as “light weight”, and while that is certainly true in some aspects, I do not feel it is entirely honest. The primary example given for using CT is if you need to synchronize with other databases that are occasionally online (think about an application on a tablet or laptop). In order to use CT you must enable it on the database level, but that does not actually begin tracking changes. You must enable it for each table you want to track changes.
Tip #1: CT is a synchronous process. If a table that is being monitored by CT has its data changed (Insert, Update, or Delete), the information about this change is recorded while it is happening, this will cause a slight delay in processing the aforementioned Insert, Update, or Delete.
When you enable CT on the database level, you also configure the retention period. This is the period of time that SQL Server will keep information about data changes. Whatever you are using CT for, you should make sure that you have not passed the retention period since the last time you’ve checked for changes. In order to access data about the changes, you use the CHANGETABLE function in your query.
You can gather the following information with CT:
- Which rows have been Inserted
- Which rows have been Updated
- Which rows have been Deleted
- (optional) Which column values have changed (but not the actual values)*
The last one is optional because when you enable CT on a table, you also indicate if you want it to keep track of column changes. It also only applies to Updates (not Inserts or Deletes).
Tip #2: CT considers a column as having been changed if a value was assigned to the given column in the UPDATE statement (even if the value being assigned to the column is the same value already in that column).
CT should be thought of as a feature that aggregates changes. This means that if a row has been updated five times since you last checked, querying for changes will only show that the row has been updated, there isn’t any information on how many times. You can get this information, but it is a very manual and brute force process. Another good example is if a row has been updated once and then deleted since you last checked, then CT will report the row as deleted. If getting a true history of all the changes is important, then Change Data Capture is most likely better suited for your needs.
Change Data Capture (CDC)
Change Data Capture is the more serious of the two options. It is able to do more than CT can. While CT is available in the Standard Edition of SQL Server, CDC is not.
CDC also needs to be enabled on the database level before it can be used. CDC needs to be enabled on a per-table basis as well. By default all columns are involved in CDC, but that is configurable.
You can gather the following information with CDC:
- Which rows have been Inserted
- The values of the tracked columns at the time of Insert
- Which rows have been Updated
- The values both before and after of the tracked columns at the time of the Update
- Which rows have been Deleted
- The values of the tracked columns at the time of the Delete
- The time at which the operation happened
Tip #3: A column is only considered to have changed if the value itself has changed (this pertains to Updates only). If you set a column to the same value it already contains, CDC does NOT consider that the data has changed. This means that if the values you set in an UPDATE operation are all the same as the current value, then there is no record of the UPDATE in CDC.
Tip #4: Remember, CDC can be configured to track a subset of columns in a table. Anything that happens to columns CDC is not configured to track is completely ignored.
Unlike CT, CDC is asynchronous. This means that when a table’s data is changing, CDC is not involved, it does not slow down any transactions. CDC detects changes by reading the transaction log. When CDC is being utilized, there is a job created in SQL Server Agent. When the job runs, it reads the logs and records the changes. This also means that when you query for changes, you may not be getting all of the changes as there is a delay between when the data is changed and when the data is available in CDC.
Tip #5: Asynchronous does NOT mean free! There is a common misconception that if something is asynchronous, that it is “free”, meaning it doesn’t take any resources. This clearly is not true; all it means is that when data is changed, the session that caused the change does not have to wait for CDC to record the change, that will happen at a later time.
CDC creates physical tables. These are the tables you query to determine which values have changed. In these tables, there is a single row created for each INSERT and DELETE operation. However, two rows are created for each UPDATE operation. One row describes the data before the update, and another row describes the data after the UPDATE. Like CT, there is a retention period (default is 3 days). There is another SQL Server Agent job that will remove rows from the CDC tables that are older than the configured retention period.
Hopefully this article has been helpful in articulating some of the important, but not well-documented, differences between CDC and CT.