Skip to main content

Data & Intelligence

Change Data Capture – It’s all about timing !!

A few weeks ago, I got a gmail chat message from a friend telling me that my favorite sports team was having a great match and that i should definitely watch the match.  So I went to the sports portal to get the latest score and see what the team was doing. I found that the match was not that interesting and the scores were stagnant for some time.  Thinking that it might’ve been a prank from my friend, I ignored it.  To my dismay, the match score I saw was delayed by 10 minutes and the score wasn’t yet reflected in the sport’s portal. I suspect that an effective CDC implementation might have been the problem.

If we expand the CDC problem to stock markets and daily traders, the impact might be much higher.This is one of reasons we need more real-time data for various departments in companies which constantly trouble IT for more real time data. There are a lot of use-cases where batch data loads are acceptable, but to keep up with data volumes and data velocity, companies have to process more real time data. The trouble starts when people are just trying to handle the increase in volume and unfortunately miss the batch window for processing.

This is a 2 part series about change data capture detailing the different types of CDC, Use cases and the current product offerings.

Change Data capture – the old way

Earlier, it was a tedious task to capture delta changes from the database and capturing the staging data. Prior to the introduction of Change Data Capture, there were a number of ways that users could capture change data, including table differencing and change-value selection.

Table differencing is a method of copying an entire table from the source to staging database, where an older version of the table exists. It involves using the SQL MINUS operator to obtain the inserted and new versions of changed rows. This approach heavily increases the chances of network hogging.

Change-value selection is a method used in capturing data on the source database by selecting new and changed data based on a specific column. Usually the CREATED_DATE and MODIFIED_DATE columns in the tables. Changes are identified based on the values in these columns.

Data Intelligence - The Future of Big Data
The Future of Big Data

With some guidance, you can craft a data platform that is right for your organization’s needs and gets the most return from your data capital.

Get the Guide

These are some of the common techniques involved in capturing changes to the data. These solutions have their limitations and use too much of manual resources to setup and documenting the changes.

Change Data Capture – the new way

There are 2 ways Change Data Capture can capture and publish committed change data

Synchronous :

As each SQL DML (INSERT,UPDATE or DELETE) operation is executed in the database, we capture the changed data through triggers on the source database. In this method, change data is captured as part of the transaction modifying the source tables.

Asynchronous :

This is a non invasive procedure where we are able to take advantage of the data sent to the redo logs files. Changed data is captured after a SQL statement that performs a DML operation is committed. Change data is not captured as part of the transaction that is modifying the source table and therefore has no effect on that transaction.

In many ways the Asynchronous mode is more nonobstructive and has no bearing on the transaction database. Then you follow a publish and subscribe mechanism to handle data requests from different applications.

Advantage of using CDC over traditional data capturing techniques.

  • Eliminate batch windows because the changes are captured automatically and delivered round the clock.
  • Deliver data in more real-time to the business.
  • In data integration, the risk of impacting existing applications is eliminated, because it’s noninvasive.
  • Decrease MIPS usage on a mainframe compared to other integration techniques, which gives more headroom for growth on the same hardware.
  • Average customers decrease MIPS costs by 30%.

I have worked for many companies who don’t have an enterprise wide change data capture strategy. This might seem like a small issue for many, since CDC problems are not evident immediately, but grows slowly over a period of time with the data volume and data velocity. The IT departments would be wise to recognize the needs and respond accordingly.

Part 2 in the series would cover capabilities, use cases and current product offerings for CDC.

Jim Miller

Mr. Miller is an IBM certified and accomplished Senior Project Leader and Application/System Architect-Developer with over 30 years of extensive applications and system design and development experience. His current role is National FPM Practice Leader. His experience includes BI, Web architecture & design, systems analysis, GUI design and testing, Database modeling and systems analysis, design, and development of Client/Server, Web and Mainframe applications and systems utilizing: Applix TM1 (including TM1 rules, TI, TM1Web and Planning Manager), dynaSight - ArcPlan, ASP, DHTML, XML, IIS, MS Visual Basic and VBA, Visual Studio, PERL, Websuite, MS SQL Server, ORACLE, SYBASE SQL Server, etc. His Responsibilities have included all aspects of Windows and SQL solution development and design including: analysis; GUI (and Web site) design; data modeling; table, screen/form and script development; SQL (and remote stored procedures and triggers) development and testing; test preparation and management and training of programming staff. Other experience includes development of ETL infrastructure such as data transfer automation between mainframe (DB2, Lawson, Great Plains, etc.) systems and client/server SQL server and Web based applications and integration of enterprise applications and data sources. In addition, Mr. Miller has acted as Internet Applications Development Manager responsible for the design, development, QA and delivery of multiple Web Sites including online trading applications, warehouse process control and scheduling systems and administrative and control applications. Mr. Miller also was responsible for the design, development and administration of a Web based financial reporting system for a 450 million dollar organization, reporting directly to the CFO and his executive team. Mr. Miller has also been responsible for managing and directing multiple resources in various management roles including project and team leader, lead developer and applications development director. Specialties Include: Cognos/TM1 Design and Development, Cognos Planning, IBM SPSS and Modeler, OLAP, Visual Basic, SQL Server, Forecasting and Planning; International Application Development, Business Intelligence, Project Development. IBM Certified Developer - Cognos TM1 (perfect score 100% on exam) IBM Certified Business Analyst - Cognos TM1

More from this Author

Follow Us