When we approach a scenario related to database, commonly used SCD types. Even though Talend has an inbuilt component we mostly prefer not to use tDBSCD feature to improve performance. The common way to implement SCD is to normalize a job design instead of using a single component. Irrespective of technology people look for an effortless approach or straightforward methodology to implement any model/Job. Let’s implement the painless design in a minutes.
Introducing a component tAddCRCRow to achieve hashkey which helps to track the data to identify the changes. This component has an inbuilt column (CRC) which will create a hashkey based on the selected non-key columns.
Let’s take an example to understand better, consider the banking domain source as csv file and target as table. Here we are going to handle upsert by splitting the data flow based on target lookup.
Job Design
We can place the CRC component before comparing source and target data to generate the hashkey. In this component ,enable the checkbox (Use in CRC) of non key columns that creates the hash value in CRC column for every single record coming from source.
And then we are segregating the data using inner join by matching the key columns. One flow is to capture the rejects by enabling “catch lookup inner join reject” as true and insert into target table.
Another flow is to update the matched records and filter the changed records by comparing the Key column and CRC column. This filter is to avoid the unwanted update of non-changed records in this flow. Changes are updated in target table with new hash value.
Point to remember :- Need to define hashkey and audit columns in target table.
This approach will be more effective when we compare more number of non-key columns.
Great and helpful ! Sounds knowledgeable.