Have you ever been responsible for ETL and your end user(s) report that the numbers are not right? Do you have a methodology or a process to go back and verify the numbers are what they are? When you are building ETL, are you writing audits logs that capture metrics that are important to the business? Investing some time upfront can save hours, days, and/or weeks on traceability of tracking issues. It can make your job much easier when there are issues with reporting accuracy. Let’s take a look at where and what you should track to ensure what goes into ETL is what is coming out.
Soup to nuts, the above is what an Enterprise ETL system could look like. Being an Architect of designing systems for Fortune 500 companies for over 18 years, I can assure you this is not everything and there could be more or less to your implementation. The key in the above diagram is recording the source and destination metrics of the different logical steps that happen in the transformation of enterprise data each step of the way. Examples of the types of metrics you can record based on the ETL Batch slice are:
- Record counts
- Aggregation of metric counts
- Fact table dimensional attributes
- Custom table to record business processes
Having a methodology of defending your ETL collateral is a must when reporting is critical to the business and there are claims the numbers are inaccurate. I hope the above has given you food for thought if you are not recording what is going into ETL and what is coming out. Check out my other post for an example of reporting that you can implement to visualize how the source and destinations can match.
Andrew works in the Microsoft BI practice for Duane Schafer and is currently creating solutions for the healthcare and software industries.