Data lineage captures how data flows from its source, through intermediary systems and data transformations, to a final destination or consumer. Good data lineage provides a means to confirm that the data consumers use comes from trusted, authoritative sources, with adequate controls in place to govern the hand-offs between systems.
Lineage capture and visualization can be performed at different levels to serve different needs. Lineage process flow diagrams provide a view of specific business processes and their control points only; they do not provide details on systems or data transformations.
Lineage system flow diagrams provide a view of the flow between the systems that support a business function or data delivery. Additional details, such as data class information and transfer point details, can be added at the transfer points between producer and consumer systems, but gaps may remain in the details of data transformations and individual attributes.
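As a rough illustration of system-level lineage, the sketch below records each hand-off between a producer and a consumer system together with its data class and transfer point details. The system names, field names, and the `producers_of` helper are hypothetical and chosen only for this example; they do not come from any particular lineage tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One hand-off between a producer system and a consumer system."""
    producer: str
    consumer: str
    data_class: str   # data class carried by the hand-off, e.g. "customer"
    mechanism: str    # transfer point detail, e.g. "nightly batch file"


# Hypothetical systems and hand-offs, for illustration only.
transfers = [
    Transfer("CRM", "Warehouse", "customer", "nightly batch file"),
    Transfer("OrderEntry", "Warehouse", "order", "message queue"),
    Transfer("Warehouse", "RiskReport", "customer", "SQL view"),
]


def producers_of(system: str, transfers: list[Transfer]) -> set[str]:
    """Return the systems that feed `system` directly."""
    return {t.producer for t in transfers if t.consumer == system}


print(sorted(producers_of("Warehouse", transfers)))  # ['CRM', 'OrderEntry']
```

Note that nothing in this structure says which attributes move or how they are changed along the way, which is the gap described above.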
Attribute-level lineage can be further enhanced with transformation details, which capture the conversion, alteration, or combination of attributes that occurs as part of the transition. This can come at a high cost, however, depending on the lineage capture methodology employed. For this reason, it may make sense to capture attribute-level lineage with transformations for critical data attributes only, rather than for all attributes on all interfaces.
Critical data attributes are the subset of all data attributes that the data councils designate as most important, whether because of their business value, regulatory requirements, risk exposure, or organizational impact.
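One way to picture this selective approach is a mapping record in which the transformation detail is optional, plus a check that reports critical attributes whose transformation has not yet been captured. This is a minimal sketch: the attribute names, the `CRITICAL_ATTRIBUTES` set, and the `missing_transformations` helper are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeMapping:
    """Attribute-level lineage for one target attribute."""
    sources: tuple[str, ...]              # one or more "System.attribute" inputs
    target: str                           # the "System.attribute" produced
    transformation: Optional[str] = None  # None means the detail was not captured


# Assumed to have been designated by the data councils (hypothetical names).
CRITICAL_ATTRIBUTES = {"Warehouse.notional_usd", "RiskReport.exposure_usd"}

mappings = [
    AttributeMapping(("Trades.notional", "Rates.fx_rate"),
                     "Warehouse.notional_usd", "notional * fx_rate"),
    AttributeMapping(("Warehouse.notional_usd",), "RiskReport.exposure_usd"),
    AttributeMapping(("CRM.nickname",), "Warehouse.nickname"),  # not critical
]


def missing_transformations(mappings: list[AttributeMapping],
                            critical: set[str]) -> list[str]:
    """List critical target attributes that lack transformation details."""
    return [m.target for m in mappings
            if m.target in critical and m.transformation is None]


print(missing_transformations(mappings, CRITICAL_ATTRIBUTES))
# ['RiskReport.exposure_usd']
```

The non-critical `Warehouse.nickname` mapping is left without a transformation and is not reported, which reflects the cost trade-off: the detailed capture effort is spent only where it matters most.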
The addition of data transformation details gives data quality analysts a powerful tool for determining the root cause of incorrect data in reporting or target systems. Data lineage can also be used in change management to understand the current state and to assess the impact of planned changes, whether technological or business process changes.
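Both uses come down to walking the lineage graph: upstream from a suspect value for root cause analysis, and downstream from a planned change for impact assessment. The following sketch assumes lineage is held as a simple dictionary of attribute-level edges; the attribute names and helper functions are hypothetical.

```python
from collections import deque

# Hypothetical attribute-level edges: source attribute -> target attributes.
EDGES = {
    "Trades.notional": ["Warehouse.notional_usd"],
    "Rates.fx_rate": ["Warehouse.notional_usd"],
    "Warehouse.notional_usd": ["RiskReport.exposure_usd", "Finance.pnl_usd"],
}


def invert(edges):
    """Reverse every edge so the graph can be walked upstream."""
    inverted = {}
    for source, targets in edges.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


def reachable(start, edges):
    """Breadth-first walk returning every node reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Root cause analysis: what feeds a report figure that looks wrong?
upstream = reachable("RiskReport.exposure_usd", invert(EDGES))
print(sorted(upstream))
# ['Rates.fx_rate', 'Trades.notional', 'Warehouse.notional_usd']

# Impact assessment: what is affected if Rates.fx_rate changes?
downstream = reachable("Rates.fx_rate", EDGES)
print(sorted(downstream))
# ['Finance.pnl_usd', 'RiskReport.exposure_usd', 'Warehouse.notional_usd']
```

In practice each edge would also carry its transformation details, so an analyst tracing upstream could inspect the logic applied at every hop rather than only the list of contributing attributes.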
Lineage, combined with authoritative sources, can be used to identify opportunities for platform simplification.
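As one hedged illustration of how this might work, the sketch below compares the systems where each data class originates against the designated authoritative source, and flags any other originating system as a candidate for review. The system names, the `AUTHORITATIVE` mapping, and the rule itself are assumptions for this example rather than a prescribed method.

```python
# Hypothetical authoritative source for each data class.
AUTHORITATIVE = {"customer": "CRM", "fx_rate": "Rates"}

# Hypothetical system-level lineage: (producer, consumer, data_class).
TRANSFERS = [
    ("CRM", "Warehouse", "customer"),
    ("Warehouse", "Billing", "customer"),
    ("LegacyCRM", "Marketing", "customer"),
    ("Rates", "Warehouse", "fx_rate"),
]


def non_authoritative_origins(transfers, authoritative):
    """Per data class, list origin systems that are not the authoritative source.

    An origin is a system that produces the data class but never receives it
    from another system, so intermediaries such as a warehouse are not flagged.
    """
    findings = {}
    for data_class, source in authoritative.items():
        producers = {p for p, _, d in transfers if d == data_class}
        consumers = {c for _, c, d in transfers if d == data_class}
        extra = (producers - consumers) - {source}
        if extra:
            findings[data_class] = sorted(extra)
    return findings


print(non_authoritative_origins(TRANSFERS, AUTHORITATIVE))
# {'customer': ['LegacyCRM']}
```

Here `LegacyCRM` originates customer data outside the authoritative source, so its feed to `Marketing` is a candidate for redirection or retirement, subject to review.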