Being able to trust data is, of course, important for decision makers, and we may even say that we have trusted data. But even if decision makers do trust the data, how certain are you that it should be trusted? If we slap a label on something and call it “trusted” data without paying the true price to ensure that it is, we are setting ourselves up for failure. Many enterprise data initiatives (e.g., data warehouses, MDM hubs, ODSs) have failed due to lack of trust in the data. Common factors leading to these failures include IT-driven initiatives (“if you build it, they will come”), inattention to quality, poor data modeling, and a poor understanding of the business.
I will identify some data management activities critical to enabling real trust in the data. Other activities necessary for data trust, pertaining to the presentation layer, infrastructure, and security, are not addressed here. The activities I will highlight are Data Governance & Stewardship, Data Architecture & Modeling, Data Profiling (as part of data quality efforts), and Metadata Management.
Data Governance & Stewardship – Data has been called the bridge between IT and the business, so the business must be heavily involved in ensuring that data is well defined. Token participation by the business is unacceptable. A Data Governance Board and a Data Stewardship organization are key vehicles for ensuring business participation and alignment.
Data Architecture & Modeling – Sufficient thought and planning must be put into enterprise data initiatives. The architectural components of a data initiative must be documented in detail, for example in a System Architecture Document (SAD). Data Modeling is of particular importance: the business should be modeled before the solution is! Many semantic disconnects and business rules must be uncovered and resolved (with business participation…), and a Conceptual Data Model is the ideal vehicle for modeling the business from a data requirements perspective. Metadata about data elements must go beyond a simple definition and be captured either in the data model or in a business glossary, e.g., valid values, acceptable formats, data steward identification, compliance requirements (e.g., HIPAA), and synonyms.
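To make the idea of element-level metadata concrete, here is a minimal sketch of what a single business glossary entry might hold. Everything in it is an assumption made for illustration: the DataElement class, its field names, the sample elements, and the steward names are hypothetical, and a real glossary or modeling tool will have its own structure.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One hypothetical business glossary entry for a data element."""
    name: str
    definition: str
    steward: str                       # who is accountable for the element
    valid_values: frozenset = frozenset()  # empty means "not enumerated"
    format_pattern: str = ""           # regular expression, empty means "any"
    compliance_tags: tuple = ()        # e.g. ("HIPAA",)
    synonyms: tuple = ()

    def accepts(self, value: str) -> bool:
        """Return True if the value satisfies the recorded rules."""
        if self.valid_values and value not in self.valid_values:
            return False
        if self.format_pattern and not re.fullmatch(self.format_pattern, value):
            return False
        return True


# Illustrative entries only; the definitions and stewards are made up.
gender = DataElement(
    name="gender_code",
    definition="Gender of the patient as recorded at registration.",
    steward="Patient Registration",
    valid_values=frozenset({"F", "M", "U"}),
    compliance_tags=("HIPAA",),
    synonyms=("sex_cd",),
)

zip_code = DataElement(
    name="patient_zip",
    definition="Postal code of the patient's home address.",
    steward="Patient Registration",
    format_pattern=r"\d{5}(-\d{4})?",
    compliance_tags=("HIPAA",),
)
```

Once the rules live alongside the definition, the same entry can serve the business (what does this mean, who owns it) and the developers (what should be rejected), which is the point of capturing more than a simple definition.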
Data Profiling – Both the source data AND the target data store must be profiled, to ensure alignment with the metadata captured in the models and to aid model development, e.g., by identifying natural keys, acceptable values, relationships, and dependencies. Data Profiling is too often considered something to be done when time permits, or something to do manually. A Data Profiling tool can provide sophisticated analysis much more accurately and efficiently than manual analysis (e.g., writing SQL queries by hand). Ideally, a formal Data Quality program will be in place to measure and improve data quality over time.
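For readers who have not seen profiling output, the sketch below shows the kind of basic column statistics a profile contains: null counts, distinct counts, candidate natural keys, and values that fall outside the documented set. It is a toy illustration and not a substitute for a profiling tool, which goes much further (relationships, dependencies, pattern analysis). The profile function, the sample rows, and the expected status values are all hypothetical.

```python
from collections import Counter


def profile(rows):
    """Compute simple per-column statistics for a list of dict records."""
    total = len(rows)
    columns = list(rows[0].keys()) if rows else []
    report = {}
    for col in columns:
        # Treat None and empty strings as missing values.
        non_null = [r.get(col) for r in rows if r.get(col) not in (None, "")]
        counts = Counter(non_null)
        report[col] = {
            "nulls": total - len(non_null),
            "distinct": len(counts),
            # A column is a candidate key if it is fully populated and unique.
            "candidate_key": total > 0 and len(counts) == total,
            "top_values": counts.most_common(3),
        }
    return report


# Made-up sample records standing in for a source extract.
rows = [
    {"customer_id": "C001", "email": "a@example.com", "status": "ACTIVE"},
    {"customer_id": "C002", "email": "", "status": "ACTIVE"},
    {"customer_id": "C003", "email": "c@example.com", "status": "CLOSED"},
    {"customer_id": "C004", "email": "c@example.com", "status": "active"},
]

report = profile(rows)

# Compare observed values with the values the model says are acceptable.
expected_status = {"ACTIVE", "CLOSED"}
unexpected_status = {r["status"] for r in rows} - expected_status
```

In this sample, customer_id comes out as a candidate key, email shows one missing value and a duplicate, and status contains the lowercase value "active", which the documented set does not allow. Findings like these are exactly what should feed back into the model and the metadata.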
Metadata Management – Another activity that is often overlooked, but that has serious implications for being able to trust data, is capturing, integrating, and presenting metadata to data stewards, analysts, developers, and even end users. Trusted data is data that we understand and that conforms to our understanding and requirements: we need to know what it means, what the acceptable values are and what they mean, where the data comes from and how it has been transformed, what its degree of quality is, etc. A wide variety of metadata is needed to enable trusted data. This metadata must be managed, and it should be used to automate some quality processes and to aid in analysis and knowledge retention.
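One piece of that metadata, where data comes from and how it has been transformed, is often called lineage. As a rough sketch of how lineage might be recorded and queried, consider the following. The LineageStep class, the trace function, and the element names are hypothetical, and the sketch assumes each element has at most one direct source, which real integration flows frequently violate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageStep:
    """One hop of lineage: a target element derived from a source element."""
    target: str
    source: str
    transformation: str


def trace(element, steps):
    """Walk upstream from an element and return the hops, nearest first."""
    by_target = {step.target: step for step in steps}
    path = []
    seen = set()  # guards against accidental cycles in the metadata
    current = element
    while current in by_target and current not in seen:
        seen.add(current)
        step = by_target[current]
        path.append(step)
        current = step.source
    return path


# Made-up lineage for a single reporting column.
steps = [
    LineageStep("mart.customer.status", "staging.customer.status_cd",
                "decode status code to label"),
    LineageStep("staging.customer.status_cd", "crm.cust.stat",
                "trim and uppercase"),
]

path = trace("mart.customer.status", steps)
for step in path:
    print(f"{step.target} <- {step.source} ({step.transformation})")
```

An analyst who questions a value in the reporting column can follow it back to the originating system and see each transformation along the way, which is a large part of being able to trust it.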
As I mentioned before, there are many aspects to providing trusted data. From a data management perspective, I believe Data Governance & Stewardship, Data Architecture & Modeling, Data Profiling, and Metadata Management will provide outstanding capabilities for delivering “trusted data.”