“What’s new?” is an interesting and broadening eternal question, but one which, if pursued exclusively, results only in an endless parade of trivia and fashion, the silt of tomorrow. I would like, instead, to be concerned with the question “What is best?”, a question which cuts deeply rather than broadly, a question whose answers tend to move the silt downstream.
– Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance
The key to understanding Data Architecture lies in answering three questions about the data: what it is, where it came from (how it got here), and why it is usable.
- What is this data?
- Where did it come from?
- Why is it usable?
The first two define a basic usage level and give users confidence. The third is a confidence multiplier and leads us to unexpected results.
Traditionally, Systems Architecture has been evaluated in terms of quality, reliability, interoperability and many other factors (the “-ilities”), each expanded and elaborated in detail, resulting in a system that meets many stakeholder needs. Most published standards relate to structure, syntax, design patterns and principles. A good architecture helps, but in a data-centric world it does not by itself give users the answers. Data Architecture builds on the system architecture and answers the questions: What, how and ultimately why?
We could use the following six levels to measure data usability:
- Level -1: No Trust – Not much to say here. Users do not trust the data, but they go with it anyway because there is no other choice.
- Level 0: Cynical Satisfaction – Users trust the data enough to use it, but that trust is tempered by cynicism about the effort it takes to verify and transform the data into something useful for the business. The mistrust comes from being expressly told the data is of high quality, then discovering quality or definition problems while using it.
- Level 1: Basic Promises Fulfilled – Using a detailed data dictionary and glossary, users develop a basic understanding of the data. The system provides a queryable glossary that is tightly integrated with its output, so users can easily navigate between data and its definition. The glossary provides links to internal and industry-specific terminology, labels, aliases and disambiguation.
- Level 2: Data Fits Usage – The data is good because its source is known to be of high quality. Lineage lets users align the data with the practices and routines in which it is used. Lineage provides not only the trace backward to the source data, but also the transformation logic and reference data used. In addition, the data contains both source and final data states and is easily verified. This data is integrated into reports used internally, within the organization, to make decisions.
- Level 3: No Black Boxes – No data in a large organization is perfect, and to get it into a usable state, manual processes (black boxes) are used to calculate or fix data. This level provides the glossaries and lineage for those offline processes. The offline lineage includes all people, resources, data and controls used in creating the data. At this level the data has not run into unforeseen problems, so quality and satisfaction are highly correlated. Users are comfortable sharing the data externally, for example with regulators, where incorrect data has negative consequences.
- Level 4: Data Delights – The data goes well beyond expectations; it provides unexpected and surprising insight into the markets or the organization. The user feels the system is elegant and enables them to ask questions related to why, not just what and how.
The delighted user will say: “The system understands and anticipates my needs; the supplier helps with my problems and can be trusted to advise on new opportunities and ventures.”
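To make the ladder concrete, here is a minimal Python sketch of how a team might score a dataset against these levels. It is an illustration only: the names `UsabilityLevel`, `DatasetProfile` and `assess`, the boolean flags, and the assumption that the levels are cumulative (each level requires everything below it) are all hypothetical and are not part of any standard or tool.

```python
from dataclasses import dataclass
from enum import IntEnum


class UsabilityLevel(IntEnum):
    """The six usability levels from the list above."""
    NO_TRUST = -1
    CYNICAL_SATISFACTION = 0
    BASIC_PROMISES_FULFILLED = 1
    DATA_FITS_USAGE = 2
    NO_BLACK_BOXES = 3
    DATA_DELIGHTS = 4


@dataclass
class DatasetProfile:
    """Hypothetical yes/no facts about a dataset, invented for this sketch."""
    trusted_enough_to_use: bool = False      # Level 0
    has_integrated_glossary: bool = False    # Level 1: what is this data?
    has_system_lineage: bool = False         # Level 2: where did it come from?
    has_offline_lineage: bool = False        # Level 3: manual steps documented
    yields_unexpected_insight: bool = False  # Level 4: answers "why"


def assess(profile: DatasetProfile) -> UsabilityLevel:
    """Return the highest level reached, assuming levels are cumulative."""
    ladder = [
        (profile.trusted_enough_to_use, UsabilityLevel.CYNICAL_SATISFACTION),
        (profile.has_integrated_glossary, UsabilityLevel.BASIC_PROMISES_FULFILLED),
        (profile.has_system_lineage, UsabilityLevel.DATA_FITS_USAGE),
        (profile.has_offline_lineage, UsabilityLevel.NO_BLACK_BOXES),
        (profile.yields_unexpected_insight, UsabilityLevel.DATA_DELIGHTS),
    ]
    level = UsabilityLevel.NO_TRUST
    for requirement_met, next_level in ladder:
        if not requirement_met:
            break  # a gap at one level blocks every level above it
        level = next_level
    return level


if __name__ == "__main__":
    # Glossary and offline lineage exist, but system lineage is missing.
    report = DatasetProfile(
        trusted_enough_to_use=True,
        has_integrated_glossary=True,
        has_offline_lineage=True,
    )
    print(assess(report).name)
```

In this sketch a dataset with a glossary and documented manual processes, but no system lineage, stays at Level 1. That reflects the cumulative assumption: answering "where did it come from?" has to come before the black boxes can be considered opened.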
It is the Data Architect’s role to ensure systems answer: What is this data? Where did it come from? Why is it usable?