In my last blog entry, you may recall, we discussed the value of and need for Standards and Templates to ensure consistent and efficient use of the Data Lake, both in its population (supplying) and in the retrieval (consuming) of information. Achieving this level of consistency, efficiency and reliability requires a robust Information Governance Program responsible for overseeing the environment. In this entry, I will provide an overview of what this means to me.
As I’ve referenced in previous blog entries, Information Governance (IG) can be defined as a strategic practice that defines three things: the Rules (inclusive of policies, guidelines, laws, etc.) for interacting with Information; the Decision Rights and Responsibilities of all parties involved in these interactions; and the Processes and Controls to be followed when performing these interactions. To accomplish this, the IG Practice itself fulfills a set of oversight roles that can be compared to the three branches of the U.S. form of government: Executive, Legislative and Judicial.
| Branch | Responsibilities | Typically Filled By |
|---|---|---|
| Executive | Provides overall strategy and guidance to the Program and how it serves (and benefits) the organization. Identifies and approves the needed Artifacts (Rules, Decision Rights, Processes) | Governance Committee/Board, Steering/Strategic Committee, etc. |
| Legislative | Creates, maintains and improves the artifacts at the behest of the Executive Branch; communicates and describes the artifacts to the enterprise | Governance SME-based Workgroups, Governance Analysts, etc. |
| Judicial | Enforces artifacts and identifies needs (along with the entire user community) for the creation, modification or removal of artifacts | Information Stewards, Owners, etc. |
When it comes to the Rules, Decision Rights and Processes themselves, we need to consider the overall purpose and role of a Data Lake and craft them accordingly. If you accept that the Data Lake will house the Information Assets of the enterprise, the following are some examples of artifacts consistent with that model.
Rules, as indicated, are a broad category meant to capture the “enforceable” items with regard to the use of the Data Lake. Some “categories” of these rules include:
- What is Contained: Specific guidance as to the information that is to be resident in the Lake – equally important is any information specifically excluded from the Lake
- Who has Access: Provides guidance on roles and expectations and controls role assignments for individuals interacting with the Lake – this includes both users and Governance personnel
- How to Interact: Guidance around acceptable behavior in all aspects of interacting with the Lake, from supplying and consuming to governing the information resident in the Lake
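To make these rule categories a little more concrete, here is a minimal sketch of how they might be written down in a machine-checkable form. Everything in it is an assumption for illustration: the `LakeRules` class, the `is_permitted` function, the domain names and the role names are hypothetical and do not come from any particular governance product.

```python
from dataclasses import dataclass


@dataclass
class LakeRules:
    """Hypothetical encoding of the three rule categories."""
    allowed_domains: set   # What is Contained: information meant to be in the Lake
    excluded_domains: set  # What is Contained: information specifically excluded
    role_actions: dict     # Who has Access / How to Interact: role -> permitted actions


def is_permitted(rules, role, action, domain):
    """Return True only if every rule category allows the interaction."""
    if domain in rules.excluded_domains:
        return False  # explicitly excluded information never qualifies
    if domain not in rules.allowed_domains:
        return False  # information not listed as resident is out of scope
    return action in rules.role_actions.get(role, set())


# Illustrative values only.
rules = LakeRules(
    allowed_domains={"sales", "inventory"},
    excluded_domains={"payroll"},
    role_actions={
        "supplier": {"supply"},
        "consumer": {"consume"},
        "steward": {"supply", "consume", "govern"},
    },
)

print(is_permitted(rules, "supplier", "supply", "sales"))   # True
print(is_permitted(rules, "consumer", "supply", "sales"))   # False
print(is_permitted(rules, "steward", "govern", "payroll"))  # False
```

In practice the rules would live in policy documents long before they live in code, but writing them down this explicitly is a useful test of whether they are actually enforceable.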
Decision Rights bestow enforceable privileges (and the associated responsibility) upon parties involved in the program. These rights need to be defined for all governance and user roles. Using the Aggregator analogy we have been talking about, the following are examples of the Decision Rights bestowed upon the Supplier, Consumer and Aggregator.
Supplier:
- Decide the format of the information they are providing
- Decide what information they are supplying
- Decide when, and at what cadence if applicable, information will be provided

Consumer:
- Decide what information they are willing to accept
- Decide what format and delivery mechanism they require
- Decide when, and at what cadence if applicable, information will be obtained

Aggregator:
- Decide what information will be resident in the Lake
- Decide what formats of information will be accepted from a Supplier and provided to a Consumer
- Decide when and at what cadence they will accept information from a Supplier and deliver information to a Consumer
These decision rights may appear “dictatorial” and at cross-purposes, but that is not the case. The expectation is that the decisions be highly collaborative between the parties, but that, ultimately, each party has the right to make a decision best suited for them.
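One way to picture how independent decision rights can still lead to a collaborative outcome: each party declares the formats it has decided on, and the workable options are whatever the two sides have in common. The sketch below is only an illustration under that assumption; the `Party` class, the `agreed_formats` function and all of the names and formats are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Party:
    name: str
    formats: set  # formats this party has decided to provide or accept


def agreed_formats(a, b):
    """Formats both parties have independently decided on, sorted for stable output."""
    return sorted(a.formats & b.formats)


# Illustrative values only.
supplier = Party("orders-system", {"csv", "json"})
aggregator_intake = Party("lake-intake", {"json", "parquet"})
aggregator_delivery = Party("lake-delivery", {"json", "parquet"})
consumer = Party("reporting-team", {"csv"})

print(agreed_formats(supplier, aggregator_intake))    # ['json']
print(agreed_formats(aggregator_delivery, consumer))  # []
```

An empty result is not a failure of the model. It is the signal that the two parties need to sit down together, since each retains the right to decide and neither can simply overrule the other.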
Processes essentially define how and when the Rules and Decision Rights are utilized along a path of activities put in place to achieve a usage goal of the Data Lake. Again, these Processes must be defined both for governing the information and for how user interactions are to take place. Some Processes that would be defined by the IG Program include:
- Request Management: Processes for making a request for a governance artifact to the Governance Program – inclusive of how the request is handled and tracked
- Artifact Development/Maintenance: Processes around the creation and modification of governance artifacts – inclusive of the deployment of these artifacts
- Artifact Enforcement: Processes around how artifacts will be monitored for adherence – inclusive of activities for dealing with non-compliance
- Supply Information: Processes that manage the interaction between a supplier and the aggregator
- Consume Information: Processes that manage the interaction between a consumer and the aggregator
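As an example of what "handled and tracked" could look like for Request Management, here is a minimal sketch of a request moving through a fixed set of states. The states, the allowed transitions and the `ArtifactRequest` class are all assumptions made for illustration; an actual IG Program would define its own lifecycle.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical lifecycle states for a governance artifact request.
    SUBMITTED = "submitted"
    UNDER_REVIEW = "under_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    DEPLOYED = "deployed"


# Which moves are allowed from each state (an assumed workflow).
ALLOWED = {
    Status.SUBMITTED: {Status.UNDER_REVIEW},
    Status.UNDER_REVIEW: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.DEPLOYED},
    Status.REJECTED: set(),
    Status.DEPLOYED: set(),
}


class ArtifactRequest:
    """Tracks one request and the full history of its status changes."""

    def __init__(self, title):
        self.title = title
        self.status = Status.SUBMITTED
        self.history = [Status.SUBMITTED]

    def advance(self, new_status):
        """Move to new_status, refusing any step the workflow does not allow."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
        self.history.append(new_status)


request = ArtifactRequest("Define retention rule for sales data")
request.advance(Status.UNDER_REVIEW)
request.advance(Status.APPROVED)
request.advance(Status.DEPLOYED)
print([s.value for s in request.history])
# ['submitted', 'under_review', 'approved', 'deployed']
```

Keeping the full history on the request is what makes it trackable: anyone can see where a request stands and how it got there.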
As you can see, there is a lot of “infrastructure” that needs to be put in place for the effective and efficient use of a Data Lake. The enterprise needs to recognize that this investment is worth making if it wants a valuable and reliable Data Lake.
The establishment and maintenance of this infrastructure is the duty and responsibility of an Information Governance practice area – which is why I consider IG an essential aspect of any Data Lake initiative.
In my next post I will provide some key takeaways to keep in mind when creating the business case for the establishment of an Information Governance Program for getting the most out of a Data Lake.