Data Lake Participants – Roles and Responsibilities - Perficient Blogs
Data Lake Participants – Roles and Responsibilities

As you may recall, in my last blog I introduced the analogy of the Aggregator to describe utilizing a Data Lake as a Consolidator of information, and I mentioned the three key roles in this model: the Supplier, the Aggregator and the Consumer.

In this post I will go into a little more detail on the responsibilities of each of these roles. When carried out diligently, these responsibilities create an effective environment for obtaining significant value from the Lake.

For this model to work effectively, there are a few key points to keep in mind at all times:

  • The Supplier has no direct knowledge of the Consumer’s needs or of how the Consumer wants the items presented; that knowledge belongs to the Aggregator
  • The Consumer is unaware of the Supplier and knows what is available only by interacting with the Aggregator
  • The Aggregator is driven by an understanding of the Consumer, both what they need (or may need in the future) and how they need to see or access it. It is therefore the Aggregator that decides how to present items to the Consumer
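
To make these three points concrete, here is a minimal sketch in Python. The class name, method names and the sample concept are my own illustrative assumptions, not part of any real Data Lake product. The only thing it is meant to show is that the Supplier and the Consumer each talk to the Aggregator, never to each other, and that the Aggregator alone decides the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregator:
    """Sits between Suppliers and Consumers; neither side sees the other."""
    _items: dict = field(default_factory=dict)       # concept -> value as supplied
    _presenters: dict = field(default_factory=dict)  # concept -> presentation function

    def accept(self, concept, value):
        # Called by a Supplier, which knows nothing about any Consumer.
        self._items[concept] = value

    def present_as(self, concept, presenter):
        # The Aggregator, not the Supplier, decides how an item is presented.
        self._presenters[concept] = presenter

    def available(self):
        # All a Consumer can discover is what the Aggregator offers.
        return sorted(self._items)

    def fetch(self, concept):
        # Apply the Aggregator's chosen presentation, or pass the value through.
        presenter = self._presenters.get(concept, lambda value: value)
        return presenter(self._items[concept])


lake = Aggregator()
lake.accept("Subscriber.BirthDate", "1950-03-14")                 # Supplier side
lake.present_as("Subscriber.BirthDate", lambda value: value[:4])  # Aggregator's choice
print(lake.available())                    # Consumer side: what is on offer
print(lake.fetch("Subscriber.BirthDate"))  # Consumer side: year only
```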

Keeping these underlying principles in mind, the following set of responsibilities can be defined for each role (note that the embedded examples are for a Healthcare Insurance Provider):

Supplier

  • Provides a full description of what is being delivered to the Data Lake
    • A Conceptual and Logical Model of the information in the “language” of the standard catalog that has been adopted by the Data Lake as representative of the enterprise’s business information – independent of any physical implementation
    • A set of any rules that have been placed upon the information (e.g. this source system only allows one Address per Person)
    • The set of “calculations” being provided, along with a formula of how each calculation is made, using the concepts as defined in the enterprise catalog (e.g. a count of Group Members is the sum of all Plan Members: the Group Member, i.e. the Subscriber, plus all the Plan Members identified on each Contract held by the Subscriber)
    • The set of “views” that are represented in the supplied information and the criteria used to generate the content of each view (e.g. all contracts of subscribers who are male and aged 65 or over)
  • Provides a full description of how the information is being delivered to the Data Lake
    • The form (extract file, acquisition service, direct connection “pipe”, etc.)
    • The detailed format within the form that maps back to the “what” documentation presented above

Note that the Supplier provides no transformation requirements, because transformation is not the Supplier’s responsibility.
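
As an illustration of what a documented “calculation” and “view” might look like, here is a small Python sketch of the two Supplier examples above. The `Person` and `Contract` classes, the "M"/"F" encoding and the sample data are all assumptions I have made up for the example. I have also assumed that a Plan Member listed on two Contracts is counted twice, which a real Supplier would need to state explicitly.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Person:
    name: str
    sex: str          # "M" or "F"; an assumed encoding
    birth_date: date


@dataclass
class Contract:
    subscriber: Person
    plan_members: list = field(default_factory=list)  # Persons other than the Subscriber


def age_on(person, as_of):
    """Age in whole years on the date as_of."""
    before_birthday = (as_of.month, as_of.day) < (
        person.birth_date.month, person.birth_date.day)
    return as_of.year - person.birth_date.year - int(before_birthday)


def group_member_count(subscriber_contracts):
    """Calculation: the Subscriber plus every Plan Member on each of their Contracts.

    Assumption: a Plan Member listed on two Contracts is counted twice.
    """
    return 1 + sum(len(c.plan_members) for c in subscriber_contracts)


def male_65_plus_view(contracts, as_of):
    """View: Contracts whose Subscriber is male and aged 65 or over on as_of."""
    return [c for c in contracts
            if c.subscriber.sex == "M" and age_on(c.subscriber, as_of) >= 65]


# Made-up sample data.
al = Person("Al", "M", date(1950, 6, 1))
bea = Person("Bea", "F", date(1948, 2, 9))
cy = Person("Cy", "M", date(1980, 11, 30))
kid = Person("Kid", "M", date(2010, 5, 5))

contracts = [Contract(al, [bea, kid]), Contract(bea), Contract(cy, [kid])]
as_of = date(2024, 1, 1)

print(group_member_count([contracts[0]]))
print([c.subscriber.name for c in male_65_plus_view(contracts, as_of)])
```

Writing the formula and the view criteria down this precisely is the point: the Aggregator should not have to guess whether “age 65” means on the extract date or at contract inception.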

Consumer

  • Provides a full description of what is being requested of the Data Lake
    • A Conceptual and Logical Model of the information in the “language” of the standard catalog that has been adopted by the Data Lake as representative of the enterprise’s business information – independent of any physical implementation
    • A set of any rules that are followed by the target and that the delivered information must therefore abide by (e.g. this target system only allows one Benefit Package per Division)
    • The set of “calculations” needed by the target, along with a formula of how each calculation is made, using the concepts as defined in the enterprise catalog (e.g. a count of Group Members is the sum of all Plan Members: the Group Member, i.e. the Subscriber, plus all the Plan Members identified on each Contract held by the Subscriber)
    • The set of “views” that need to be provided in the delivered information and the criteria that define the view content (e.g. all contracts for an HMO product where the subscriber is female and resident in the state of Arkansas)
  • Provides a full description of how the information is desired from the Data Lake (this is highly negotiable, as the Data Lake may offer alternative delivery mechanisms or may reject the Consumer’s request)
    • The form (extract file, acquisition service, direct connection “pipe”, etc.)
    • The detailed format within the form that maps back to the “what” documentation presented above
    • If transformations are needed from what the Data Lake has agreed to make available, a description of the transformation desired

Note that in this model, even other “consolidators” (such as a Data Warehouse or Operational Data Store) are also Consumers and therefore have the same responsibilities.
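
Two of the Consumer items above lend themselves to a quick sketch: checking delivered information against a target rule, and negotiating the delivery form. The Python below is only an illustration under my own assumptions. The row layout, the function names and the "first alternative wins" negotiation policy are hypothetical, not a description of how any particular Data Lake behaves.

```python
from collections import Counter

# Hypothetical rows the Data Lake might deliver: (division, benefit_package).
rows = [
    ("East", "Gold"),
    ("West", "Silver"),
    ("West", "Bronze"),  # a second package for West breaks the target's rule
]


def divisions_violating_one_package_rule(rows):
    """Return the divisions that have more than one distinct Benefit Package."""
    # set(rows) drops exact duplicates so a repeated row is not a violation.
    packages_per_division = Counter(division for division, _ in set(rows))
    return sorted(d for d, n in packages_per_division.items() if n > 1)


def negotiate_form(requested, offered):
    """Return the requested form if the lake offers it, otherwise the lake's
    first alternative, or None if the request is rejected outright."""
    if requested in offered:
        return requested
    return offered[0] if offered else None


print(divisions_violating_one_package_rule(rows))
print(negotiate_form("direct connection", ["extract file", "acquisition service"]))
```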

Aggregator

  • Ensures there are suppliers with the items the consumers need
  • Takes delivery from a supplier, in whatever format that takes, and presents these items to the consumer
  • Provides the common vocabulary (catalog) of the information currently or “aspirationally” resident in the Data Lake (this may expand as Suppliers come on board with new concepts or Consumers make requests for new concepts)
    • A Conceptual and Logical Model
    • A set of any rules that have been placed upon the information
    • The set of “calculations” available
    • The set of “views” available
  • Provides a full description of how the information can be accessed by a Consumer and the physical mapping for where the information may be found
  • Determines the best approach for moving Supplier information to Consumer accessible information (by using its knowledge of the needs of the consumer and how it wishes to serve the consumer)
  • Provides assistance for both Suppliers and Consumers in representing their information utilizing the common vocabulary
  • Provides guidance and assistance to Consumers in actually obtaining the information from the Data Lake
  • Governs all the information resident in the Data Lake

This last statement is key to the connection to Information Governance. In fact, all of these responsibility descriptions are aspects of the “decision rights” defined and controlled by a Governance Body.

The implication is that the “keepers” of the Data Lake must establish the Governance of the information housed in the lake, although it is recommended that the Information Governance (IG) Program be created organizationally as a separate and distinct entity from the Data Lake solution owner.

You will also notice that a lynchpin between all these roles is a Catalog that is utilized by all parties in their communications with the other roles. The creation and maintenance of this catalog are the responsibility of the IG Program. I will talk more about this artifact, and its importance, in my next post.
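
As a small preview of why the shared Catalog matters, here is a hedged Python sketch of how an Aggregator might use it to see which Consumer requests have no Supplier yet, and which use concepts the Catalog does not yet contain. The concept names, system names and the `coverage_report` function are invented for the illustration.

```python
# Hypothetical catalog: concept names in the shared vocabulary.
catalog = {"Subscriber", "Contract", "Plan Member", "Benefit Package", "Division"}

# Hypothetical declarations made by Suppliers and Consumers using that vocabulary.
supplied = {"claims_system": {"Subscriber", "Contract", "Plan Member"}}
requested = {
    "marketing_mart": {"Subscriber", "Benefit Package"},
    "data_warehouse": {"Contract", "Provider"},
}


def coverage_report(catalog, supplied, requested):
    """For each Consumer, list requested concepts that are missing from the
    catalog and catalog concepts that no Supplier currently delivers."""
    all_supplied = set().union(*supplied.values())
    report = {}
    for consumer, concepts in requested.items():
        report[consumer] = {
            "not_in_catalog": sorted(concepts - catalog),
            "no_supplier": sorted((concepts & catalog) - all_supplied),
        }
    return report


for consumer, gaps in coverage_report(catalog, supplied, requested).items():
    print(consumer, gaps)
```

A concept that shows up under "not_in_catalog" is a candidate for expanding the vocabulary, while one under "no_supplier" tells the Aggregator it needs to find a source.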
