Mark Steinbacher, Author at Perficient Blogs – Expert Digital Insights

Data Lake and Information Governance – The Key Takeaways
Mon, 12 Nov 2018

A Data Lake can be a highly valuable asset to any enterprise, and there are myriad technology solutions available to support the processes that feed, maintain and retrieve information from the Lake.

But all this technology is significantly less valuable – if not worthless – if the environment is not well governed and managed. This is the primary Takeaway to keep in mind when a Data Lake solution is being considered by any organization – or is already in place but in need of improvement.

Another takeaway is the idea of positioning the Data Lake as an Aggregator of information – operating, by analogy, like a Warehouse store: positioned to serve Consumers, but ultimately responsible for determining how best to collect, store, and make available the information it houses. This takeaway significantly influences how the Governance of the environment is set up and run.

Accepting the above two statements – the criticality of Governance and the Operating Model of an Aggregator – some other observations can be made:

The Supplier

  • Needn’t have knowledge of the Consumer(s) as they work directly and exclusively with the Aggregator
  • Needs to be willing to conform to the formats, mechanisms and timings of information delivery as defined (through negotiations as necessary) by the Aggregator
  • Needs to be able to describe the information they supply in a “common language” that focuses upon “what” the information is, regardless of how or where it is represented

The Consumer

  • Needn’t have knowledge of the Supplier(s) as they work directly and exclusively with the Aggregator
  • Needs to be willing to conform to the formats, mechanisms and timings of information delivery as defined (through negotiations as necessary) by the Aggregator
  • Needs to be able to describe the information they require in a “common language” that focuses upon “what” the information is, regardless of how or through what mechanism it is delivered

The Aggregator

  • Is the “lynchpin” between Suppliers and Consumers, and is therefore responsible for ensuring Consumer satisfaction through appropriate “sourcing” (supplier systems) to address the needs of all Consumers
  • As the central repository for the information transferred between Suppliers and Consumers, the Aggregator is the keeper of the “common language” referred to in the Supplier and Consumer observations. This may take the form of a Master Information Catalog, a Semantic or Canonical Model, a Business Glossary of Terms or any combination thereof
  • Guides both Suppliers and Consumers through the defined interaction processes and the use of the standards and templates defined for aiding these interactions

Governance

  • Defines and ensures all parties adhere to the Rules, Rights and Processes for the use and management of the Data Lake
  • Identifies and defines all standards and templates needed to ensure the consistency, efficiency and effectiveness of the interactions
  • Serves as the ultimate authority for negotiating the relationships, duties, rights, obligations and privileges of all parties (Suppliers, Consumers and Aggregator)

As mentioned in a previous entry, these observations may sound dictatorial. But for this to be successful, a highly collaborative environment – one in which all parties are willing to compromise and reach consensus on the information assets housed in the Data Lake – must be an integral part of the culture of the enterprise.

So, this completes my journey into Data Lakes and the Information Governance needed. I hope you found this interesting and helpful. Feel free to reach out with any comments or observations you may have. Thanks so much for reading my blog.

Information Governance – Essential Ingredient for Business Value
Tue, 11 Sep 2018

In my last blog, you may recall that we discussed the value of, and need for, Standards and Templates for ensuring consistent and efficient use of the Data Lake, both in its population (supplying) and in its retrieval (consuming) of information. Achieving this level of consistency and efficiency, as well as reliability, requires a robust Information Governance Program responsible for overseeing the environment. In this entry, I will provide an overview of what this means to me.

As I’ve referenced in previous blog entries, Information Governance can be defined as a strategic practice that defines Rules (inclusive of policies, guidelines, laws, etc.) for interacting with Information, Decision Rights and Responsibilities of all parties involved in these interactions and the Processes and Controls to be followed when performing these interactions. To accomplish this, the IG Practice itself fulfills a set of oversight roles that can be compared to our (the U.S.) form of government consisting of three branches – Executive, Legislative and Judicial.

  • Executive: Provides overall strategy and guidance to the Program and how it serves (and benefits) the organization; identifies and approves the needed Artifacts (Rules, Decision Rights, Processes). Fulfilled by: Governance Committee/Board, Steering/Strategic Committee, etc.
  • Legislative: Creates, maintains and improves the artifacts at the behest of the Executive Branch; communicates and describes the artifacts to the enterprise. Fulfilled by: Governance SME-based Workgroups, Governance Analysts, etc.
  • Judicial: Enforces artifacts and identifies needs (along with the entire user community) for the creation, modification or removal of artifacts. Fulfilled by: Information Stewards, Owners, etc.

As far as Rules, Decision Rights and Processes, we need to consider the overall purpose and role of a Data Lake and craft these accordingly. If you accept that the Data Lake will house the Information Assets of the enterprise, the following are some examples of these artifacts consistent with that model.

Rules

As indicated, this is a broad category meant to capture the “enforceable” items with regard to the use of the Data Lake. Some “categories” of these rules include:

  • What is Contained: Specific guidance as to the information that is to be resident in the Lake – equally important is any information specifically excluded from the Lake
  • Who has Access: Provides guidance on roles and expectations, and controls role assignments for individuals interacting with the Lake – this includes users as well as Governance personnel
  • How to Interact: Guidance around acceptable behavior in all aspects of interacting with the Lake, whether supplying, consuming or governing the information resident in the Lake
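
To make rules like these enforceable and auditable, some IG programs record them as structured policy entries rather than prose alone. The following is a minimal sketch of what such a record might look like – the field names, categories and example rule are my own illustration, not part of any standard:

    from dataclasses import dataclass, field
    from enum import Enum

    class RuleCategory(Enum):
        # The three rule categories described above
        CONTENT = "what is contained"
        ACCESS = "who has access"
        INTERACTION = "how to interact"

    @dataclass
    class LakeRule:
        rule_id: str
        category: RuleCategory
        statement: str                  # the enforceable rule, in business language
        applies_to: list[str] = field(default_factory=list)  # roles the rule binds
        enforced_by: str = "IG Program"

    # Hypothetical exclusion rule under the "What is Contained" category
    no_raw_pii = LakeRule(
        rule_id="RULE-001",
        category=RuleCategory.CONTENT,
        statement="Unmasked personal data must not be resident in the Lake",
        applies_to=["Supplier", "Aggregator"],
    )

Recording rules this way keeps the business-language statement and the roles it binds in one place, so adherence monitoring can trace back to the exact rule being enforced.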

Decision Rights

Decision Rights bestow enforceable privileges (and the associated responsibility) upon parties involved in the program. These rights need to be defined for all governance and user roles. Using the Aggregator analogy we have been talking about, the following are examples of the Decision Rights bestowed upon the Supplier, Consumer and Aggregator.

Supplier Rights

  • Decide the format of the information they are providing
  • Decide what information they are supplying
  • Decide when and, if applicable, at what cadence information will be provided

Consumer Rights

  • Decide what information they are willing to accept
  • Decide what format and delivery mechanism they require
  • Decide when and, if applicable, at what cadence information will be obtained

Aggregator Rights

  • Decide what information will be resident in the Lake
  • Decide what formats of information will be accepted from a Supplier and provided to a Consumer
  • Decide when and at what cadence they will accept information from a Supplier and deliver information to a Consumer

These decision rights may appear “dictatorial” and at cross-purposes, but that is not the case. The expectation is that the decisions be highly collaborative between the parties, but that, ultimately, each party has the right to make a decision best suited for them.

Processes

Processes define how and when the Rules and Decision Rights are exercised along a path of activities put in place to achieve a usage goal of the Data Lake. These Processes must again be defined both for governing the information and for how the user interactions are to take place. Some Processes that would be defined by the IG Program include:

  • Request Management: Processes for making a request for a governance artifact to the Governance Program – inclusive of how the request is handled and tracked
  • Artifact Development/Maintenance: Processes around the creation and modifications made to governance artifacts – inclusive of the deployment of these artifacts
  • Artifact Enforcement: Processes around how artifacts will be monitored for adherence – inclusive of activities for dealing with non-compliance
  • Supply Information: Processes that manage the interaction between a supplier and the aggregator
  • Consume Information: Processes that manage the interaction between a consumer and the aggregator
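
As an illustration of the Request Management process above, the sketch below tracks a request for a governance artifact through a simple set of states. The states and transitions are my own assumptions for illustration; an actual IG Program would define its own workflow:

    from dataclasses import dataclass
    from enum import Enum, auto

    class RequestState(Enum):
        SUBMITTED = auto()
        IN_REVIEW = auto()   # "Legislative" work: drafting or revising the artifact
        APPROVED = auto()    # "Executive" sign-off
        REJECTED = auto()
        DEPLOYED = auto()    # artifact published and enforceable

    # Allowed transitions keep the request's history auditable
    TRANSITIONS = {
        RequestState.SUBMITTED: {RequestState.IN_REVIEW, RequestState.REJECTED},
        RequestState.IN_REVIEW: {RequestState.APPROVED, RequestState.REJECTED},
        RequestState.APPROVED: {RequestState.DEPLOYED},
    }

    @dataclass
    class ArtifactRequest:
        request_id: str
        requested_by: str
        artifact_type: str   # e.g. "Rule", "Decision Right", "Process"
        state: RequestState = RequestState.SUBMITTED

        def advance(self, new_state: RequestState) -> None:
            if new_state not in TRANSITIONS.get(self.state, set()):
                raise ValueError(f"Illegal transition {self.state} -> {new_state}")
            self.state = new_state

    # A rule request moving through the full lifecycle
    req = ArtifactRequest("REQ-042", "Claims team", "Rule")
    req.advance(RequestState.IN_REVIEW)
    req.advance(RequestState.APPROVED)
    req.advance(RequestState.DEPLOYED)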

As you can see, there is a lot of “infrastructure” that needs to be put in place for the effective and efficient use of a Data Lake. The enterprise must recognize that this investment is worth making to ensure a valuable and reliable Data Lake.

The establishment and maintenance of this infrastructure is the duty and responsibility of an Information Governance practice area – which is why I consider IG an essential aspect of any Data Lake initiative.

In my next post I will provide some key takeaways to keep in mind when creating the business case for the establishment of an Information Governance Program for getting the most out of a Data Lake.

Working with the Data Lake Aggregator – Standards and Templates
Tue, 07 Aug 2018

In my previous blog, I described the concept of an “Information Catalog” and how it plays a vital role in ensuring communication between the Data Lake Aggregator and Suppliers and Consumers is efficient and effective due to the common language that it provides.

I also included the following diagram as an example of how the Catalog is used to connect the artifacts built for describing the information assets:

[Diagram: the Information Catalog connecting the Supplier’s specification artifacts to the Consumer’s requirement artifacts]

I also mentioned that confusion can still reign if there are no standards in place to guide and control how the specification, requirement and design artifacts needed for these collaborations are presented. This post will take a look at some artifacts typically generated by Suppliers and Consumers, suggesting how these standards can be realized through the use of templates defined by the Aggregator – or, more specifically, the IG Program overseeing the Data Lake.

Supplier Artifacts

The supplier needs to communicate not only what is being supplied, but also how it is being supplied in sufficient detail so that the Aggregator can take the information, get it “landed” into the Lake, and then also be able to find the relevant information in what is provided to fulfill Consumers’ needs.

Using the example of a Supplier providing an “Extract File”, the following set of templates, or required artifacts, should be used to fully specify what is in the Extract File:

  • Semantic Model: Represents the concepts, their characteristics and their relationships to one another. This is not so much a template as a set of standards for representing these aspects in a “boxes and lines” kind of view. These models must represent a subset of the Catalog’s Model (which may require an expansion of the Catalog if the Supplier is providing information not yet represented)
  • Glossary of Terms: Contains not only the Semantic Model items, but also other terms that may describe information being provided that is derived from the semantic model (for example, Calculated or Summary Values present in the extract file). This template contains a set of standard “columns” for describing a term (definition, synonyms, term categorization, etc.)
  • Rules: Presents all constraints that the supplier’s system enforced on the information being provided. For example, if the model identifies that a Person can have many Addresses, but the supplier system only allows one Address per Person, that would be documented in this Rulebook. Similar to the Glossary template, the rule template should contain typical “columns” for describing a rule
  • Translation Map: The heart of the specification, in that it “connects” the information being provided (in this case the extract file’s records and fields) to the concepts as represented in the Semantic Model and Glossary of Terms. This template therefore consists of columns that describe the record/field being supplied and a matching set of columns that describe the concepts to which these items align, or map, as represented in the Model/Glossary
  • Field Definition Dictionary: Similar to the Glossary, this presents a description of every field in the extract file. This template consists of a set of columns typical for describing a field, but should also, like the glossary, offer guidance as to what constitutes a good definition
  • Field Valid Values: For any field whose content is constrained within the supplying system, the full set of valid values. This template consists of a set of columns for describing a value including, in the case of “codes” or other cryptic values, columns that allow for a full description of the meaning of each of these values
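
Since each template above is essentially a defined set of columns, a small sketch can make that concrete. The column names below are illustrative assumptions on my part, not a prescribed layout:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class GlossaryTerm:
        term: str
        definition: str
        synonyms: list[str]
        category: str                 # e.g. "Core Concept", "Calculated Value"

    @dataclass
    class TranslationMapRow:
        # The physical side: what the supplier actually sends
        record_name: str
        field_name: str
        # The semantic side: what that field means in the Catalog
        catalog_concept: str
        catalog_attribute: str
        notes: Optional[str] = None

    # Hypothetical mapping of an extract-file field to a Catalog concept
    row = TranslationMapRow(
        record_name="MEMBER_REC",
        field_name="MBR_DOB",
        catalog_concept="Person",
        catalog_attribute="Date of Birth",
    )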

Consumer Artifacts

The Consumer needs to tell the Aggregator what they need, but needn’t, at least initially, worry about exactly how these needs are presented to them. This gives the Aggregator some flexibility in fulfilling the need which, in turn, will improve efficiency of delivery in that the Aggregator will be able to offer “standard” packages of information that may serve the needs of multiple Consumers.

Given that, the set of required artifacts for a Consumer focuses upon simply describing what is needed:

  • Semantic Model: As with the artifact used by the Supplier, this represents the concepts, their characteristics and their relationships to one another. This is not so much a template as a set of standards for representing these aspects in a “boxes and lines” kind of view. These models must represent a subset of the Catalog’s Model (which may require an expansion of the Catalog if the Consumer is requesting information not yet represented)
  • Glossary of Terms: Contains not only the Semantic Model items, but also other terms that may describe information being requested that is derived from the semantic model (for example, Calculated or Summary Values needed by the Consumer). This template contains a set of standard “columns” for describing a term (definition, synonyms, term categorization, etc.)
  • Rules: Presents all constraints that the consumer’s system will enforce on the information being provided. For example, if the model identifies that a Person can have many Addresses, but the consumer system only allows one Address per Person, that would be documented in this Rulebook. Similar to the Glossary template, the rule template should contain typical “columns” for describing a rule

As you may have noticed, the Consumer’s artifacts are identical to the Supplier’s as far as templates and content – the difference is strictly in the PERSPECTIVE from which they are populated. This furthers the de-coupling of sources from targets, in that the Supplier need focus only on what they are providing and the Consumer only on what they need.

This provides the Aggregator significant flexibility in both accepting information coming in as well as multiple ways for sending information out.

I realize I did not provide a lot of detail or specific examples of what a template would actually look like, but, to some degree, that is dependent upon a particular enterprise’s need and maturity. Hopefully this gives you sufficient information to get started on defining your own templates, but feel free to leave a comment or reach out directly to me if you’d like further information (or to add details of your own).

Finally, all this talk of Master Catalogs, Standards and Templates leads me to my ultimate area of interest for making all this work – Information Governance. For this all to come to fruition, and be sustainable, a robust Information Governance Program is required, and it is this I will discuss in my next post.

Data Lake as Aggregator – The Critical Role of the Catalog
Tue, 10 Jul 2018

My previous blog described a Data Lake using a Supplier-Aggregator-Consumer analogy and the roles each of these parties plays. One factor that is critical to the success of this approach is the use of a common vocabulary that ensures efficiency and effectiveness in the interactions and collaborations between the parties.

The implication of the Aggregator analogy is that suppliers and consumers independently approach the aggregator, so it is imperative that there is a common language utilized by all for describing what is provided (the “specifications” of the supplier’s content), what is needed/desired (the “requirements” of the consumers) and what is actually contained in the Data Lake (the “catalog” of information published by the aggregator).

So, what does this Catalog look like? Given this is information we are talking about, it is not anything you probably haven’t seen before – essentially it consists of a representation of the information housed in the Data Lake utilizing Information/Data Models and a Glossary of Terms. Together they fully describe the information that is relevant to the business being conducted by the enterprise.

Both the Models and the Glossary exclusively describe “what” information exists using the “language of the business” for which it exists. Both the terminology and the representation/notation used in the models must be accessible to all those involved – both business and technical – to ensure maximum understanding.

To be perfectly clear – what this is NOT is a physical representation of how and where all the information is stored, or its format, access mechanisms or any other physical aspect. Those are all critical and play a part in the actual receipt and delivery of information, but that “how” detail is addressed separately in order to keep the Catalog focused upon ensuring a common language that does not fluctuate with the use or advancement of technology.
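
To make the “what, not how” boundary concrete, here is a minimal sketch of what a Catalog entry might hold – the structure and example are my own illustration:

    from dataclasses import dataclass, field

    @dataclass
    class CatalogConcept:
        # Describes WHAT the information is, in the language of the business
        name: str
        definition: str
        attributes: list[str] = field(default_factory=list)
        related_to: dict[str, str] = field(default_factory=dict)  # concept -> relationship
        # Deliberately no fields for storage location, file format or access
        # mechanism: the "how" is documented separately from the Catalog

    subscriber = CatalogConcept(
        name="Subscriber",
        definition="The Group Member who holds one or more Contracts",
        attributes=["Name", "Date of Birth", "Member Identifier"],
        related_to={"Contract": "holds one or more", "Plan Member": "is one of"},
    )

Note what is absent: no table names, file paths or connection details. Keeping those out of the Catalog is what lets the common language stay stable as the underlying technology changes.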

The following diagram provides an example of how the Catalog serves as the “connecting thread” between what the supplier provides and what the consumer needs:

[Diagram: the Catalog as the connecting thread between Supplier specifications, the information landed in the Lake, and Consumer requirements]

This diagram illustrates the use of the Catalog not only for describing the information from both parties’ perspectives, but also for ensuring consistency and traceability between the physical instantiation of the information in the Lake and the common concepts represented in the Catalog.

All of this collaboration, even with a common language, can still be inefficient if every individual party is left to their own devices for presenting their specifications or requirements to the aggregator. The establishment of standards and templates can greatly reduce this inefficiency and I will discuss those in my next entry.

Data Lake Participants – Roles and Responsibilities
Thu, 21 Jun 2018

As you may recall, in my last blog I introduced the analogy of the Aggregator to describe utilizing a Data Lake as a Consolidator of information, and I mentioned the three key roles in this model: the Supplier, the Aggregator and the Consumer.

In this post I will provide a little more detail on the responsibilities possessed by each of these roles that, when carried out diligently, provide an effective environment for obtaining significant value from the Lake.

For this model to work effectively – there are a few key points to keep in mind at all times:

  • The Supplier has no direct knowledge of the Consumer’s needs or how they want the items presented – that is the role of the Aggregator
  • The Consumer is unaware of the Supplier, only knows what is available by interacting with the Aggregator
  • The Aggregator is driven by an understanding of the Consumer, both in knowing what they need (or may need in the future), as well as how they need to see or access it, therefore, it is the Aggregator that decides how to present items to the Consumer

Keeping these underlying principles in mind, the following set of responsibilities can be defined for each role (note that the embedded examples are for a Healthcare Insurance Provider):

Supplier

  • Provides a full description of what is being delivered to the Data Lake
    • A Conceptual and Logical Model of the information in the “language” of the standard catalog that has been adopted by the Data Lake as representative of the enterprise’s business information – independent of any physical implementation
    • A set of any rules that have been placed upon the information (e.g. this source system only allows one Address per Person)
    • The set of “calculations” being provided, along with a formula of how that calculation is made – using the concepts as defined in the enterprise catalog (e.g. a count of Group Members is the sum of all Plan Members, both the Group Member, i.e. the Subscriber, as well as all the Plan Members identified on each Contract held by the Subscriber)
    • The set of “views” that are represented in the supplied information and the criteria used to generate the content of the view (e.g. all contracts of subscribers that are age 65 or over and are male)
  • Provides a full description of how the information is being delivered to the Data Lake
    • The form (extract file, acquisition service, direct connection “pipe”, etc.)
    • The detailed format within the form that maps back to the “what” documentation presented above

Note that no transformation requirements are provided because, as a supplier, that is not its responsibility.
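
To illustrate the “calculations” item above, consider the Group Member count example: the supplier pairs a catalog-language definition with a computable formula. A minimal sketch, under my own assumption that a Subscriber is counted once even when holding multiple Contracts:

    from dataclasses import dataclass

    @dataclass
    class Contract:
        subscriber_id: str
        plan_member_ids: list[str]   # covered members other than the Subscriber

    def group_member_count(contracts: list[Contract]) -> int:
        """Count of Group Members: the Subscriber plus all Plan Members
        identified on each Contract held by the Subscriber."""
        total = 0
        seen_subscribers: set[str] = set()
        for c in contracts:
            if c.subscriber_id not in seen_subscribers:
                seen_subscribers.add(c.subscriber_id)
                total += 1               # the Subscriber is also a Plan Member
            total += len(c.plan_member_ids)
        return total

    # One subscriber holding two contracts that cover three other members
    contracts = [Contract("S1", ["M1", "M2"]), Contract("S1", ["M3"])]
    assert group_member_count(contracts) == 4   # S1 + M1 + M2 + M3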

Consumer

  • Provides a full description of what is being requested of the Data Lake
    • A Conceptual and Logical Model of the information in the “language” of the standard catalog that has been adopted by the Data Lake as representative of the enterprise’s business information – independent of any physical implementation
    • A set of any rules enforced by the target system, which the delivered information needs to abide by (e.g. this target system only allows one Benefit Package per Division)
    • The set of “calculations” needed by the target, along with a formula of how each calculation is made – using the concepts as defined in the enterprise catalog (e.g. a count of Group Members is the sum of all Plan Members, both the Group Member, i.e. the Subscriber, as well as all the Plan Members identified on each Contract held by the Subscriber)
    • The set of “views” that need to be provided in the delivered information and the criteria that define the view content (e.g. all contracts for an HMO product where the subscriber is female and resident in the state of Arkansas)
  • Provides a full description of how the information is desired from the Data Lake (this is highly negotiable, as the Data Lake may offer alternative delivery mechanisms or may reject the Consumer’s request)
    • The form (extract file, acquisition service, direct connection “pipe”, etc.)
    • The detailed format within the form that maps back to the “what” documentation presented above
    • If transformations are needed from what the Data Lake has agreed to make available, a description of the transformation desired

Note that in this model even other “consolidators” (such as a Data Warehouse or Operational Data Store) are Consumers, and therefore have the same responsibilities.

Aggregator

  • Ensures there are suppliers with the items the Consumers need
  • Takes delivery from a Supplier, in whatever form that delivery takes, and presents these items to the Consumer
  • Provides the common vocabulary (catalog) of the information currently or “aspirationally” resident in the Data Lake (this may expand as Suppliers come on board with new concepts or Consumers make requests for new concepts)
    • A Conceptual and Logical Model
    • A set of any rules that have been placed upon the information
    • The set of “calculations” available
    • The set of “views” available
  • Provides a full description of how the information can be accessed by a Consumer and the physical mapping for where the information may be found
  • Determines the best approach for moving Supplier information to Consumer accessible information (by using its knowledge of the needs of the consumer and how it wishes to serve the consumer)
  • Provides assistance for both Suppliers and Consumers in representing their information utilizing the common vocabulary
  • Provides guidance and assistance to Consumers in actually obtaining the information from the Data Lake

  • Governs all the information resident in the Data Lake

This last statement is key to the connection to Information Governance. As a matter of fact, all these responsibility descriptions are an aspect of the “decision rights” defined and controlled by a Governance Body.

The implication is that the “keepers” of the Data Lake must establish the Governance of the information housed in the lake – although it is recommended that the IG Program be created organizationally as a separate and distinct entity from the Data Lake solution owner.

You will also notice that a lynchpin between all these roles is a Catalog that is utilized by all parties in their communications with the other roles. The creation and maintenance of this catalog is the responsibility of the IG Program – and I will talk more about this artifact, and its importance, in my next post.

Data Lake Consolidation – the Aggregator Analogy
Tue, 05 Jun 2018

In my last blog, I introduced the concept of the Data Lake as a Consolidator and the critical success factor of applying robust Information Governance to this environment. In this post, I want to introduce an analogy to help visualize this environment and the parties involved.

So, a Data Lake as Consolidator. What does that really mean? Well, for me it means obtaining information from multiple sources and making it available to multiple targets – with a key differentiator of ensuring the targets do not need to know which source provided what information.

In other words, de-coupling sources from targets so that the focus is on the actual information is a key characteristic of a powerful, and useful, Data Lake.

This de-coupling provides a level of flexibility, in that the addition, removal – and even the alteration of the access mechanism – of an involved system becomes much simpler and more efficient, because you need to focus only upon the single system in question, and not worry about how that system may, or may not, interact with others.

Stated another way, the Data Lake Consolidator can be described using the following purpose and value statements:

Purpose of the Data Lake:

The focus of the Data Lake is to provide a singular and common mechanism for the sharing of information across a wide variety of systems and solutions

Rationale/Value:

The benefit of the Data Lake is in the de-coupling of systems and removing point-to-point integration solutions to improve efficiencies and lower maintenance costs, while allowing both the removal and introduction of solutions without impacting any other solution or incurring the cost of integrating or de-integrating solutions
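
One way to picture this de-coupling is a registry kept by the Lake and keyed by catalog concepts: Suppliers and Consumers each register against concepts, never against each other. The sketch below is illustrative only – the class and method names are my own assumptions:

    from collections import defaultdict

    class DataLakeRegistry:
        """Suppliers publish by concept; Consumers subscribe by concept.
        Neither side ever references the other directly."""

        def __init__(self) -> None:
            self._suppliers: dict[str, set[str]] = defaultdict(set)
            self._consumers: dict[str, set[str]] = defaultdict(set)

        def register_supplier(self, system: str, concepts: list[str]) -> None:
            for concept in concepts:
                self._suppliers[concept].add(system)

        def register_consumer(self, system: str, concepts: list[str]) -> None:
            for concept in concepts:
                self._consumers[concept].add(system)

        def unsourced_needs(self) -> set[str]:
            # Concepts consumers want but no supplier yet provides - a gap
            # the Aggregator must fill by finding a new source
            return set(self._consumers) - set(self._suppliers)

    registry = DataLakeRegistry()
    registry.register_supplier("ClaimsSystem", ["Claim", "Subscriber"])
    registry.register_consumer("AnalyticsMart", ["Claim", "Provider"])
    print(registry.unsourced_needs())   # {'Provider'}

Adding or removing a system touches only that system’s own registration, which is exactly the flexibility described above.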

I like to use the analogy of an Aggregator – in that the central repository (the Data Lake) pulls information from a variety of sources (suppliers), aggregates it (separates, combines, consolidates, repackages – or just leaves it as is) and presents this source-independent view of the information to the targets (consumers). The following picture provides a diagrammatic representation of this analogy:

[Diagram: Suppliers feeding the Data Lake Aggregator, which presents a source-independent view of the information to Consumers]

This real-world concept is applied in our day-to-day living all the time – and is the underlying model to all retail interactions. But, as indicated, the “warehouse” model is probably closest to the Data Lake concept because it also provides the “direct” access to the products of the supplier “as-delivered” (just sitting on a pallet) – which is one of the options in a Data Lake.

For the right consumer, sometimes it just makes sense to provide direct access, offering that option in concert with the “re-packaged” versions.

This model relies upon a couple of key concepts: one being the reference to a “common vocabulary”, which I’ll discuss in a later post, and the other the roles of Supplier, Aggregator and Consumer.

It is critical to clearly define and articulate these roles and their responsibilities so that all parties are “on the same page” as far as knowing how they play a part and, equally important, where the lines of demarcation lie between these roles. I will delve a little more deeply into these roles and responsibilities in my next post.

Data Lakes and the Information Governance Critical Success Factor
Thu, 17 May 2018

Since my last post I’ve been working for a client that is actively engaged in establishing a Data Lake to support their analytics efforts, but that is also looking to “re-architect” the way their systems collaborate by using the Data Lake to control and consolidate all information-sharing interactions across their environment.

I was most interested in whether and how Information Governance practices were being defined and applied to this new “centralized” view of information sharing. This will be the focus of my next few blog entries.

I’m sure by now most people are familiar with the Data Lake concept, wherein the idea is that all data entering the enterprise – regardless of content, format or source – is placed, or landed, into the “lake” for others to access. However, to access this “raw” data efficiently and effectively requires some level of transformation, consolidation and standardization so that there is a “common” view of the information in order to serve multiple targets without each of them having to devise their own custom mechanism for obtaining what they need from the lake.

It is this common view that requires Information Governance. By putting in place an appropriate set of decision rights, controls (policies, rules, guidelines, etc.) and processes, there is a much better chance that the Lake will not become polluted and that the actual content of the Lake remains not only useful but accessible – irrespective of the addition and subtraction of both sources and targets.

Over the next few months I will present my thoughts on how best to go about this. First, I’ll describe the “architecture” and concept of utilizing a Data Lake for the above-mentioned purposes – using the analogy of an Aggregator (not unlike the warehouse store model, which presents its offerings sometimes just as received and other times “repackaged” based upon consumer demand). From there I will dive into the roles and responsibilities of the players involved, the critical role of a “catalog” for managing the Lake’s content, the equally critical role of standards and templates, and the absolutely essential requirement of a robust Information Governance Program, before finishing with a summary of the key takeaways.

Note that this is NOT a technical discussion – so I will not be talking about Hadoop, NoSQL, RDBMS or any of the myriad other associated technologies – but will focus upon the concepts and usage of a Governed Data Lake for ensuring business value is truly obtained from this environment.

I hope you will join me in this journey and that you find this both informative and useful.

AHIMA’s Assessment Tool Valuable for Healthcare IG
Sat, 06 Jan 2018

As I have been continuing to work in the information governance area as it relates to healthcare, I recently came across an interesting development.

Some of my previous blog posts have covered the difference between Information Governance and Data Governance and some of the players in the field, including the American Health Information Management Association (AHIMA) – specifically in the healthcare space and their efforts in the information governance arena.

Since those posts, I’ve had a few conversations with the IG Advisors arm of the association and learned they have introduced a new tool for measuring an organization’s maturity with regard to its Information Governance (IG) Program. This tool, called IGHealthRate(TM), is fairly robust, not only determining the current maturity level of an organization but also providing insights on steps the organization could take to progress along the maturity curve.

I’ve always believed that before any change can occur one should clearly define a Vision of where they would like to be, regardless of where they may actually be currently. An assessment tool like IGHealthRate(TM) is a great mechanism for understanding where you are and for forming a solid picture of where you would like to be.

AHIMA’s assessment tool is reflective of most maturity models in that it uses five levels of maturity, which they have identified as At Risk, Aware, Aspirational, Aligned and Actualized. It then uses its own framework, as described in their IG Toolkit, to evaluate an organization across all the “pieces,” or what they call Competencies, that a fully robust IG Program possesses and, through surveys and interviews, “scores” the organization’s maturity against each of these dimensions. They call this the Information Governance Adoption Model (IGAM)(TM). The Competencies they identify are: IG Structure, Strategic Alignment, Enterprise Info Mgmt, Data Governance, IT Governance, Analytics, Privacy and Security, Legal and Regulatory, Awareness and Adherence, and IG Performance. From there, the organization can define a roadmap for how best to evolve within each of these dimensions and move closer to its Vision State.

If you are interested in establishing (or improving) an IG Program, AHIMA’s IGHealthRate(TM) is a good first step to consider. It requires a minimal investment and its results can help build a business case for pursuing and maturing the IG Program.

Addressing the Information Challenge: 7 Ways Governance Can Help
Wed, 26 Apr 2017

The explosion of data is something that executives across industries are trying to wrap their heads around. Healthcare is no different. In fact, healthcare data is expected to grow 99% – patient data, wearables, medical literature, scientific articles and more are adding to the explosion of healthcare information. This data deluge is a big challenge for healthcare organizations because they are unable to leverage information to make timely and profitable business decisions.

To solve the data challenge many organizations try:

  • Implementing Master Data Management or some other data management initiative
  • Acquiring quality tools or other technology
  • Putting people “in charge” whether through committees or assignments to “manage the information”

Unfortunately, these approaches are not very effective. In order to tackle data challenges healthcare organizations must turn to governance. Governance helps address the information challenges by:

  • Ensuring information is fresh, available and accessible
  • Articulating who can make changes and when, and enforcing these decision rights to prevent rogue changes
  • Identifying all repositories, their purpose and their content based upon an enterprise-wide common vocabulary
  • Defining, maintaining and publishing a common vocabulary specific to the enterprise’s needs and language
  • Supplying an enterprise-wide description of each area’s information use and the mechanisms to ensure cross-functional alignment and management support
  • Providing clearly defined rules for the quality, integrity, representation, etc. of the information, and clear processes and responsibilities for stewarding the information for adherence to these rules
  • Assigning, communicating and enforcing decision rights across the enterprise, as well as ensuring that actions taken and decisions made are broadly communicated

Information and data governance are quickly becoming imperative for a healthcare industry that is both seeking to capitalize on the value of its information assets and that is committed to ensuring the reliability and integrity of information and data used to improve care quality, operations, and financial performance. After all, trust in health information and high-quality patient care depend on it.

To learn more about trends impacting healthcare governance, download our recent guide, Healthcare Governance, Trends to Watch.

Trends in Governance: Enterprise Modeling is Essential
Sun, 23 Apr 2017

Of all the governance trends, none is more foundational and critical to the success of the governance program – indeed, of the organization itself – than the need for accurate, consistent, and relevant models that communicate the meaning, use, and residency of the information assets of the enterprise.

Modeling not only addresses the integration and ingestion of data across and between information systems, but also aids in communication both within a healthcare organization as well as in the organization’s interactions with patients, partners, vendors and consumers. The models provide a consistent basis for understanding and minimize miscommunications, thereby increasing organizational efficiencies.

Governance programs are adopting not just the classic business glossary, but information reference models that provide the necessary context for the information across business units, technologies, applications, and personnel changes. Governance is becoming the keeper of this common language in order to ensure the associated rules, policies, controls, decision rights, and processes defined to govern the information are both understandable and enforceable regardless of the area of the organization impacted.

To learn more about this trend and the other trends impacting healthcare governance, download our recent guide, Healthcare Governance, Trends to Watch.

Trends in Governance: Cross-Enterprise Semantics & Metadata
Thu, 20 Apr 2017

Addressing the ongoing explosion of data sources and storage options requires establishing consistent metadata attributes and semantic models across the enterprise to effectively govern information as an enterprise asset.

The primary objective of any enterprise governance program is to ensure consistent and timely data, so reaching consensus and agreement on a common understanding of concepts and metadata attributes must be addressed and enforced by the program. Cloud applications, for example, continue to be adopted, and most have their own semantic and metadata models. Integration of these respective views is foundational to governance because, without it, the meaning and reusability of the information suffer.

A recognition is forming that, as information becomes a true enterprise asset, the need to cross silos and reach consensus on a consistent meaning of the semantics and metadata used to describe the business domains – and the information itself – is becoming critical. This common understanding is best facilitated and controlled through robust governance that is enforced company-wide.

To learn more about this trend and the other trends impacting healthcare governance, download our recent guide, Healthcare Governance, Trends to Watch.

Trends in Governance: Democratization of Ownership
Tue, 18 Apr 2017

The rise of Big Data, self-service, and more powerful and flexible end-user information visualization and preparation tools is impacting governance in a significant manner with regard to structure, decision rights, and accountabilities. End-users are gaining more control of data, including the ability to integrate and manipulate data for their own purposes, and to select data based on relevance criteria not necessarily codified in classic metadata or semantic models.

What this means is that the responsibility of governance, such as adhering to access policies, is becoming the responsibility of practically any individual that needs or uses the data. Stewardship, therefore, is becoming democratized across the user community, directly impacting the centralized model where clear stewards and owners are typically named along domain boundaries. This paradigm shift means that anyone who uses the data has a say in how it is governed, but also the responsibility to behave accordingly.

With self-service as one of the key drivers, the need to broaden responsibility for stewarding information to a larger community of interested parties is becoming more common. This is consistent with the move towards a business-centric approach, as it is the business users who have the need and are taking on this responsibility for the information critical to them. Better data preparation tools and governance stewardship applications are also contributing to and supporting this trend.

To learn more about this trend and the other trends impacting healthcare governance, download our recent guide, Healthcare Governance, Trends to Watch.
