Perficient Healthcare Solutions Blog


Michael Anderson

Michael has more than twenty years of Data Warehousing experience on multiple hardware platforms for various industries. He has served as Enterprise Architect for Data Warehouses and Marts at Fortune 500 companies. He has extensive experience with ETL, Business Intelligence\DSS, SOA, and Data Quality related issues.


Creating the Canonical Modeling Environment – Part 2

Now that we’ve established what a canonical data model is, let’s talk about our objectives for what we want to achieve with our Canonical Data Model and what toolsets can be applied.  In my Canonical Data Modeling environment, I want to store and manage my entities and their relationships.  I want support for the Conceptual, Logical, and Physical Data Models.  I want support for normalization and/or dimensional modeling. I want to store and manage my business rules.  I want to be able to generate XML and WSDLs for my SOA developers.  I want to perform these acts in an Enterprise Model so I can be model driven.  And I want to be able to modify my models and generate my supporting SOA artifacts as quickly as the business changes.

Any Canonical Data Model environment should also meet a few additional objectives:

  • Provide an enterprise-oriented dictionary of reusable common objects and definitions at the enterprise or business domain level to enhance interoperability.  We want to define a common business language and encourage its adoption by the IT community.
  • Generate canonical schemas based on the enterprise semantic model to help an organization avoid having Services that might use incompatible schemas in its data representations, thus hindering service interaction and composition.
  • Establish an Enterprise Data Model that stores, in a central location, the entities, relationships, and business rules that matter to the Business.
  • Deploy the Enterprise Canonical Model to help us avoid services where disparate models for similar data may impose transformation requirements that increase development effort, design complexity, and runtime performance overhead.
  • Provide support for various model diagrams (Use Case Diagram, Class Diagrams, Activity Diagrams, Sequence Diagrams, etc.) in helping the Architect build this integrated world.
  • Allow the ability to store and manage the entities and relationships required by the Business.
  • Facilitate the ability to store and manage the business rules (functions \ web services) required by the Business.
  • Provide the ability to associate the business rules with the entities.
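To make these objectives a little more concrete, here is a minimal Python sketch of a model store that keeps entities, relationships, and business rules together and generates an XML Schema from them. It is an illustration only, not how any of the commercial tools work: the class names (`CanonicalModel`, `Entity`, `Attribute`, `Relationship`) and the sample `Patient` and `Encounter` entities are hypothetical, and business rules are simply carried into the schema as documentation.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


@dataclass
class Attribute:
    name: str
    xsd_type: str  # e.g. "xs:string" or "xs:date"


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)
    rules: list = field(default_factory=list)  # business rules tied to this entity


@dataclass
class Relationship:
    parent: str
    child: str
    cardinality: str  # e.g. "1..*"


@dataclass
class CanonicalModel:
    """Hypothetical central store for entities, relationships, and rules."""
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity(self, entity):
        self.entities[entity.name] = entity

    def relate(self, parent, child, cardinality):
        # Only allow relationships between entities the model already knows.
        for name in (parent, child):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.relationships.append(Relationship(parent, child, cardinality))

    def to_xsd(self):
        """Emit one xs:complexType per entity; rules become documentation."""
        ET.register_namespace("xs", XS)
        schema = ET.Element(f"{{{XS}}}schema")
        for entity in self.entities.values():
            ctype = ET.SubElement(schema, f"{{{XS}}}complexType", name=entity.name)
            if entity.rules:
                note = ET.SubElement(ctype, f"{{{XS}}}annotation")
                for rule in entity.rules:
                    ET.SubElement(note, f"{{{XS}}}documentation").text = rule
            seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
            for attr in entity.attributes:
                ET.SubElement(seq, f"{{{XS}}}element",
                              name=attr.name, type=attr.xsd_type)
        return ET.tostring(schema, encoding="unicode")


model = CanonicalModel()
model.add_entity(Entity("Patient",
                        [Attribute("patient_id", "xs:string")],
                        ["patient_id must be unique across the enterprise"]))
model.add_entity(Entity("Encounter", [Attribute("admit_date", "xs:date")]))
model.relate("Patient", "Encounter", "1..*")
print(model.to_xsd())
```

In this sketch the relationships are stored and validated but not written into the schema. A real environment would also need to generate WSDLs and keep the logical and physical models in sync, which is well beyond a few lines of code.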

Finding a toolset for this effort is not easy.  IBM’s Data Architect, Computer Associates’ ERwin, and Embarcadero’s ER/Studio are good toolsets for modeling relational objects (RDB).  As far as I know, none of them will manage the business rules and generate the XML and WSDLs to support an SOA environment.  However, these vendors are advancing their toolsets toward this end, and I think you’ll see progress in that regard in the next few years.

Canonical Data Modeling–Marriage of SOA & Enterprise Data Model

A Canonical Model is the marriage of your data’s business semantics and the related business rules governing your enterprise assets.  Your data assets can be structured (Relational Data) or unstructured (Big Data), and they can be represented in multiple ontological frameworks.  Your business semantics are composed of the natural business language the Business uses to conduct its affairs, as stored in the model.  They should reflect the terms in which the Business conducts its affairs in the course of events.  And the set of rules, the canon law, comes in the form of explicit policies governing the conduct of the organization (SOA).

So do we need canonical data modeling?  I think so.  I’ve been a canonical practitioner for a couple of years, working in settings where an organization chooses to integrate its SOA, Relational, and Big Data environments into a corporate enterprise data model.  This environment allows your organization to better leverage its web services, aligns those services to the corporate data model, and enforces a common language dictated by the owners (the Business).

Is the canonical data model of interest to you?  If so, please give me a thumbs up.  In my next blog entry, I plan to show readers how to set up a Canonical Modeling environment (Part 2) and execute its implementation in a working IT environment (Part 3).   The emphasis of implementation will apply a few simple best practices:

  • Managing the Canonical model in a tool agnostic setting (Part 2)
  • Enterprise Model driven development and data governance (Part 3)
  • Integration of your business rules using common ETL and SOA practices (Part 3)

We will summarize the discussion (Part 4) on why a Canonical Modeling environment provides greater integration between applications and how it better aligns the Business Strategy to the IT strategy.  And we’ll provide some real world examples where I have successfully implemented a Canonical Modeling effort.

Thank you.

Is Your IT BI Strategy Aligned to the Business Strategy?

In my February blog, I wrote about basic BI categories so we could assess our BI environment.  If you recall our BI categories were as follows:

  • Type I – Reporting and Query tool sets
  • Type II – Analytic tool sets
  • Type III – Predictive Modeling tool sets

Now, we can gather some facts to measure our current BI state to determine who we are as BI consumers.  And it gives us a great opportunity to determine if our IT Strategy matches our Business Strategy.  What we want to know is, “How can we improve our IT Services for the Business we serve?”

As an Enterprise Architect, I look at the following areas:

  • What are my BI spends?  I like looking at all accounts payable for the previous year for all 3rd party software vendors and BI vendors.  In most cases, you’ll know which BI license component each spend represents (BI base engine, OLAP component, data mining, etc).  If in doubt, you can easily contact the vendor.
  • What are my BI reports?  I examine all of my metadata so I can take an inventory of the BI reports, recognize the source databases being queried for each report (and what Web Services are being consumed), and determine how often each (canned) report is run.  For ad hoc reporting, most BI toolsets will provide metrics for measuring ad hoc efforts.
  • Take an inventory of all BI Servers (DEV, TEST and PROD).  Create a simple topology diagram showing server, CPU, RAM and storage.
  • Finally, I want to examine the metrics for my BI user community. Do our SLA metrics measure up to expectations?

You should be able to gather the BI spends in a day or two of effort.  Following the money trail, you may find rogue BI efforts where a department has purchased a server and BI toolset.  What’s the big deal?  If your rogue group is publishing values from unaudited data sources, it creates miscommunication.  It’s like having a duplicate set of books.  Also, they’re spending operational dollars to maintain their private BI infrastructure.  BI vendors love to go around enterprise architects to sell licenses and increase their profits at your expense.

In a past life, I was able to stack applications on fewer, beefier servers and eliminate dozens of small servers to achieve a smaller BI spend while improving BI Service.  You can expect a 30-50% savings just on BI licenses, servers and maintenance.  And we created Centers of Excellence, where BI specialists could assist departments in converting to the BI toolset standards and leverage this Corporate Core Competency across the enterprise landscape.

Next, I like to see what databases (and tables) and what OLAP Cubes are being utilized.  Am I applying my human capital to the right places?  Bug fixes and enhancement requests will help us measure where the fires are taking place.  Also, do I have reporting data sources that are not being used?  If so, why are we spending business capital on obsolete resources?  And I’m sure we’d all agree we have canned reports that we run but no one reads.  These are great candidates to rename, archive, and remove after 90 days.  You may see response times improve after removing this excess baggage.
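As a rough illustration of that clean-up, the Python sketch below flags reports that run but are never read and works out when an archived report could be removed. The report names and counts are made up, and it assumes your BI toolset's metadata can give you run counts and view counts per report; the function names are hypothetical.

```python
from datetime import date, timedelta


def unread_reports(run_counts, view_counts):
    """Reports that ran at least once but were never opened by a reader."""
    return sorted(
        name for name, runs in run_counts.items()
        if runs > 0 and view_counts.get(name, 0) == 0
    )


def removal_date(archived_on, grace_days=90):
    """Date on which an archived report can be removed for good."""
    return archived_on + timedelta(days=grace_days)


# Hypothetical metadata pulled from the BI toolset.
run_counts = {"daily_sales": 30, "legacy_margin": 30, "old_inventory": 0}
view_counts = {"daily_sales": 12}

print(unread_reports(run_counts, view_counts))  # ['legacy_margin']
print(removal_date(date(2012, 1, 1)))           # 2012-03-31
```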

Now, I can inform management of my Type I, II, and III spends.  I can state how many OLAP servers, data marts, etc. I deploy.  And now, I can reflect on my BI type percentages of total spend.  If I find my Type I spend is 90% then I know I’m not deploying ad hoc reporting and my user community is less engaged in exploring their data.  Is this good?  It depends on my business strategy.  If I have a dynamic market place, I want to empower my users, so I may need to rebalance my BI infrastructure.  I like creating “AS IS” and “TO BE” diagrams to show how we can re-allocate valuable resources.
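The percentage calculation is simple arithmetic, but a small Python sketch shows the idea. The spend figures are invented for illustration, and the function name is hypothetical; it assumes each accounts payable line item has already been tagged with its BI type.

```python
from collections import defaultdict


def spend_share_by_type(line_items):
    """Return each BI type's percentage of total spend, rounded to 0.1."""
    totals = defaultdict(float)
    for bi_type, amount in line_items:
        totals[bi_type] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing to report on
    return {t: round(100 * amt / grand_total, 1) for t, amt in totals.items()}


# Hypothetical line items: (BI type, annual spend in dollars).
spend = [
    ("Type I", 60000.0),   # reporting base engine
    ("Type I", 30000.0),   # query tool licenses
    ("Type II", 8000.0),   # OLAP component
    ("Type III", 2000.0),  # data mining add-on
]

print(spend_share_by_type(spend))
# {'Type I': 90.0, 'Type II': 8.0, 'Type III': 2.0}
```

Numbers like these are what would go on the “AS IS” diagram; the “TO BE” diagram shows the mix the business strategy calls for.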

Finally, I can also measure BI satisfaction from my BI metadata.  CMMI and Six Sigma call for SLAs where the business community can set metrics to hold IT accountable.  The right metrics help all of us to become more competitive in the global marketplace.  Take ownership, and let’s become good stewards of our BI landscape.

Intelligent Taxonomies

In my past, I’ve had to perform Data Warehouse (DW) Assessments where we need to assess an organization’s DW maturity.  In doing so, it’s important to categorize the things that you’re measuring to give it more context.  Here is a two part blog where I provide a simple definition of  BI and what categories are important to the organization.  Given you agree on my categories, I will discuss how we might assess our IT strategy to determine how we align\mis-align to the objectives of business.  Remember, we want our assessment to be objective, responsive to the business (quick turnaround) and accurate regarding the facts.  So, let’s start by defining the terms of what we want to measure.

What is Business Intelligence?

Business Intelligence (BI) is a user-centric effort to provide access, exploration and the means to analyze raw data to improve a user’s insight and to develop a better understanding of the business.  BI enables the organization to harvest the information from its legacy systems, to integrate data across the enterprise and to empower business users in becoming information self-sufficient.

I categorize BI into 3 distinct types.

Type I – Reporting and Query tool sets.   Type I is generally characterized by predefined or ad hoc queries.  Technologists (Developers or Power Users) will set up reports based on business requirements or management’s communication on what needs to be explored based on time or some other criteria.  The reports can be made up of a set of charts, graphs, formatted layouts and so forth.  The data sources can be any Database, Multiple Disparate Sources (Oracle and MS SQL Server), Multiple Homogeneous Databases (2 or more Oracle RDBs), a Web Service, or any combination as necessary.  It makes no difference as long as the users have the proper authority for access.  These canned reports represent a moment in time.   The underlying queries are known, and the data is mostly summarized and presented quickly.   Type I reporting is something all organizations will do in their initial DSS stages.  Type I will answer strategic questions such as “What was my revenue by customer ABC for last quarter?” or “Who are my current Customers today?”  Type I BI allows the user to get to know their data.  Sometimes you’ll hear Type I referred to as ‘hindsight reporting.’
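To show what a canned Type I question looks like underneath, here is a small Python sketch that answers “What was my revenue by customer ABC for last quarter?” against an in-memory SQLite database. The `sales` table, its columns, and the sample rows are all hypothetical; a real report would point at your own warehouse or mart.

```python
import sqlite3


def revenue_for_customer(conn, customer, start, end):
    """Total revenue for one customer where start <= sale_date < end."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales "
        "WHERE customer = ? AND sale_date >= ? AND sale_date < ?",
        (customer, start, end),
    ).fetchone()
    return row[0]


# Hypothetical sales data; dates are ISO strings so they compare correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("ABC", "2012-01-15", 500.0),
        ("ABC", "2012-03-30", 250.0),
        ("ABC", "2012-04-02", 100.0),  # falls in the next quarter
        ("XYZ", "2012-02-01", 900.0),
    ],
)

print(revenue_for_customer(conn, "ABC", "2012-01-01", "2012-04-01"))  # 750.0
```

The query is known in advance and only the parameters change, which is what makes it a canned report.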

Type II – Analytic tool sets.  Type II is generally where we take the raw data and begin treating it as a Corporate Asset.  At this stage, we may want to transform the information in a Data Warehouse or a Data Mart into an OLAP cube.  Today, some of the BI Visualization tools will dynamically create your visual BI maps based on the available Data Sources.  By the way, OLAP is short for Online Analytical Processing, which is a set of software tools that provides the means for analysis of data (stored or otherwise).  OLAP will segment data and analyze information by 1 or more dimensions (subject areas of interest), such as by customer, against a metric like revenue.  OLAP tools enable users to analyze large volumes of data to gain better insight into their business.  At a large, established manufacturer we built an OLAP application for the Business Community where we took market share from 7 points to 14 within a 2 year time period.  The stock price quadrupled, and the CEO attributed the significant gains to putting Type II toolsets into the hands of his front line warriors.

An OLAP cube has 2 primary components: dimensions and measures.  A dimension is a subject area of interest to the business.  Typical dimensions include TIME, PRODUCT, GEOG, and possibly RETAIL (Trade Channels and Distributors).  Also, the business will want facts in support of the dimensions.  Facts, also known as measures\metrics, will include counts and amounts such as revenue, volumes, physical inventory of product counts and so forth.  Think of it as anything our customer wants to measure.  This information is stored in an OLAP cube.  OLAP cubes have various characteristics.  MOLAP is built for fast access, is the least CPU intensive, and suits information that needs to be updated on a periodic basis.  ROLAP is built for access to ever changing data and it’s very CPU intensive.  It will generally be pointed to a Data Mart (RDB).  HOLAP allows for a combination of MOLAP and ROLAP.  HOLAP cubes require more time and thought in their creation and maintenance.  And some vendors will not offer an actual OLAP engine; in that case the cube is considered a Virtual OLAP Cube (VOLAP), aka in-memory analytics.  VOLAP is good for light OLAP needs.  The difference between OLTP systems and OLAP is that OLTP systems help users capture the transaction information necessary to run their business operations, and OLAP systems analyze transaction information at an aggregate level to improve the decision-making process.
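Here is a minimal Python sketch of the dimensions-and-measures idea: each fact row carries one value per dimension plus a measure, and a roll-up sums the measure by whichever dimensions you choose. The fact rows are invented, and the `roll_up` function is a hypothetical illustration that aggregates at query time; an actual OLAP engine does far more (a MOLAP engine, for instance, would typically store precomputed aggregates).

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (TIME, PRODUCT, GEOG), one measure.
FACTS = [
    {"TIME": "2012-Q1", "PRODUCT": "Widget", "GEOG": "East", "revenue": 100.0},
    {"TIME": "2012-Q1", "PRODUCT": "Widget", "GEOG": "West", "revenue": 150.0},
    {"TIME": "2012-Q1", "PRODUCT": "Gadget", "GEOG": "East", "revenue": 200.0},
    {"TIME": "2012-Q2", "PRODUCT": "Widget", "GEOG": "East", "revenue": 120.0},
    {"TIME": "2012-Q2", "PRODUCT": "Gadget", "GEOG": "West", "revenue": 80.0},
]


def roll_up(facts, dimensions, measure):
    """Sum a measure for every combination of the chosen dimensions."""
    cells = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        cells[key] += row[measure]
    return dict(cells)


# Revenue by product (one dimension).
print(roll_up(FACTS, ["PRODUCT"], "revenue"))
# {('Widget',): 370.0, ('Gadget',): 280.0}

# Revenue by quarter and region (two dimensions).
print(roll_up(FACTS, ["TIME", "GEOG"], "revenue"))
```

Passing an empty list of dimensions gives the grand total, which is the top of the cube.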

Type III – Predictive Modeling tool sets.  Predictive modeling allows us to forecast where we think our business may be going.  It creates a better means for managing “what if” scenarios so decision makers can identify critical facts and review the results against multiple economic models.  We want to deploy sophisticated statistical analysis and identify patterns and trends before our competition does.  Users are generally more pro-active in this stage.  Data Mining is used to determine where hidden patterns may exist to predict future behavior.

Now that you have a good understanding of BI categories, you might be asking, “How can we use it  to improve our IT Services and garner a bigger year-end bonus for ourselves?”  I like how you think. :)

My next blog will be on the importance of aligning our IT Strategy to our Business Strategy.  And if no IT Strategy exists, how you can map it out like Lewis and Clark where you’re the early explorer mapping the terrain to determine the best route to the Gold Country. Also, I’ll address how to improve your BI Strategy in making it more effective and cost efficient.

Data Profiling: The First Step in Data Quality

When I think of data quality, I think of three primary components: data profiling, data correction, and data monitoring. Data profiling is the act of analyzing your data contents. Data correction is the act of correcting your data content when it falls below your standards. And data monitoring is the ongoing act of establishing data quality standards in a set of metrics meaningful to the business, reviewing the results on a recurring basis and taking corrective action whenever we exceed the acceptable thresholds of quality.

Today, I want to focus on data profiling. Data profiling is the analysis of data content in conjunction with every new and existing application effort. We can profile batch data, near/real time data, structured and non-structured data, or any data asset meaningful to the organization. Data profiling provides organizations the ability to analyze large amounts of data quickly in a systematic and repeatable process. Data profiling will provide your organization with a methodical, repeatable, consistent, and metrics-based means to evaluate your data. You should constantly evaluate your data given its dynamic nature.
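As a minimal sketch of what a repeatable, metrics-based profile might look like, the Python below computes a few basic statistics per column and flags columns whose null rate exceeds a threshold. The sample rows, column names, and the 20% threshold are all hypothetical, and a real profiling tool would look at many more things (patterns, value distributions, cross-column dependencies).

```python
def profile_column(values):
    """Basic profile of one column; None and empty string count as null."""
    non_null = [v for v in values if v is not None and v != ""]
    total = len(values)
    nulls = total - len(non_null)
    return {
        "count": total,
        "nulls": nulls,
        "null_pct": round(100 * nulls / total, 1) if total else 0.0,
        "distinct": len(set(non_null)),
        # min/max assume the non-null values in a column share one type.
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }


def profile_table(rows):
    """Profile every column found in the first row of a list of dicts."""
    columns = list(rows[0].keys()) if rows else []
    return {c: profile_column([r.get(c) for r in rows]) for c in columns}


def null_rate_breaches(profile, max_null_pct):
    """Columns whose null percentage exceeds the acceptable threshold."""
    return sorted(c for c, p in profile.items() if p["null_pct"] > max_null_pct)


# Hypothetical sample rows.
rows = [
    {"mrn": "A1", "age": 34},
    {"mrn": "A2", "age": None},
    {"mrn": "A2", "age": 51},
    {"mrn": "", "age": 34},
]

profile = profile_table(rows)
print(profile["mrn"])
print(null_rate_breaches(profile, max_null_pct=20.0))  # ['age', 'mrn']
```

Running the same profile on a schedule and comparing the results against agreed thresholds is one simple way to connect profiling to ongoing monitoring.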