Perficient Enterprise Information Solutions Blog


Data Quality – Don’t Fix It If It Ain’t Broke

 

What is broke?  If I drive a pickup truck around that has a small, unobtrusive crack in the windshield and a few dings in the paint, it will still pull a boat and haul a bunch of lumber from Home Depot. Is the pickup broke if it still meets my needs?

So, when is data broke?  In our legacy data integration practices, we would profile data and identify all that is wrong with it. Orphan keys, inappropriate values, and incomplete data (to name a few) would be identified before data was moved. In the more stringent organizations, data would need to be near perfect for it to be used in a data warehouse. This ideal world of perfect data was striven for, but rarely attained. It was too expensive, required too much business buy-in, and lengthened BI and DW projects.
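To make the trade-off concrete, here is a minimal profiling sketch in Python (pandas). The table and column names (orders, customers, order_amount) are hypothetical; the point is to report the classic defects named above so the business can decide whether the data is "broke" for its purpose, rather than demanding perfection up front.

```python
# Minimal profiling sketch (illustrative only); table and column names are hypothetical.
import pandas as pd

def profile(orders: pd.DataFrame, customers: pd.DataFrame) -> dict:
    """Count the classic defects without blocking the load."""
    orphan_keys = ~orders["customer_id"].isin(customers["customer_id"])
    bad_amounts = ~orders["order_amount"].between(0, 1_000_000)   # inappropriate values
    incomplete = orders[["order_date", "customer_id"]].isna().any(axis=1)
    return {
        "rows": len(orders),
        "orphan_keys": int(orphan_keys.sum()),
        "inappropriate_values": int(bad_amounts.sum()),
        "incomplete_rows": int(incomplete.sum()),
    }

# Example: review the defect counts and decide whether they actually matter
# for the intended use, instead of insisting on perfect data before loading.
# print(profile(orders_df, customers_df))
```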

Think Better Business Intelligence

Think First by jDevaun.Photography, on Flickr (Creative Commons Attribution-No Derivative Works 2.0 Generic License)

Everyone is guilty of falling into a rut and building reports the same way over and over again. This year, don’t just churn out the same old reports; resolve to deliver better business intelligence. Think about what business intelligence means. Resolve, at least in your world, to make business intelligence about helping organizations improve business outcomes by making informed decisions. When the next report request lands on your desk, leave the tool of choice alone (Cognos, in my case) and think for a while. This even applies to those of you building your own reports in a self-service BI world.

Think about the business value. How will the user make better business decisions? Is the user trying to understand how to allocate capital? Is the user trying to improve patient care? Is the user trying to stem the loss of customers to a competitor? Is the user trying to find the right price point for their product? No matter what the ultimate objective, this gets you thinking like the business person and makes you realize the goal is not a report.

Think about the obstacles to getting the information. Is the existing report or system too slow? Is the data dirty or incorrect? Is the data too slow to arrive or too old to use? Is the existing system too arcane to use? You know the type – when the moon is full, stand on your left leg, squint, hit O-H-Ctrl-R-Alt-P and the report comes out perfectly – if it doesn’t time out. Think about it: if there were no obstacles, there would be no report request in your hands.

Think about the usage. Who is going to use the analysis? Where will they be using it? How will they get access to the reports? Can everyone see all the data or is some of it restricted? Are users allowed to share the data with others? How will the users interact with the data and information? When do the users need the information in their hands? How current does the data need to be? How often does the data need to be refreshed? How does the data have to interact with other systems? Thinking through the usage gives you a perspective beyond the parochial limits of your BI tool.

Think like Edward Tufte. What should the structure of the report look like? How would it look in black and white? What form should the presentation take? How should the objects be laid out? What visualizations should be used? And those are never pie charts. What components can be taken away without reducing the amount of information presented? What components can be added, in the same real estate, without littering, to improve the information provided? How can you minimize the clutter and maximize the information? Think about the flaws of write once and deliver anywhere, and the garish palettes many BI tools provide.

Think about performance. Is the user expecting an instantaneous response? Is the user thinking get-a-cup-of-tea-and-come-back response time? Is the user okay kicking off a job and getting the results the next morning? If you find one of these, cherish them! They are hard to find these days. Will the user immediately select the next action, or do they require some think time? Is the data set a couple of structured transactional records, or is it a chunk of a big-data lake? Does the data set live in one homogeneous source or across many heterogeneous sources? Thinking about performance early means you won’t fall into a trap of missed expectations or an impossible implementation.

Think about data quality. It is a fact of life. How do you deal with and present missing data? How do you deal with incorrect values? How do you deal with out-of-bounds data? What is the cost of a decision made on bad data? What are the consequences of a decision made on incorrect data? What is the cost of perfect data? What is the value of better data? Thinking about quality before you start coding lets you find a balance between cost and value.
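As an illustration only (column names and bounds are hypothetical), here is a small pandas sketch of one way to surface missing and out-of-bounds values in a report dataset instead of silently hiding them:

```python
# Illustrative sketch: flag quality problems so the report can present them
# honestly. Column names, bounds, and flag labels are hypothetical.
import pandas as pd

def annotate_quality(df: pd.DataFrame, value_col: str, lo: float, hi: float) -> pd.DataFrame:
    out = df.copy()
    out["quality_flag"] = "ok"
    out.loc[out[value_col].isna(), "quality_flag"] = "missing"
    out.loc[out[value_col].lt(lo) | out[value_col].gt(hi), "quality_flag"] = "out_of_bounds"
    return out
```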

Think about maintenance. Who is going to be responsible for modifications and changes? You know they are going to be needed. As good as you are, you won’t get everything right. Is it better to quickly replicate a report multiple times and change the filters, or is it better to spend some extra time and use parameters and conditional code to have a single report serve many purposes? Is it better to use platform-specific outputs, or is it better to use a “hybrid” solution and support every output format from a single build? Are the reports expected to be viable in 10 years, or will they be redone in 10 weeks? Thinking through the maintenance needs will let you invest your time in the right areas.
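A rough sketch of the parameterized approach, not tied to Cognos or any specific platform; the table, column, and parameter names are hypothetical, and the query uses generic DB-API style bind parameters:

```python
# Sketch of the "one parameterized report" idea: a single query template with
# bind parameters replaces N near-identical copies that differ only in filters.
from datetime import date

REPORT_SQL = """
    SELECT region, product_line, SUM(revenue) AS revenue
    FROM   sales
    WHERE  region = %(region)s
      AND  sale_date BETWEEN %(start)s AND %(end)s
    GROUP  BY region, product_line
"""

def run_report(cursor, region: str, start: date, end: date):
    """One report definition serving many purposes via parameters."""
    cursor.execute(REPORT_SQL, {"region": region, "start": start, "end": end})
    return cursor.fetchall()
```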

Think you are ready to build? Think again. Think through your tool set’s capabilities and match them to your needs. Think through your users’ skills and match them to the tools. Think about your support team and let them know what you need. Think through your design and make sure it is viable.

Here’s to thinking better Business Intelligence throughout the year.

 

Big Data, Big Confusion

 


Everyone wants a piece of the Big Data action, whether you are part of a product company, a solution provider, IT, or a business user. Like every new technology, Big Data is confusing, complex, and intimidating. Though the idea is intriguing, the confusion begins when the techies start taking sides and tout the underlying tools rather than the solution. But the fact is that picking the right architecture (tools, platforms) does matter. It involves consideration of several aspects, from understanding the technologies appropriate for the organization to understanding the total cost of ownership.

When you look at organizations embarking on a Big Data initiative, most fall into the following three types.

De-centralized

Have experimented with several tools, with multiple deployments on multiple platforms by multiple business units/subsidiaries. Own several tool licenses and have built several data applications, or are experimenting currently. Many data management applications are in production.

Loosely Centralized /Mostly De-centralized

Has an enterprise focus, but business-unit and departmental data applications are in use. Several tools have also been purchased over the years across various BUs and departments. Many data management applications are in production.

No major Data Applications

Yet to invest in major data applications. Mostly rely on reports and spreadsheets.

In all of the above scenarios, IT leaders can make a big difference in shaping the vision for embarking on a Big Data journey. Big Data projects have been experimental for many organizations, and the pressure to deliver tangible results is very high, so an optimal tools strategy and standards typically take a back seat. However, at some point they become a priority. The opportunity to focus on vision and strategy is easier to sell when leadership change occurs within the organization. If you are the new manager brought in to tackle Big Data, it is your chance to use your first 90 days to formulate the strategy rather than get sucked into business as usual. Utilizing these moments to formulate a strategy for platform/tool standardization is not only prudent but also presents a greater opportunity for approval. This strategic focus is critical for continued success and for avoiding investments with low returns.

The options within Big Data are vast. Vendors ranging from those with legacy products to startup companies offer several solutions. Traversing the maze of products without the help of the right partners can lead to false starts and big project delays.

 

Big Data Changes Everything – Has Your Governance Changed?

A few years ago, Big Data/Hadoop systems were generally a side project for either storing bulk data or for analytics. But now, as companies have pursued a data unification strategy leveraging the Next Generation Data Architecture, Big Data and Hadoop systems are becoming a strategic necessity in the modern enterprise.


Big Data and Hadoop are technologies with great promise and a very broad and deep value proposition. But why are enterprises struggling to see real-world results from their Big Data investments? Simply put, it is governance.

The New Data Integration Paradigm

Data integration has changed. The old way of extracting data, moving it to a new server, transforming it, and then loading it into a new system for reporting and analytics now looks quite arcane. It is expensive, time-consuming, and does not scale to handle the volumes we are now seeing in the digitally transformed enterprise.

We saw this coming, with push-down optimization and the early incarnations of Extract, Load, and Transform (ELT). Both of these architectural solutions were used to address scalability.

Hadoop has taken this to the next step where the whole basis of Hadoop is to process the data where it is stored.  Actually, this is bigger than Hadoop. The movement to cloud data integration will require the processing to be completed where the data is stored as well.

To understand how a solution may scale in a Hadoop- or cloud-centric architecture, one needs to understand where processing happens relative to where the data is stored. To do this, one needs to ask vendors three questions:

  1. When is data moved off of the cluster? – Clearly understand when data is required to be moved off of the cluster. In general, the only time we should be moving data is to move it to a downstream operational system to be consumed. Another way to put this: data should not be moved to an ETL or Data Integration server for processing and then moved back to the Hadoop cluster.
  2. When is data moved from the data node? – Evaluate which functions require data to be moved off of the data node to a name or resource manager. Tools that utilize Hive are of particular concern, since anything that is pushed to Hive for processing will inherit the limitations of Hive. Earlier versions of Hive required data to be moved through the name node for processing. Although Hive has made great strides in pushing processing to the data nodes, there are still limitations.
  3. On the data node, when is data moved into memory? – Within a Hadoop data node, disk I/O is still a limiting factor. Technologies that require data to be written to disk after a task is completed can quickly become I/O bound on the data node. Other solutions that load all data into memory before processing may not scale to higher volumes.

Of course, there is much more to be evaluated; however, choosing technologies that keep processing close to the data, instead of moving data to the processing, will smooth the transition to the next-generation architecture. Follow Bill on Twitter @bigdata73
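As a rough illustration of the "keep processing close to the data" principle (not taken from the post), here is a PySpark sketch; the paths and column names are hypothetical:

```python
# Illustrative PySpark sketch of processing the data where it is stored.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("keep-processing-on-cluster").getOrCreate()

orders = spark.read.parquet("hdfs:///data/orders")   # data stays on the data nodes

# The aggregation runs on the cluster; only the small result leaves it,
# destined for a downstream operational system.
daily_totals = (
    orders.groupBy("order_date")
          .agg(F.sum("amount").alias("total_amount"))
)
daily_totals.write.mode("overwrite").parquet("hdfs:///marts/daily_totals")

# Anti-pattern flagged in question 1: pulling the raw detail off the cluster
# to a separate ETL/integration server (e.g., orders.toPandas()) only to push
# the results back to Hadoop afterward.
```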

Information Governance Trends for 2015

The year 2014 just ended, and 2015 is already looking like another year of data-intensive initiatives. Looking at the initiatives and investments that happened in 2014, Big Data will continue to be on top, and so will the cloud.

Enterprise data investment continues to grow as the laggards in technology warm up to information governance in general. Big Data and cloud bring more compliance and security issues. While the products and offerings evolve, it becomes more important than ever to address data governance.

Some of the biggest trends in information management will be:


  • Adoption of MDM to enhance consumer experience and context
  • More single-domain strategies than multi-domain MDM, including cloud MDM
  • Big Data governance (security, access, and implications) and Master Data for context

Data quality is the underlying thread in all of these data initiatives, if not the prime reason for them. Approaching information governance like any other project creates silos of information, or less trustworthy information. Having the right strategy and approach is the key to a successful implementation and transformation.

Many companies overlook the pockets of investment already made in information governance, or are stuck with the limitations of using what they already have even if it is not the right approach. Creating the right vision for the future-state architecture while leveraging the existing investment is possible only if the total cost of ownership is analyzed. If these three trends dominate, we will see more MDM systems deployed in 2015 than ever before.

Happy New Year!

 

 

Analytics in the Digital Transformation Era

Successful enterprises compete on many capabilities, ranging from product excellence to customer service and marketing, to name a few. Increasingly, the back office / Information Technology (IT) is becoming a strategic player in the Digital Business Model which supports these key capabilities. In other words, back-office/IT capability itself is becoming a differentiator. All of the key strategies, like customer excellence, product excellence, and market segmentation, depend on a successful Digital Business Model.

Having more data, especially noisy data, is complex to deal with, and new platforms and tools are a must to make it manageable. Working with internally captured enterprise data to answer strategic questions like “Should there be a pricing difference between life, annuities, and long-term care?” or setting a benchmark for “servicing cost per policy for life, annuities, and long-term care” can only go so far. Ingesting and integrating external data, including machine data, will change the way pricing and segmentation are done today.

In the technology space, a wide variety of capabilities in terms of tools, platforms, and architecture offer time-to-market opportunities and leading-edge predictive/prescriptive models that enable the business to operate and execute efficiently. What this all means is that the business has to embrace the digital transformation, which is happening faster than ever.

Traditional Analytics


Key strategies from IT should include two kinds of applications/platforms for dealing with the new and the old analytical methods. The first kind handles slow-moving or traditional enterprise data, which ends up in the warehouse and is made available for “what happened” questions, traditional reporting, business intelligence/analytics, and so on.

Fast Analytics

The second kind is the real-time analytical response to the interactive customer, keeping in constant touch through multiple channels while providing seamless interaction and user experience. Technologies, platforms, architecture and applications are different for these two types of processing.

In the new world of information management, traditional enterprise applications and the data warehouse become just another source rather than the complete source of data. Even the absence of data is relevant information if the context is captured. Analytics is becoming more real-time, with adaptive algorithms influencing different outcomes based on the contextual data. Building modern information platforms to address these two different needs of the enterprise is becoming the new standard.
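A toy sketch of the contrast, purely illustrative: a batch calculation over warehouse rows versus an adaptive estimate that updates with each interaction event. The names and the exponential-weighting choice are assumptions, not a prescribed design:

```python
# Traditional/batch: recompute from the warehouse on a schedule; answers "what happened".
def batch_average(warehouse_rows):
    values = [r["amount"] for r in warehouse_rows]
    return sum(values) / len(values) if values else 0.0

# Fast/real-time: an adaptive estimate updated per interaction event,
# so the next customer touchpoint can use it immediately.
class AdaptiveScore:
    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha        # how quickly the score adapts to new context
        self.value = 0.0

    def update(self, observation: float) -> float:
        self.value = (1 - self.alpha) * self.value + self.alpha * observation
        return self.value
```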

Reference Architecture – Cloud MDM for Salesforce

In my earlier post Why MDM should be part of CRM Strategy, I discussed the importance of having an MDM strategy along with the CRM initiative. The proliferation of cloud CRM solutions like Salesforce is a blessing, but it can also be an IT nightmare if it is not managed properly. In many companies, the Salesforce implementation is left to the business's discretion with minimal IT interaction.

In one of our client interactions, the IT leader was in fact trying his best to just implement what the business was asking for without addressing the Master Data implications. A CRM implementation or enhancement is an opportunity for IT to get engaged and put a Master Data strategy in place. See my post on Why Master Data is different from CRM?

Though Master Data has been a familiar term, many companies are far from implementing it; mostly, lack of ownership and cost-justification barriers hold up the progress. In this blog, I want to show the reference architecture for the Cloud MDM solution from Informatica, which can be an alternative to traditional MDM on the cloud (see my earlier post). The total cost of ownership will be much less than for traditional MDM.

Typically, customer records are created in Salesforce and in some key application systems like ERP or online store (eCommerce) sites. Cloud MDM, which is typically installed in one of the Salesforce instances (the master), can be integrated with subscribing and authoring applications synchronously or asynchronously, depending on the timing requirements. All the de-dup and stewardship activities are performed using the Cloud MDM tool. Informatica Cloud MDM is a native Salesforce application which is deployed in the client’s Salesforce instance. This architecture shows the expanded Cloud MDM solution integrating with enterprise applications, and it supports only the Customer Master.
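For illustration only, here is a deliberately simple de-duplication check of the kind a data steward might review. It is not Informatica Cloud MDM's matching logic, and the record fields and threshold are hypothetical:

```python
# Simple duplicate-candidate check (illustrative only, standard library only).
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def is_probable_duplicate(rec1: dict, rec2: dict, threshold: float = 0.85) -> bool:
    name_score = similarity(rec1["name"], rec2["name"])
    email_match = rec1.get("email") and rec1.get("email") == rec2.get("email")
    return bool(email_match) or name_score >= threshold

# Candidate pairs above the threshold would be queued for data-steward review
# rather than merged automatically.
```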

Using this architecture, most of the Customer Master Data can be managed. It can also be a stepping stone for a full-blown, multi-domain MDM initiative in the future.

See also our recent webinar on Cloud MDM, specifically addressing Salesforce-based Customer Master Data management, and the following related blog posts: Cloud MDM – Are we getting closer?, Reference Architecture for Public Cloud Platform, Customer Analytics and Master Data, Master Data – Why is it different from CRM?, and Why MDM should be part of CRM Strategy?

The Industrialization of Advanced Analytics

Gartner recently released its predictions on this topic in a report entitled, “Predicts 2015: A Step Change in the Industrialization of Advanced Analytics”. This has very interesting and important implications for all companies aspiring to become more of a digital business. The report states that failure to do so impacts mission-critical activities such as acquiring new customers, doing more cross-selling and predicting failures or demand.

Specifically, business, technology, and BI leaders must consider:

  • Developing new use cases using data as a hypothesis generator, data-driven innovation, and new approaches to governance.
  • Emergence of analytics marketplaces, which Gartner predicts will be more commonly offered in a Platform as a Service (PaaS) model by 25% of solution vendors by 2016
  • Solutions based on the following parameters: optimum scalability, ease of deployment, micro-collaboration and macro-collaboration and mechanisms for data optimization
  • Convergence of data discovery and predictive analytics tools
  • Expanding technologies advancing analytics solutions: cloud computing, parallel processing and in-memory computing
  • “Ensemble learning” and “deep learning”. The former is defined as synergistically combining predictive models through machine-learning algorithms to derive a more valuable single output from the ensemble (a minimal sketch follows this list). In comparison, deep learning achieves higher levels of classification and prediction accuracy through the development of additional processing layers in neural networks.
  • Data lakes (raw, largely unfiltered data) vs data warehouses and solutions for enabling exploration of the former and improving business optimization for the latter
  • Tools that bring data science and analytics to “citizen data scientists”, who’ll soon outnumber skilled data scientists 5-to-1
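
Below is the minimal ensemble-learning sketch referenced above. It assumes scikit-learn is available and uses synthetic data; it only illustrates combining models into a single, more valuable output and is not a recommendation of any particular algorithm.

```python
# Minimal ensemble-learning sketch on synthetic data (scikit-learn assumed available).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("logit", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",            # average predicted probabilities into one output
)
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```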

Leaders in the emerging analytics marketplace include:

  • Microsoft with its Azure Machine Learning offering
    • For further info, check out: https://blogs.perficient.com/microsoft/2014/12/azure-ml-on-the-forefront-of-advanced-analytics/
  • IBM with its Bluemix offering

Finally, strategy and process improvement, while fundamental and foundational, aren’t enough. The volume and complexity of big data, along with the convergence between data science and analytics, require technology-enabled business solutions to transform companies into effective digital businesses. Perficient’s broad portfolio of services, intellectual capital, and strategic vendor partnerships with emerging and leading big data, analytics, and BI solution providers can help.

Cloud MDM – Reference Architecture for Public Cloud Platform

Moving the MDM deployment to a cloud platform involves several considerations ranging from technical gotchas to getting the internal buy-in. Getting the buy-in is a different topic of discussion – see my earlier posts on Data Governance.

Moving to the cloud eliminates some of the administrative and server-farm maintenance tasks, but it does not eliminate the overall responsibility. The apprehension about moving to the cloud, especially with critical enterprise data like the Customer Master or Product Master, can be daunting. The Product Master is an easier concept to sell than the Customer Master, though, in reality, a lot of customer information resides in the cloud already (think Salesforce).


Key Considerations (Product Master Data)

  • A Product MDM tool on the cloud can consolidate, enrich, improve quality, and publish to consuming applications, but catalog management may very well remain within the firewalls of the enterprise.
  • Provisioning has to be well thought out, and synchronizing enterprise user accounts with the cloud can be a challenge.
  • Avoid the IT-only approach and involve Data Governance to make sure business benefits are realized.
  • Synchronization of published data with the consuming applications, and its timeliness, has to be addressed.

This architecture depicts three types of applications.

  1. Sources (applications) which create/modify Master Data and receive Master Data, with the ability to synchronize (upsert) data from the Master Data Hub (a minimal upsert sketch follows this list).
  2. Applications which can be a source of Master Data and consume the latest Master Data for reference, but have no need for synchronization.
  3. Purely subscriber (consuming) applications, which may include external partners.
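
Below is the upsert sketch referenced in the first item: a generic illustration of insert-or-update synchronization against a master hub keyed by a match key. It is not a specific MDM product API, and the field names are hypothetical.

```python
# Illustrative upsert (insert-or-update) into a master hub keyed by a match key.

def upsert(hub: dict, incoming_records: list[dict], key: str = "master_id") -> dict:
    """Apply records from an authoring application to the hub."""
    for record in incoming_records:
        existing = hub.get(record[key], {})
        # Merge: new non-null values win; existing values survive otherwise.
        merged = {**existing, **{k: v for k, v in record.items() if v is not None}}
        hub[record[key]] = merged
    return hub

hub = {"C-100": {"master_id": "C-100", "name": "Acme Corp", "phone": "555-0100"}}
updates = [
    {"master_id": "C-100", "phone": "555-0199", "email": None},   # update existing
    {"master_id": "C-200", "name": "Globex"},                     # insert new
]
upsert(hub, updates)
```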

Providing the Product Master on the cloud makes sense especially for publishing to customer-facing portals and to third-party subscribers like partners.