Perficient Enterprise Information Solutions Blog


Posts Tagged ‘data quality’

Is IT ready for Innovation in Information Management?

Information Technology (IT) has come a long way from being a delivery organization to being part of the business innovation strategy, though a lot still has to change in the coming years. Depending on the industry and the company culture, most IT organizations fall in the operational part of the spectrum, while many progressive ones are gravitating towards innovation. Typically, IT may be consulted on executing the strategic vision. It is not IT’s role to lead the business strategy, but data and information are another story. IT is uniquely positioned to innovate in Information Management because of its knowledge of the data; if it doesn’t take up that challenge, the business will look outside for innovation. Today’s marketplace offers tools and technologies directly to business users, and they are bypassing IT organizations that are not ready for the information challenge. A good example is business users trying out third-party (cloud) services and self-service BI tools for slicing and dicing data, cutting down the development cycle. The only way IT can play the strategic game is to get into the game.

It is almost impossible for IT not to pay attention to data and just bury their heads in keeping-the-lights-on projects. So I took a stab at listing the types of products and technologies that have been maturing over the last 5 years in the Data Management space. By no means is this a complete list, but it captures the essence.

An interesting phenomenon is that many companies traditionally late to adopt a data-driven approach are using analytical tools as they become visually appealing and affordable. Cloud adoption is another trend, one that makes technology deployment and management possible without a huge IT bottleneck.

The question every IT organization, irrespective of company size, should ask is: are we ready to take on the strategic role in the enterprise? How well can IT co-lead the business solution rather than just implementing an application after the fact? Data Management is one area where IT needs to take the lead in educating and leading innovation to solve business problems. Predictive analytics and Big Data are right at the top, along with all the necessary supporting platforms, including Data Quality, Master Data Management and Governance.

It will be interesting to know how many IT organizations leverage the Information Management opportunity.



Primary Practices for Examining Data

SPSS Data Audit Node





Once data is imported into SPSS Modeler, the next step is to explore the data and to become “thoroughly acquainted” with its characteristics. Most (if not all) data will contain problems or errors such as missing information and/or invalid values. Before any real work can be done using this data, you must assess its quality (higher quality = more accurate predictions).

Addressing issues of data quality

Fortunately, SPSS Modeler makes it (almost too) easy! Modeler provides us with several nodes that can be used for our integrity investigation. Here are a couple of things even a TM1 guy can do.

Auditing the data

After importing the data, do a preview to make sure the import worked and things “look okay”.

In my previous blog I talked about a college using predictive analytics to predict which students might or might not graduate on time, based upon their involvement in athletics or other activities.

From the Variable File Source node, it was easy to have a quick look at the imported file and verify that the import worked.










Another useful option is to run a table. This will show if field values make sense (for example, if a field like age contains numeric values and no string values). The Table node is cool – after dropping it into my stream and connecting my source node to it, I can open it up and click run (to see all of my data nicely fit into a “database like” table) or I can do some filtering using the real-time “expression builder”.
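As a rough illustration of the same “do the values make sense” check outside of Modeler, here is a minimal Python sketch. The sample records and the helper name `non_numeric_values` are hypothetical, made up for this illustration; they are not part of SPSS Modeler.

```python
def non_numeric_values(records, field):
    """Return the values of `field` that cannot be read as numbers."""
    bad = []
    for record in records:
        value = record.get(field)
        try:
            float(value)
        except (TypeError, ValueError):
            # Strings like "twenty" raise ValueError, missing values raise TypeError.
            bad.append(value)
    return bad


# Hypothetical sample rows, standing in for an imported file.
students = [
    {"id": 1, "age": "19"},
    {"id": 2, "age": "twenty"},
    {"id": 3, "age": 21},
    {"id": 4, "age": None},
]

print(non_numeric_values(students, "age"))
```

Anything the helper returns is a value worth a closer look before modeling, which is the same judgment you would make by eyeballing the table output.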















The expression builder lets me see all of the fields in my file (along with their level of measurement, shown as Type, and their Storage: integer, real, or string). It also gives me the ability to select from SPSS predefined functions and logical operators to create a query expression to run on my data. Here I wanted to highlight all students in the file that graduated “on time”:












You can see the possibilities that the Table node provides – but of course it is not practical to visually inspect thousands of records. A better alternative is the Data Audit node.

The Data Audit node is used to study the characteristics of each field. For continuous fields, minimum and maximum values are displayed. This makes it easy to detect out of range values.

Our old pal measurement level

Remember measurement level (a field’s “use” or “purpose”)? Well, the Data Audit node reports different statistics and graphs, depending on the measurement level of the fields in your data.

For categorical fields, the data audit node reports the number of unique values (the number of categories).

For continuous fields, the minimum, maximum, mean, standard deviation (indicating the spread in the distribution), and skewness (a measure of the asymmetry of a distribution; if a distribution is symmetric it has a skewness value of 0) are reported.

For typeless fields, no statistics are produced.
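To make the “measurement level decides the statistics” idea concrete, here is a small Python sketch of the same logic. It is only an approximation under stated assumptions: the function name `audit_field` and the sample values are hypothetical, and the population formulas used for standard deviation and skewness are my choice for the sketch, not necessarily the exact formulas Modeler applies.

```python
import statistics


def audit_field(values, level):
    """Summarize one field according to its measurement level."""
    if level == "categorical":
        # Number of unique values, i.e. the number of categories.
        return {"unique": len(set(values))}
    if level == "continuous":
        mean = statistics.fmean(values)
        # Population standard deviation (an assumption made for this sketch).
        std = statistics.pstdev(values)
        if std == 0:
            # Convention for this sketch: a constant field has no asymmetry.
            skew = 0.0
        else:
            # Third central moment divided by the cubed standard deviation.
            skew = sum((v - mean) ** 3 for v in values) / len(values) / std ** 3
        return {
            "min": min(values),
            "max": max(values),
            "mean": mean,
            "std": std,
            "skewness": skew,
        }
    # Typeless fields: no statistics are produced.
    return {}


# Hypothetical sample values for two fields.
gender = ["F", "M", "F", "M", "F"]
household_income = [30000, 45000, 52000, 61000, 250000]

print(audit_field(gender, "categorical"))
print(audit_field(household_income, "continuous"))
```

In the made-up income data, the single very large value pulls the maximum far above the rest and pushes the skewness above 0, which is the kind of signal that makes out-of-range values easy to spot.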

“Distribution” or “Histogram”?

The Data Audit node also produces a graph for each field in your file, again based upon the field’s level of measurement (no graphs are produced for typeless fields).

For a categorical field (like “gender”) the Data Audit node will display a distribution graph, and for a continuous field (for example “household income”) it will display a histogram.

So back to my college’s example, I added an audit node to my stream and took a look at the results.











First, I excluded the “ID” field (it is just a unique student identification number and has no real meaning for the audit node). Most of the fields in my example (gender, income category, athlete, activities and graduate on time) are qualified as “Categorical” so the audit node generated distribution graphs, but the field “household income” is a “Continuous” field, so a histogram was created for it (along with the meaningful statistics like Min, Max, Mean, etc.).














Another awesome feature – if you click on the generated graphs, SPSS will give you a close up of the graph along with totals, values and labels.


I’ve talked before about the importance of understanding field measurement levels. The fact that the statistics and chart types the Data Audit node generates are derived from the measurement level is another illustration of Modeler’s approach: measurement level determines the output.


IBM Vision 2013 – 2 thumbs Up!


I just returned from the IBM Vision Conference in Orlando, Florida. I attended a session in every available timeslot from Monday morning to Wednesday afternoon and it was worth every single minute of my time!

Although there were too many sessions and presenters to mention, here are my “top picks”:

  • Designing Solutions with IBM Cognos TM1 Performance Modeler – Andy Neimens and Stephen Brook. This session took a case study approach to building a planning and analysis solution using the Performance Modeler tool. If you have been reading my blog posts, you know I am in love with this tool. If you are still out there thinking that it’s acceptable to develop TM1 solutions using only TM1 Architect, and are not steadily building expertise with PM, you are going to be left behind!


  • Reducing Cost through Predictive Analytics by integrating IBM SPSS and IBM Cognos BI – JBS International. This was a “deep-dive” discussion on sourcing data from disparate systems and files to use with SPSS Statistics and SPSS Modeler to uncover relationships between financial performance and business objectives.  Again, as you know, SPSS is a passion of mine, and the PhDs at JBS demonstrated their expertise with the technology. I spent time on break talking to these guys and trying to absorb their every word.


  • Building GRC (governance, risk and compliance) Success with the Power of Customer Experience – Chris McClean of Forrester Research. This session explored how GRC programs can employ best practices from customer experiences to create an environment where employees want to participate in the program and embed it into their standard operating procedures.  This was an interesting presentation with real-world examples and a vision for improving your GRC programs. It refreshed and renewed my commitment to the GRC programs I’ve helped develop and support throughout my career.


  • Data Quality and Analytics – Which is the chicken and which is the egg? – Tony Petkovski, Commonwealth Bank of Australia. In this session, Tony demonstrated how his bank is using IBM’s OpenPages GRC platform to drive quality analytics and support better reporting and decision making, driving down the bank’s risks. Tony is such a passionate and charismatic guy that I left wanting to transfer all my money to the Commonwealth Bank!


  • Delivering Stronger Business Insight through a CFO Dashboard – Tony Levy, IBM.  This was a demonstration of IBM’s Smarter Analytics “Signature Solution” that leverages TM1, Cognos BI and SPSS Modeler to deliver a CFO dashboard that visualizes KPIs and KRIs in NRT (near real time). Walking out of this session I thought, I now know what I want to be when I grow up! Tony presented this new “add in product” for customers using these technologies – a configurable and customizable tool that will blow you away. Again, as you may or may not know, as a technology implementer, I design and help build these kinds of solutions all of the time. It’s always nice to see something like this. TM1, Cognos BI and SPSS Modeler? That has to be the “perfect storm”.


  • Vernice “fly-girl” Armour – America’s first African American female combat pilot. Last (but not least), I absolutely enjoyed attending the Tuesday Keynote presentation by Vernice Armour. She is so compelling and inspiring. She’s written a book (which I plan to pick up this week), “Zero to Breakthrough: The 7-Step, Battle-Tested Method for Accomplishing Goals that Matter”.  My favorite quote – “helicopters don’t need a runway”…Vernice: “Engage hot”!


Thank-you IBM for another great conference and I hope to see you again next year!


Special thanks to my friends at Perficient for my ticket!


Introduction to Data Quality Services (DQS) – Part I

I was recently introduced to SQL Server 2012 and discovered Data Quality Services (DQS), a new feature of SQL Server 2012.  I wanted to use this blog post as an introduction to DQS, to define key terms, and to present a simple example of the tool.  According to MSDN:

The data-quality solution provided by Data Quality Services (DQS) enables a data steward or IT professional to maintain the quality of their data and ensure that the data is suited for its business usage. DQS is a knowledge-driven solution that provides both computer-assisted and interactive ways to manage the integrity and quality of your data sources. DQS enables you to discover, build, and manage knowledge about your data. You can then use that knowledge to perform data cleansing, matching, and profiling. You can also leverage the cloud-based services of reference data providers in a DQS data-quality project.

The below illustration displays the DQS process:


Teradata Talks Enterprise Data Integration

Teradata has long been known for its powerful data systems and its drive to push benchmarks for large data volumes. In fact, in 1992 Teradata built a first-of-its-kind system for Wal-Mart, capable of handling 1 terabyte of data. One of the main advantages of routinely working with very large data sizes is the exposure to integration, data quality (DQ) and master data management (MDM) techniques. From the experience gained over years of this type of work, Teradata has found itself positioned as an expert on these topics. With that, here is a video that includes these buzzwords and more as Teradata describes how to achieve data integration at the enterprise level:

Back to the Basics: What is Big Data?

This video published by SAP provides a concise description of Big Data. Timo Elliott (SAP Evangelist) and Adrian Simpson (CTO, SAP UK & Ireland) describe the 4 major challenges that make up big data:

  • Volume – Amount of data
  • Velocity – Frequency of change in data
  • Variety – Both structured and unstructured data
  • Validity – Quality of the data

Before we can push into the details, we must first understand the topic in its most simplified form as a platform to build from. Enjoy:


Data Governance – a must-have to ensure data quality – Part 2

In Part 1, we saw an overview of Data Governance and the initiatives firms need to take to incorporate governance. Let’s now look in a bit more detail at Data Quality Management, as this is a key step in Data Governance towards ensuring data quality.

Why is Data Quality Management necessary?

Data Quality Management is the process of establishing roles & responsibilities and the business rules that govern data by bringing the Business and IT to work together. Their task is two-fold: to address the problems that already exist and to prevent potential ones from occurring. Let’s focus on the roles & responsibilities, as these form the core of a Data Quality Management program.

Roles & Responsibilities

There are various roles involved in this process and all of them have to be accountable to ensure data quality. It’s vital that the roles are clearly defined upfront. The following are some of the commonly recognized roles:

  • Data Governance Council – comprises an Information Management Head and Data Stewards from various units.
  • Information Management Head – is the one who is accountable to the Governance Council on all aspects of data quality. This role would typically be fulfilled by the CIO.
  • Data Stewards – are the unit heads who lay down the rules & policies to be adhered to by the rest of the team. This role would usually be fulfilled by a Program Manager.
  • Data Custodians – are responsible for the safe storage & maintenance of data within the technical environment. DBAs would normally be the data custodians in a firm.
  • Business Analysts – are the ones who convey the data quality requirements to the data analysts.
  • Data Analysts – are those who would reflect the requirements into the model before handing it over to the development team.


Some best practices to successful data governance 

This article talks about some of the best practices around successful data governance. The key steps include:

  • Get a governor and the right people in place to govern
  • Survey your situation
  • Develop a data-governance strategy
  • Calculate the value of your data
  • Calculate the probability of risk
  • Monitor the efficacy of your controls

While it is quite difficult to implement a data governance program, there is little doubt about the value it adds. Often companies tend to look at it just in terms of the number of personnel involved and immediate ROI, without looking at it from a broader perspective. Ultimately it is your own data that makes you stand out from your competitors. Ensuring data quality will automatically result in getting better insights from your analysis. Technology will always be a valuable enabler when there is a strong data governance program tied to it!

Data Governance – a must-have to ensure data quality – Part 1

While one of my earlier posts on Quality Data being a pre-requisite for every BI technique is still generating both positive and negative responses, I felt it would be apt to delve into Data Governance and see why it needs to be incorporated to achieve & maintain better data quality.

First, let’s have a quick overview of data governance.

What is Data Governance?

Wikipedia defines Data Governance as a set of processes that ensure key data assets are formally managed throughout the enterprise so that the data can be trusted and people can be made accountable for any adverse event that occurs due to bad data. Data Governance is essentially a quality control discipline mainly meant to improve and maintain the data quality.

Data Governance Objectives

  • Improve decision-making of the management
  • Ensure data consistency
  • Build trust of data among everyone involved in the process
  • Adhere to compliance requirements
  • Eliminate risks related to data

Data Governance Pillars

It’s important to realize that data quality is just one of the pillars of governance. Typically, Data Governance comprises the following pillars:

  • Metadata Management – It involves storing information about your data by means of a metadata repository.
  • Master Data Management (MDM) – It is a process of collecting and aggregating all the data within the firm into a single master file (which acts as a reference) to ensure consistency.
  • Data Quality Management – It involves setting up roles, responsibilities & governing business rules by bringing the Business and IT together with the focus on data quality.
  • Data Security – As the name indicates, it provides data access only to authorized users and protects the data from unauthorized users and other threats.

Data Governance Initiatives

While there are quite a few data governance frameworks (like DMBOK, COBIT, etc.) out in the industry which firms can adopt, the following points could provide some first steps:

  • Create a Data Governance Vision Statement
  • Analyze & define data quality levels to be able to monitor performance
  • Establish roles & responsibilities through collaboration between Business and IT
  • Set up a Stewardship model to ensure data ownership & eliminate risks

Though it can take a considerable amount of time and effort to set up Data Governance, there is no doubt that it is going to improve the overall process of running your business.

In Part 2, we’ll look specifically at Data Quality Management and some best practices towards achieving data quality.

Quality Data – a key pre-requisite for any BI technique

There was a recent post in the HBR blogs that stated that ‘Success comes from better data, not better data analysis’.

While this sounds cliché, it is a fact, and we tend to ignore the value of quality data. Nowadays firms invest in hiring some of the best analysts in the industry in the hope of crunching numbers better than their competitors and gaining a competitive edge.  Firms want to make use of the best BI tools available in the market to get more insights about their data. But how often do we see that their focus is not on having quality data?

From what I’ve seen on a couple of my recent projects, there is negligence in maintaining data quality. People are more worried about how the tools and techniques utilize the data to provide insightful statistics than about understanding that one of the key requirements for any data analysis to yield good results is good, consistent data!

I always believe that it is worth spending that little extra time on data cleaning and ensuring consistency before jumping into using the various BI tools to play with the data. It’s important to ensure your data, with all the past statistics, is maintained well and kept consistent. Ultimately, if the underlying data itself is flawed or inconsistent, the analysis is surely going to be flawed no matter how good your analysts are or which tools they use!


Key Elements of a Solid BI and Data Management Framework

I was struck by a blog post written by Forrester’s Rob Karel (@rbkarel), titled “How Complete are your BI and Data Management Strategies?”

This is the type of question that our business intelligence consultants ask our clients every day, and we often come up with a framework as complete and strategic as the ideal one described in the recent report that Karel’s blog post references.

Here are some of the key elements of a solid BI / data management framework that I took away from the report:

  1. Predict solid ROI via a business case that will demonstrate one or more of these:
    • reduce costs
    • increase revenue
    • differentiate the company
    • reduce risk
    • drive efficiency
  2. Deliver scalability, flexibility and agility
  3. Gain adoption across the organization, by executives and key stakeholders who will be leveraging the system/data
  4. Anticipate governance, legal, compliance, and/or architectural challenges
  5. Include a change management plan
  6. Carefully evaluate and select software, hardware and service providers, and don’t forget to negotiate with them
  7. Measure and monitor progress and ROI along the way and after release
  8. Plan for updates, optimizations, and tweaks to the program
  9. Research best practices from your peers as well as case studies in line with the approach you want to take
Overall, driving your data management strategy back toward #1 is what Forrester recommends, and rightly so.
“..Align your initiative with the business value those strategies deliver. To produce a streamlined and effective program, maintain a focus on the business processes and decisions it’s enabling — not just the data.”