Data Quality – Don’t Fix It If It Ain’t Broke - Perficient Blogs
Blog
  • Topics
  • Industries
  • Partners

Explore

Topics

Industries

Partners

Data Quality – Don’t Fix It If It Ain’t Broke

What is broke?  If I drive a pickup truck around that has a small, unobtrusive crack in the windshield and a few dings in the paint, it will still pull a boat and haul a bunch of lumber from Home Depot. Is the pickup broke if it still meets my needs?

pickup truck haulingSo, when is data broke? In our legacy data integration practices, we would profile data and identify all that is wrong with the data. Orphan keys, in-appropriate values, and incomplete data (to name a few) would be identified before data was moved. In the more stringent organizations data would need to near perfect for it to be used in a data warehouse. This ideal world or perfect data was strived after, but rarely obtained. It was too expensive, required too much business buy in, and lengthen BI and DW projects.  

In the world of Big Data, things have changed. We move data first and some of it we may never fix. Why? To understand this we need to look at the analytical process. When performing an analytical project, data scientists will usually select a subset of data, split it into halves.  One half of the data is used to build a model; the other is used to test the model. If the model tests OK, that is the standard error is within acceptable range, do we need to fix the data?  Fixing the data would at this point not change the outcome, so it served its purpose.

With moving the data first and moving it into a data lake for processing this gives us a unique opportunity to test drive the data first. Data scientists and business users will be able to benefit from using the data to make better decisions. At a time that the data quality does not meet the needs, address the issues within the data. So, don’t fix it if it ain’t broke.

Follow Bill on Twitter @bigdata73


Connect with Perficient on LinkedIn here

One thought on “Data Quality – Don’t Fix It If It Ain’t Broke

  1. I think the key is figuring out what data MUST be 100% clean and what data you can afford to be a little unsure of. Certain data sets that affect big business decisions need to be as perfect as possible at all times so you know you are making the right decisions based on real information.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Subscribe to the Weekly Blog Digest:

Sign Up