What is broke? If I drive a pickup truck around that has a small, unobtrusive crack in the windshield and a few dings in the paint, it will still pull a boat and haul a bunch of lumber from Home Depot. Is the pickup broke if it still meets my needs?
So, when is data broke? In our legacy data integration practices, we would profile data and identify everything that was wrong with it. Orphan keys, inappropriate values, and incomplete data (to name a few) would be identified before data was moved. In the more stringent organizations, data would need to be near perfect before it could be used in a data warehouse. This ideal of perfect data was strived for, but rarely attained. It was too expensive, required too much business buy-in, and lengthened BI and DW projects.
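To make those profiling checks concrete, here is a minimal sketch in plain Python. The tables and column names are hypothetical, but the three checks mirror the issues named above: orphan foreign keys, inappropriate values, and incomplete rows.

```python
# Hypothetical customer and order tables with the classic defects a
# legacy profiling pass would flag before any data moved.
customers = [
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": None},  # incomplete data: missing name
]
orders = [
    {"order_id": 10, "customer_id": 1, "qty": 5},
    {"order_id": 11, "customer_id": 99, "qty": 5},   # orphan key: no customer 99
    {"order_id": 12, "customer_id": 2, "qty": -3},   # inappropriate value: negative qty
]

customer_ids = {c["id"] for c in customers}

# Orphan keys: orders pointing at customers that don't exist.
orphans = [o for o in orders if o["customer_id"] not in customer_ids]

# Inappropriate values: quantities that can't be valid.
bad_values = [o for o in orders if o["qty"] <= 0]

# Incomplete data: rows with missing fields.
incomplete = [c for c in customers if any(v is None for v in c.values())]

print(len(orphans), len(bad_values), len(incomplete))  # → 1 1 1
```

In the legacy approach, every one of these findings would have to be resolved before the load; the point of this post is that, increasingly, they don't.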
In the world of Big Data, things have changed. We move data first, and some of it we may never fix. Why? To understand this, we need to look at the analytical process. When performing an analytical project, data scientists will usually select a subset of data and split it into halves. One half of the data is used to build a model; the other is used to test the model. If the model tests OK (that is, the standard error is within an acceptable range), do we need to fix the data? Fixing the data at this point would not change the outcome, so the data has served its purpose.
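The workflow above can be sketched in a few lines of plain Python. The dataset, the linear model, and the acceptance threshold are all assumptions for illustration; the point is the shape of the process: split, build on one half, test on the other, and only worry about data quality if the error is unacceptable.

```python
import random

# Hypothetical dataset: (x, y) pairs with a known linear trend plus noise.
random.seed(42)
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.5)) for x in range(100)]

# Split the data into halves: one to build the model, one to test it.
random.shuffle(data)
train, test = data[:50], data[50:]

# Build a simple linear model (ordinary least squares) on the training half.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x

# Test the model on the held-out half: root-mean-square prediction error.
sq_errors = [(y - (slope * x + intercept)) ** 2 for x, y in test]
std_error = (sum(sq_errors) / len(test)) ** 0.5

ACCEPTABLE = 1.0  # assumed threshold; in practice the business sets this
if std_error <= ACCEPTABLE:
    print("model is acceptable; cleansing the rest of the data wouldn't change the call")
else:
    print("error too high; time to profile and fix the underlying data")
```

If the held-out error clears the threshold, fixing the remaining defects buys nothing for this analysis, which is exactly the argument for deferring cleanup.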
Moving the data first, into a data lake for processing, gives us a unique opportunity to test-drive the data. Data scientists and business users can benefit from using the data to make better decisions. When the data quality does not meet the needs, address the issues within the data then. So, don't fix it if it ain't broke.
Follow Bill on Twitter @bigdata73
Connect with Perficient on LinkedIn here.