A recent post on the HBR blogs stated that ‘Success comes from better data, not better data analysis’.
http://blogs.hbr.org/cs/2011/08/success_comes_from_better_data.html
While this sounds cliché, it is true, and we tend to ignore the value of quality data. Nowadays firms invest in hiring some of the best analysts in the industry in the hope of crunching numbers better than their competitors and gaining a competitive edge. Firms want to use the best BI tools available in the market to get more insight into their data. But how often do we see that their focus is not on having quality data?
From what I’ve seen on a couple of my recent projects, data quality is often neglected. People are more worried about the tools and techniques used to turn the data into insightful statistics than about understanding that one of the key requirements for any data analysis to yield good results is good, consistent data!
I have always believed that it is worth spending a little extra time on data cleaning and ensuring consistency before jumping into the various BI tools to play with the data. It’s important to ensure that your data, including all the historical statistics, is well maintained and kept consistent. Ultimately, if the underlying data itself is flawed or inconsistent, the analysis is going to be flawed too, no matter how good your analysts are or which tools they use!
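As a rough illustration of the kind of consistency check meant here, the sketch below flags records that break a couple of simple rules before they ever reach a BI tool. The field names (`region`, `units_sold`), the list of allowed regions and the sample rows are all hypothetical and chosen only for this example; real rules would come from your own data definitions.

```python
# Hypothetical rules and sample data, for illustration only.
ALLOWED_REGIONS = {"EMEA", "APAC", "AMER"}


def find_inconsistent_rows(rows):
    """Return (row_index, reason) pairs for rows that break the simple rules."""
    problems = []
    for i, row in enumerate(rows):
        region = row.get("region")
        units = row.get("units_sold")
        # Rule 1: the region label must be one of the agreed spellings.
        if region not in ALLOWED_REGIONS:
            problems.append((i, f"unknown region: {region!r}"))
        # Rule 2: a unit count must be a non-negative integer.
        if not isinstance(units, int) or units < 0:
            problems.append((i, f"invalid units_sold: {units!r}"))
    return problems


sample_rows = [
    {"region": "EMEA", "units_sold": 12},
    {"region": "emea", "units_sold": 7},    # same region, inconsistent spelling
    {"region": "APAC", "units_sold": -3},   # a negative count cannot be right
]

for index, reason in find_inconsistent_rows(sample_rows):
    print(index, reason)
```

Checks like these are cheap to run on every load, and they catch problems while they are still easy to trace back to the source.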
Interesting take, though I’m not entirely convinced by the one-sidedness of Mr. Morey’s article. In the end it’s a bit of both: data quality and good analysis.
The issue with striving for better quality is that it’s a lot like expensive preventative medication. If it works in 97% of the cases but then there’s a miss, people will complain that “See? I spent all that money and I STILL have the problem!” or even if it’s 100% successful most people will say “See? I was fine after all and I didn’t need to spend all that money.”
Same goes with good analysis. If you do a good job someone would look at that and say “See? Didn’t need good analysis because the data is so good.” Just like in his scenario where he points out the stats are only valuable against the entire league. Well, really that’s a finding made by someone doing data analysis (even if it’s Mr. Morey).
The sword cuts both ways. Accordingly it should be employed both ways.
Good data is like music and analysis the musician. A good musician can play a poorly written piece wonderfully but they will never make it seem well written. Likewise a bad musician can take a wonderfully written piece and make it sound terrible. In the end it doesn’t really matter how well the music is written if the musician can’t read music.
@Ron – I definitely agree with your view that there needs to be a good mix of both data quality and good analysis. I did feel Mr. Morey’s article was a bit one-sided as well when I first read it. But I can’t help but feel that companies these days invest heavily in hiring top analysts and in using the best technologies, and in the bargain lose focus on the underlying data! To put it plainly, it is like ‘garbage in, garbage out’. To get the best out of your analysts and tools, there needs to be more emphasis on keeping your data clean and consistent.
All the great analysis goes out the window if you ask three people or organizations what the answer is and get three answers that don’t match. One may be right, they may all be close, but the business won’t trust them and will pick the ones they want and discount the others.
We need to agree on data definitions and meaning, agree on data sources, and have good analytics. Plus, when filtering the data, as most analytics do, we need to provide the context so people understand whether they are looking at a view of the data and whether that view is consistent.
Net is people need to trust the data and that starts with good data quality and some data governance around the data assets.
Hello,
For me, quality is fundamental. Of course, you can grow flowers in dirt, but not a whole garden…
The problem with BI is that once data is in the pipeline, it is difficult to pull it out. If you forget something or make a mistake, you will need a lot of time and effort to identify and get rid of the problem.
You can be a master analyst, but you can’t fill in the gap if data is missing.
You can guess, but you will always have a doubt, and so your analysis will be tainted. How can we be confident in the analysis and its result if we doubt the data? And the doubt will be even stronger if the result of the analysis is not what we expected…
Then of course, you will need good analysis tools and skills, and you need to know the business you’re working in.
I remember a dashboard that I made some years ago covering the whole process of buying a car. In the UK, for example, it’s common to test a car before buying it, so the document had been designed around that characteristic. When the same scheme was applied to Belgium and Greece, an alert was raised and the data was considered bad. But it is not common in Belgium or Greece to test a car before buying it. So the result of the analysis would have been wrong even with correct data, because the business analyst didn’t know that fact.
You can have a Ferrari, but that doesn’t make you a Schumacher…
There are several causes of bad data. I will point out two of them. First, in my experience, a lot of problems come from Excel sheets used as sources. Excel is not a database and has no real constraints, so it permits a lot of dirt. You have no control over the data put into Excel sheets.
Second, poorly designed applications. To save programming time, we often lose some control, and the initialisation of data is not good (null is not a great value for BI: is it faulty data or an intended value? How can we know when we get the data at the end of the chain?),…
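One possible way around the null ambiguity described above is for the source application to record why a value is empty instead of storing a bare null. The sketch below is only an illustration of that idea: the `Status` and `Field` names and the sample values are hypothetical, not taken from any particular tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROVIDED = "provided"
    NOT_APPLICABLE = "not_applicable"  # intentionally left empty
    UNKNOWN = "unknown"                # never captured, possibly a fault


@dataclass(frozen=True)
class Field:
    """A value together with the reason it may be empty (hypothetical type)."""
    value: Optional[float]
    status: Status


def describe(field):
    """Make the difference between the kinds of 'empty' visible downstream."""
    if field.status is Status.PROVIDED:
        return f"value={field.value}"
    return f"no value ({field.status.value})"


# Hypothetical sample: a discount column with two different kinds of empty cell.
discounts = [
    Field(30.0, Status.PROVIDED),
    Field(None, Status.NOT_APPLICABLE),
    Field(None, Status.UNKNOWN),
]

for field in discounts:
    print(describe(field))
```

With the reason carried alongside the value, whoever receives the data at the end of the chain no longer has to guess whether an empty cell is a fault or an intended value.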
Obviously better data is a good thing, especially for good analysis. This is one of the main reasons that ‘data discovery’ is so important, because it also helps locate bad data that can not really be associated with anything meaningful (other than which program is allowing the creation of bad data). But even if all of the data is good (clean), the most important part of business intelligence and data analysis is asking ‘good questions’. Your answers will only be as good as the questions you ask. You must look at the data in a meaningful way, in order to learn something meaningful. So my 2 cent answer is ‘no, data quality is not necessarily a key pre-requisite for any BI technique’. There will probably always be some bad data, but proper use of good BI tools should always yield some business value.
Stephen, I agree with your point that we need to agree upon data definitions and sources. And there definitely needs to be better data governance in firms – IT and the business will have to work together to achieve this. Ultimately, you don’t want to lose out on some critical points in your analysis because of flawed data.
Olivier – thanks for bringing out these valuable points. I’ve so often experienced your point of losing a lot of time redoing things within a BI tool because something changed in the underlying data! And yes, null can be a big problem in BI tools – especially when your data source is something like Oracle tables that contain NULLs.
Brian – Asking good questions is just a starting point in your analysis. While your BI tools or analysts can do something about, say, duplicate records or missing values, you can’t expect the tool to beautify flawed data and produce miracles in your results. One cannot make assumptions at every stage and still produce a flawless analysis, no matter how much of an expert one is.
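To make the "duplicate records or missing values" part concrete, here is a minimal sketch of the mechanical checks a tool or analyst can run. The function name `quality_report`, the column names and the sample rows are all hypothetical. It reports problems but does not repair them, which is the point: deciding what the right value should have been still needs the source to be fixed.

```python
from collections import Counter


def quality_report(rows, key_fields, required_fields):
    """Count duplicated keys and missing required values in a list of dict rows."""
    keys = [tuple(row.get(f) for f in key_fields) for row in rows]
    counts = Counter(keys)
    # A key that appears more than once indicates a duplicate record.
    duplicate_keys = [key for key, n in counts.items() if n > 1]
    # None and empty strings are both treated as missing here (an assumption).
    missing = {
        f: sum(1 for row in rows if row.get(f) in (None, ""))
        for f in required_fields
    }
    return {"duplicate_keys": duplicate_keys, "missing": missing}


# Hypothetical sample data.
rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": None},
    {"customer_id": 1, "email": "a@example.com"},
]

report = quality_report(rows, key_fields=["customer_id"], required_fields=["email"])
print(report)
```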