Several weeks ago, I was watching a cable news channel, and the anchor was discussing what it will take to win the next presidential election. As he went into more detail, he commented that campaigns will need much better data analysis to target the right segments of the population with the right messages. Then he said it. Yes, he actually said "Big Data," and that successful campaigns will need to be able to analyze it. WOW!!! Everybody is getting in on this. It has quickly become a household term... well, not exactly, but we are certainly hearing more and more about it.
Personally, I'm glad, because I've always believed that it's all about the data. Get the data right and everything else becomes easier. The right data at the right time can be turned into information that helps us make better decisions, whether that means choosing which way to drive to work today or deciding whether a company should acquire or merge with another company. Good information makes for better decisions. It's a lot easier to win customers and gain revenue with it than without it. Right? Big Data has opened up a whole new opportunity to help us make better decisions, but it can come at a big cost.
So now that everybody (or at least a lot more people) is talking about Big Data, thanks to the explosion of data on the internet from social media, device sensors, and so on, let's explore some of the interesting and fascinating things about integrating Big Data.
The ability to get your arms around this massive amount of data (have you heard of the yottabyte yet?) and turn it into useful information is still in its infancy, but it is quickly maturing. Just like any other concept at this point in its evolution (yes, I was around when Al Gore invented the internet), there is a lot of discussion, a lot of experts, and a lot of fumbling around in the dark to figure it out. Two of the big topics around Big Data are integration and cost.
A lot of companies spend way more time and money integrating and cleansing data (getting it into a usable form) than actually using it to improve their decision making (analysis). Approximately 80% of the work in Big Data projects is data integration and data quality. Of course, this problem existed long before anyone ever thought about Big Data. Plus, Big Data was around long before anyone even coined the phrase... but I digress; back to the point.
Big Data integration can be even more costly than integrating all the other forms of data in your organization, mainly because we are still trying to figure it out, and most of the solutions starting to materialize out there are open source. This is typical in the early stages of maturity. So you've heard of Pig (and its scripting language, Pig Latin) and Hive developers. You've heard of MapReduce, and I'm not even going to mention Hadoop. There's an evolution (not a revolution) going on out there. So here's the point: how do we get a better handle on integrating Big Data, and how do we do it without paying a Pig/Hive developer $300/hr. to code something and then keeping that developer around forever to support it?
Informatica is a company that has focused its entire business on data integration. That's what they do. Now they have a Big Data integration solution: it takes this data and integrates it with what you already know (your point of reference), that is, what your company is doing. They do this by leveraging their foundational product (PowerCenter) and adding a component that integrates with Hadoop, so that the burden of hand-coding a Pig or Hive solution is greatly reduced or eliminated. If a company already has Informatica, it could leverage some of the talent and experience it already has to start making sense of this cesspool of Big Data that's out there. And maybe it could save the expense of hiring the Pig guy. What do you think?