As we head towards the end of the first quarter of 2012, there can be no doubt that the concept of Big Data has arrived. But what is Big Data? In 1985 a PC with a 10-megabyte disk drive was state of the art; by 2010, 1-terabyte drives were commonplace. Is this Big Data? Ten years ago 10 to 20 terabytes was the high end of commercial databases; today petabyte databases are not unheard of. Is this Big Data? Each of these examples addresses one aspect of Big Data while missing the other commonly accepted dimensions.
What are the dimensions of Big Data? In 2001, The Meta Group published a report that described the challenges that traditional data management faced. The report described the three terms that have become widely accepted as defining Big Data – Volume, Velocity, and Variety. The major software vendors have all accepted this definition and use it when describing their products, whether it be SAP’s HANA, IBM’s Big Insights, or Oracle’s Big Data Appliance.
Data is being generated in larger and larger quantities every day. It comes not only from human sources but, more and more, from machines. Advances in healthcare generate masses of patient data, smart meters flood energy companies with usage information, and process manufacturers monitor every phase of their production.
Big Data comes in one size: large.
Not only is Big Data generated in large volumes, it is coming at us quickly. In many cases, it must be acted upon just as quickly to have value. Twitter’s tiny tweets generate many terabytes of data every day. Large organizations must mine and react to the information quickly to avoid unwanted publicity.
Big Data arrives at one speed: fast.
Big Data includes data that cannot be easily described by the classic row-column, record-field paradigms. It includes unstructured data in all its many forms: text, audio, video, streams, log files, and more. As new products and services are envisioned, new data types will be created and more Big Data will be generated.
Big Data manifests itself in one format: mixed.
Beyond the standard three descriptive dimensions, some proponents are suggesting other dimensions for Big Data – Value and Validity.
Value is surely a meaningful way to measure Big Data. There is value to be released from the masses of data we own; the challenge is creating our own environmentally acceptable “fracking” processes to release it.
The Validity of data is always of interest. Invalid “Small Data” can cause huge issues for business users; imagine the effect of invalid data at Big Data scale. With Big Data, however, the issue is not necessarily whether the data itself is right or wrong, but whether we are arriving at the right or wrong conclusions as we analyze and consume it.
Big Data is the challenge-of-the-moment facing us all, and it is something we can address. Tools and technologies are coming to bear that will help us manage and leverage Big Data. Hadoop and MapReduce are new players in the field; they address the Volume and Variety dimensions. Traditional data warehousing manages the Volume and Velocity dimensions very well. As time progresses, the marriage of these and other technologies will make Big Data old hat.
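For readers unfamiliar with the MapReduce model mentioned above, its essence can be sketched in miniature. The toy word-count below is an illustrative sketch of the map/reduce pattern only, not an actual Hadoop program; the function names are my own.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["Big Data is big", "big volume big velocity"]
print(reduce_phase(map_phase(docs)))
```

In a real Hadoop cluster, the map and reduce phases run in parallel across many machines, with the framework shuffling each word's pairs to the same reducer; that distribution is what lets the pattern scale to the Volume and Variety dimensions.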