I just returned from IBM’s Information On-Demand 2012 conference in Vegas last week, where just as many new questions were raised as were answered. Alongside the usual Vegas questions, like “What happened to my money?” and “Where am I?”, were some new ones. The most common was “What is big data?” So it appears there’s still some market education needed to clarify the definition. Whenever that question came up, the conversation ultimately turned to the challenges of managing high-volume data streams. One Perficient client I spoke with, a large Southern California utility, is dealing with a massive influx of new data streams from its smart meter and smart grid deployment projects. As we talked, I was struck by how immense the volume of information was, how much of it was being discarded, and how much potential the data held, for good and for bad. Clearly, the challenges of big data are real.
First off, definitions are as diverse as opinions. Most organizations don’t differentiate “big data” from traditional data. In fact, in a recent study by Information Week, nearly 90% of respondents said they use conventional databases as the primary means of handling data. With the help of the Information Week research, we can hopefully better understand what constitutes big data (it’s not just size) and the challenges it poses.
The Information Week survey revealed that the top big data sources were financial transactions, email, imaging data, Web logs, and Internet text and documents, all common data sources. Clearly, you don’t need to be a massive utility company deploying smart grid technology to be inundated with huge volumes of data, and if it isn’t a challenge for you now, it will be very soon. Any business creating large data sets will need to embed big data management practices, along with the right tools and architectures, or it won’t be able to use the information it collects effectively.
So what is big data? It’s more than just volume. Generally, four elements are required to qualify as big data. The first is size: 30 TB is a good starting point. The second is the type of data: big data involves several types, namely structured, unstructured and semi-structured. The third is latency: big data changes fast and creates new data that needs to be analyzed quickly. The fourth is complexity: characteristics of complex data include large single log files, sparse data and inconsistent data.
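The four elements can be read as a simple checklist. The Python sketch below is a hypothetical illustration, not a tool from the Information Week report. The 30 TB size threshold comes from the text above; the one-hour latency cutoff, the rule that “several types” means more than one kind, and the names `DataSetProfile`, `big_data_checklist` and `is_big_data` are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Size threshold from the post ("30 TB is a good starting point").
SIZE_THRESHOLD_TB = 30

# ASSUMPTION: illustrative cutoff for "latency", i.e. how soon new data must
# be analyzed. This number is not from the survey.
MAX_ANALYSIS_DELAY_HOURS = 1.0


@dataclass
class DataSetProfile:
    """Hypothetical description of a data set, used only for this sketch."""
    size_tb: float
    # Any of "structured", "unstructured", "semi-structured".
    data_types: set = field(default_factory=set)
    # How soon (in hours) new data has to be analyzed after it arrives.
    required_analysis_delay_hours: float = 24.0
    # Complexity markers named in the post.
    has_large_single_log_files: bool = False
    is_sparse: bool = False
    is_inconsistent: bool = False


def big_data_checklist(profile: DataSetProfile) -> dict:
    """Evaluate each of the four elements separately."""
    return {
        "size": profile.size_tb >= SIZE_THRESHOLD_TB,
        # ASSUMPTION: "several types" means more than one of the three kinds.
        "type": len(profile.data_types) > 1,
        "latency": profile.required_analysis_delay_hours <= MAX_ANALYSIS_DELAY_HOURS,
        "complexity": (profile.has_large_single_log_files
                       or profile.is_sparse
                       or profile.is_inconsistent),
    }


def is_big_data(profile: DataSetProfile) -> bool:
    # The post says all four elements are generally required.
    return all(big_data_checklist(profile).values())


# Hypothetical profile with illustrative values only.
example = DataSetProfile(
    size_tb=120,
    data_types={"structured", "semi-structured"},
    required_analysis_delay_hours=0.25,
    is_sparse=True,
)
print(big_data_checklist(example))
print(is_big_data(example))  # True
```

Returning the per-element results, rather than a single yes/no, makes it easy to see which element a large but otherwise simple data set is missing.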
Now that we’re zeroing in on the definition and structure (or lack thereof) of these growing forms of data, the next question is: do you have a strategy in place to deal with them differently than you deal with more traditional forms of data? In Information Week’s research of more than 200 technology leaders, over half said no, which means that if you’re reading this, you probably don’t either. Don’t worry, though; you’re not alone. In fact, 87% of respondents are still using databases as the primary method of handling data.
Complicating the management of big data are the various approaches to handling it, depending on its sources and structure. The stream processing approach touches almost every aspect of computing, including processing power, network throughput, storage and visualization. Most of the Information Week survey participants expressed concerns about data access, storage and analytics with this approach. Respondents were split between those who need real-time processing of big data and those who don’t. Real-time processing can be a challenge with big data, especially in dynamic data environments.
The batch processing approach to big data is designed to manage information as it grows and expands over time. Organizations dealing with this type of data are turning to the Hadoop model and software to rapidly process significant amounts of data, and Hadoop is being used for some very big implementations. According to Information Week, Facebook had the largest Hadoop deployment in the world, with more than 20 PB of storage. By March 2012, it had grown to 30 PB, or 3,000 times the size of the Library of Congress.
There are two problems with using Hadoop. First, you don’t get partial answers: you have to wait, sometimes a long time, for the entire batch to finish. Second, it can require a lot of hardware, because all the data is processed at once, so any change in the data requires the entire batch to be rerun. The only way to deal with this is to add more hardware, which can be costly.
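To make the batch-versus-stream contrast concrete, here is a toy Python sketch. It is not Hadoop or any real streaming framework: `batch_total` and `RunningTotal` are hypothetical names, and the meter readings are made-up values. The batch function, like a batch job, only produces an answer after scanning every record and must rescan everything when one record is added. The running total keeps state, so it offers a partial answer after each record and absorbs a new reading without reprocessing the old ones.

```python
from typing import List


def batch_total(readings: List[float]) -> float:
    """Batch style: each run rescans every record and returns only at the end."""
    total = 0.0
    for reading in readings:
        total += reading
    return total


class RunningTotal:
    """Stream style: keeps state and updates it one record at a time."""

    def __init__(self) -> None:
        self.records_seen = 0
        self.total = 0.0

    def update(self, reading: float) -> float:
        self.records_seen += 1
        self.total += reading
        # A partial answer is available after every record.
        return self.total


# Hypothetical meter readings (illustrative values only).
readings = [1.5, 2.0, 0.5]

stream = RunningTotal()
partials = [stream.update(r) for r in readings]
print(partials)                # [1.5, 3.5, 4.0]
print(batch_total(readings))   # 4.0

# One new reading arrives.
readings.append(3.0)
print(batch_total(readings))   # 7.0, after rescanning all four records
print(stream.update(3.0))      # 7.0, after processing only the new record
```

In a real Hadoop cluster the rescan is spread across many machines, which is roughly why adding hardware is the lever for shortening it.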
Besides the various management approaches and inconsistent market definitions, there are other hurdles that companies should be on the lookout for. According to Information Week’s research, almost half of the participants (44%) said they lacked the knowledge needed to implement and manage big data solutions, and more than half (57%) cited budget as the biggest barrier.
With traditional forms of data management rapidly approaching capacity due to the deluge of new forms and sources of information, the market is nearing a crossroads. More and more businesses will find they lack the expertise needed to tap the vast wealth of information available to them. The tools exist, and the information is clearly there for the taking. Organizations willing and able to invest in big data resources will eventually gain a competitive advantage over those that aren’t.
With so much at stake and such a complex solution set, companies will be looking to eliminate as much risk as possible from these projects. Learning from peers, listening to analyst insights, understanding costs, and accessing the best services, mentoring and training solutions are critical prerequisites for successfully delivering and managing big data projects. Doing it right the first time means tapping into providers with experience across a wide range of big data technology options. Perficient’s diverse technology partnerships and extensive training capabilities are leading many early adopters to trust us with their company’s most precious resource: information. Our industry experience and partnership awards attest to the quality of our delivery.
Several of the research points on big data made here are drawn from Information Week’s “Big Data Management Challenge” research report from April 2012. The report is available for free and is an interesting read for anyone looking to learn more about big data.