The World of Hadoop

When it comes to Big Data, we’ve talked about the significance of Hadoop, but it is still a mystery to many people. Even among IT experts, there are multiple, conflicting interpretations of what it is. An article by Derrick Harris on GigaOM.com may not unravel all of its complexities, but it gives a good overview of the Hadoop environment. That environment has two layers: the components of Hadoop itself, and the software that makes use of Hadoop, either by enabling developers to write Hadoop applications or by assisting in the analysis of data stored within Hadoop.

Harris says that Hadoop, as an Apache Software Foundation project, consists of two essential components: “Hadoop MapReduce and the Hadoop Distributed File System. MapReduce is the parallel-processing engine that allows Hadoop to churn through large data sets in relatively short order. HDFS is the distributed file system that lets Hadoop scale across commodity servers and, importantly, store data on the compute nodes in order to boost performance (and potentially save money).”
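To make the MapReduce idea in Harris's description a little more concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps, using a word count as the example. It is written in plain Python and does not use Hadoop's API at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for illustration. In a real cluster, the map and reduce steps would run in parallel on the nodes that hold the data in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine (illustrative only)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file stored in HDFS.
    sample = ["big data needs hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

Because each call to `map_phase` looks at only one line and each call to `reduce_phase` looks at only one key, the work can be split across many machines, which is the property that lets Hadoop churn through large data sets on commodity servers.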

Harris then discusses and details the other Apache projects that are related to Hadoop, some of which are built on either MapReduce or HDFS. These include query languages and databases for Hadoop. He also points out that “many Hadoop distributions integrate with various data warehouses, databases and other data-management products, with the goal of moving data between Hadoop clusters and other environments so each might process or query data stored in the other.”

There is also Hadoop management software, which makes it easier to manage and troubleshoot a Hadoop cluster. In addition, there are products that help developers write Hadoop applications, and others for performing data analysis outside of traditional MapReduce jobs.

The number of products surrounding and integrating with Hadoop will only continue to grow as the challenge of Big Data persists. (See Harris's Hadoop article and related feedback at http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop/)

Neetu Shaw

As Perficient's Business Intelligence (BI) Company-Wide Practice leader, Neetu Shaw provides thought leadership in developing and implementing a common BI foundational framework for Perficient and our many BI/DW clients, including common services, methods, knowledge management and an integrated enablement plan for both sales and delivery. Neetu is a business-focused and solutions-driven information management professional with executive consulting experience. Her career has been dedicated to BI consulting, thought leadership and solution sales leadership with solid experience in all phases of program implementation from initial business visioning to ROI justification through execution.