Perficient's Big Data Stack

Big Data has generated a lot of interest in the media and in industry, which can give the impression that every data problem is a “Big Data” problem. Still, the interest is justified: Big Data platforms can deliver a significant boost in performance and scalability, and they have become economically feasible thanks to commodity hardware clusters/grids and open source Big Data databases, processing platforms, and technologies (e.g., Hadoop, Cassandra, HBase, and MongoDB, to name a few).

To give you a better understanding of what is involved in Big Data and to provide a framework for your Big Data initiatives, I’d like to present Perficient’s Big Data Stack. I hope this will help you peel back the layers of complexity. Big Data is very powerful but not necessarily easy, and if you use a NoSQL database for your Big Data platform, it represents a significant paradigm shift from the traditional relational database.

The stack diagram below is divided horizontally into two categories: Technology and Roles/Organization. I will say more about the technology component later. A key role in Big Data is that of “Data Scientist.” These are people with statistics, data modeling, data mining, and programming experience. Data Scientists look for gold in massive amounts of data and can present their findings in a comprehensible manner to management and others. Governance, both Data and Application Governance, needs to be involved in Big Data as well. Other roles include data and system architects, developers, and system administrators.

Figure 1 – Perficient’s Big Data Stack

The Big Data Stack is also divided vertically between Application and Infrastructure. There is a significant infrastructure component to Big Data platforms, and it is equally important to identify, develop, and sustain applications that are good candidates for a Big Data solution.

Below I provide a high-level overview of each of the technical stack components:

Category	Component	Description
Application	Data Sourcing	Most Big Data applications will require data that is sourced from other databases and interfaces, and so this is the first core component in the stack.
Application / Processing Type	Analytics	Advanced analytics is a common application that may indicate the need for a Big Data solution.
Application / Processing Type	Operations	Big Data can support operational needs as well, e.g., real-time Complex Event Processing for patient monitoring, risk management, etc.
Application	Distributed Processing	Distributed processing is at the core of Big Data processing where you execute a task on many computers in a cluster/grid.
Infrastructure	Representation	There are many ways that data can be represented in a Big Data platform, e.g., wide column stores, key value pairs, graph, relational, etc.
Infrastructure	Persistence	Indicates how data will be persisted, or whether it will be persisted in the Big Data platform at all. When processing massive streams of real-time data, you might not need to persist all the data; you can just grab and process what you need. You can use a NoSQL database, a distributed filesystem, or an MPP RDBMS for Big Data persistence.
Infrastructure	Platform	You can use open-source or proprietary software & databases, commodity or proprietary hardware. Leveraging a public / private cloud is an option as well.
Management	Security	Security is of course important, especially in a healthcare setting. If you are going to put PHI data in a Big Data platform, you will want to look at low-level encryption capabilities that perform encryption at the IO level (as data is being written to or read from disk). Security is not as mature and robust in NoSQL platforms.
Management	Development and Management	There is a wide array of open source and proprietary development and management technologies that will be part of the Big Data equation, e.g., schedulers, load balancers, etc.
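
To make the Distributed Processing row in the table above a little more concrete, here is a minimal Python sketch of the split, process, and combine pattern that sits at the core of many Big Data platforms. It is only an illustration under stated assumptions: the function names, the record layout, and the sample events are all hypothetical, and a thread pool stands in for the machines of a real cluster/grid.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal records round-robin into num_partitions lists."""
    partitions = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        partitions[index % num_partitions].append(record)
    return partitions


def count_events(partition):
    """Map step: one worker counts event types in its own partition."""
    return Counter(record["event"] for record in partition)


def merge_counts(partial_counts):
    """Reduce step: combine the per-worker counts into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def distributed_count(records, num_workers=4):
    """Count event types by fanning partitions out to workers."""
    partitions = split_into_partitions(records, num_workers)
    # Assumption: threads stand in for cluster nodes. A real platform
    # would ship each partition to a different machine.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_events, partitions))
    return merge_counts(partial_counts)


if __name__ == "__main__":
    # Hypothetical monitoring events, invented for this example.
    sample = [
        {"patient": "p1", "event": "heart_rate_alert"},
        {"patient": "p2", "event": "normal_reading"},
        {"patient": "p1", "event": "normal_reading"},
        {"patient": "p3", "event": "heart_rate_alert"},
    ]
    print(distributed_count(sample, num_workers=2))
```

The point of the sketch is the shape of the computation rather than the tooling: each worker sees only its own partition, and only the small partial results need to be brought together at the end.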

I will go into more detail on these components of Perficient’s Big Data Stack in future articles. I’d be interested to hear your feedback on the Big Data Stack and how it compares with your experience. You can reach me at pete.stiglich@perficient.com or in the comments section below.

by Pete Stiglich on November 13th, 2012