Skip to main content

Data & Intelligence

Data Staging and Hadoop

Traditionally, in our information architectures we have a number of staging or intermediate data storage areas / systems.   These have taken different forms over the years, publish directories on source systems, staging areas in data warehouses, data vaults, or most commonly, data file hubs.   In general, these data file staging solutions have suffered from two limitations:

  1. Data Staging and HadoopBecause of costs storage, data retention was usually limited to a few months.
  2. Since these systems were intended to publish data for system integration purposes, end-users generally did not have access to staging data for analytics or data discovery
Data Intelligence - The Future of Big Data
The Future of Big Data

With some guidance, you can craft a data platform that is right for your organization’s needs and gets the most return from your data capital.

Get the Guide

Hadoop’s systems reduce the cost per terabyte of storage by two orders of magnitude. Data that consumed $100 worth of data storage now costs $1 to store on a Hadoop system.   This radical reduction cost, enables enterprises to replace sourcing hubs with data lakes based on Hadoop, where a data lake can now house years of data vs. only a few months.

Next, once in a Hadoop filesystem (HDFS) the data can be published, either directly to tools that consume HDFS data or to Hive (or other SQL like interface).   This enables end-users to leverage analytical, data discovery, and visualization tools to derive value from data within Hadoop.

The simple fact that these data lakes can now retain historical data and provide scalable access for analytics, also has a profound effect on the data warehouse. This effect on the data warehouse will be the subject of my next few blogs.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Bill Busch

Bill is a Director and Senior Data Strategist leading Perficient's Big Data Team. Over his 27 years of professional experience he has helped organizations transform their data management, analytics, and governance tools and practices. As a veteran in analytics, Big Data, data architecture and information governance, he advises executives and enterprise architects on the latest pragmatic information management strategies. He is keenly aware of how to advise and lead companies through developing data strategies, formulating actionable roadmaps, and delivering high-impact solutions. As one of Perficient’s prime thought leaders for Big Data, he provides the visionary direction for Perficient’s Big Data capability development and has led many of our clients largest Data and Cloud transformation programs. Bill is an active blogger and can be followed on Twitter @bigdata73.

More from this Author

Follow Us
TwitterLinkedinFacebookYoutubeInstagram