Key Differences between a Traditional Data Warehouse and Big Data - Perficient Blogs
Key Differences between a Traditional Data Warehouse and Big Data

Traditional data warehouse solutions were originally developed out of necessity. To run the business, most companies use enterprise resource planning (ERP) and customer relationship management (CRM) applications to manage back-office functions like finance, accounts payable, accounts receivable, general ledger, and supply chain, as well as front-office functions like sales, service, and call center. The data captured from these traditional data sources is stored in relational databases composed of tables with rows and columns and is known as structured data. These databases are optimized for online transaction processing (OLTP) and are not easily queried for ad-hoc reporting and analysis.

So how do you make the gathered data more useful? For many organizations, the first answer is Microsoft Excel. While Excel can be a useful tool, analysis performed in spreadsheets suffers from problems with data freshness, consistency, and integrity. That’s where business intelligence comes into play. Gartner defines business intelligence as “an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”[1]

The traditional approach to providing business intelligence on the data collected from business applications involves extracting the data from the transactional systems and moving it into a data warehouse, which is optimized for reporting rather than transaction processing. This process begins with data consolidation tools like Informatica or Oracle Data Integrator. These tools extract the data from the relational database or source system, transform it into a usable format for querying and analysis, and then load it into a final target database such as an operational data store, data mart, or data warehouse. Commonly referred to as ETL (Extract, Transform, and Load) tools, they allow organizations to move and transform data to build very complex enterprise data warehouse platforms.
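The extract, transform, and load steps described above can be sketched in a few lines of Python. This is only an illustration, not how Informatica or Oracle Data Integrator work internally: it uses the standard-library sqlite3 module to stand in for both the transactional source and the warehouse, and the table names (orders, monthly_sales), column names, and sample rows are all hypothetical.

```python
import sqlite3


def build_demo_source():
    """Create a tiny in-memory 'transactional' database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, "
        "amount_cents INTEGER, order_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "acme ", 1250, "2024-01-05"),
            (2, "Acme", 750, "2024-01-20"),
            (3, "Globex", 3000, "2024-02-11"),
        ],
    )
    conn.commit()
    return conn


def run_etl(source, warehouse):
    """Copy order data from the source into a reporting-friendly summary table."""
    # Extract: read the raw transactional rows.
    rows = source.execute(
        "SELECT customer, amount_cents, order_date FROM orders"
    ).fetchall()

    # Transform: normalize customer names and total the amounts per month.
    totals = {}
    for customer, amount_cents, order_date in rows:
        key = (customer.strip().upper(), order_date[:7])  # (customer, YYYY-MM)
        totals[key] = totals.get(key, 0) + amount_cents

    # Load: write the summarized rows into the warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS monthly_sales ("
        "customer TEXT, month TEXT, total_amount REAL, "
        "PRIMARY KEY (customer, month))"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO monthly_sales VALUES (?, ?, ?)",
        [(c, m, cents / 100) for (c, m), cents in sorted(totals.items())],
    )
    warehouse.commit()
    return len(totals)


if __name__ == "__main__":
    source = build_demo_source()
    warehouse = sqlite3.connect(":memory:")
    run_etl(source, warehouse)
    for row in warehouse.execute("SELECT * FROM monthly_sales ORDER BY customer"):
        print(row)
```

The point of the sketch is the separation of concerns: the source keeps one row per transaction, while the warehouse table is shaped for the questions a report asks, such as sales per customer per month.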

Once the data is in the data warehouse, reporting tools pull it into prebuilt dashboards and reports that give users insight into business performance and support data-driven decisions. Some reporting tools also allow power users to build their own ad-hoc reports and visualizations.

While a tabular report can be useful for a sophisticated user who wants to review all the detail, less detail-oriented users may benefit from a more visual presentation that uses size, shape, color, and position to indicate relative values and potentially make the data more meaningful.

Although both representations of traditional data warehouse content are information rich, neither addresses the changing variety of data that organizations are accumulating to support their eCommerce or social platforms. While the path to building a data warehouse for the structured data coming out of source systems such as ERP and CRM is clear, organizations must look at other technologies to provide business intelligence on data that is not stored in relational tables.

What is Big Data?

Big data refers to the modern architecture and approach to building a business analytics solution designed to address today’s different data sources and data management challenges. With the exponential rate of growth in data volume and data types, traditional data warehouse architecture cannot solve today’s business analytics problems. With big data architecture, you can perform business analytics on large volumes of data stored in different applications, whether in structured relational tables or in unstructured files. The most important and complex part of a big data initiative is deciding which business problems you can solve today to help your organization increase revenue or reduce costs and inefficiencies.

Multi-Structured Data

Taking a step away from traditional, transactional data sources, you will find multi-structured data sources. A common example of a multi-structured data source is online commerce. The sheer volume of data created by customers through online interactions is staggering. Think of eBay and your shopping behavior. Those personal recommendations that eBay displays for you are directly related to your search and purchase history on its site. Think about Priceline and your search pattern for a trip. Priceline makes recommendations based on your viewing history. Your online search behavior is being watched and tracked and is extremely valuable to retailers. All of this information is stored in a web log and could also include a combination of images and video logs. These multi-structured data types require a different approach to storage, cleansing, and analysis.
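As a rough illustration of what cleansing a web log can involve, the Python sketch below parses lines in the widely used Common Log Format and counts the most requested pages. It is a minimal example under stated assumptions: the sample lines, addresses, and paths are invented, and real sites often log in other formats that would need a different pattern.

```python
import re
from collections import Counter

# Pattern for the Common Log Format:
# host ident user [timestamp] "METHOD path PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Invented sample lines, including one malformed entry.
SAMPLE_LINES = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /item/42 HTTP/1.1" 200 2326',
    '203.0.113.7 - - [10/Oct/2024:13:56:01 +0000] "GET /item/42 HTTP/1.1" 200 2326',
    '198.51.100.3 - - [10/Oct/2024:13:57:12 +0000] "GET /search?q=tent HTTP/1.1" 200 512',
    '198.51.100.3 - - [10/Oct/2024:13:58:40 +0000] "GET /missing HTTP/1.1" 404 -',
    'this line is malformed and will be skipped',
]


def parse_line(line):
    """Turn one raw log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def top_paths(lines, n=3):
    """Count successful (status 200) requests per path and return the top n."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record["status"] == 200:
            counts[record["path"]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    for path, hits in top_paths(SAMPLE_LINES):
        print(path, hits)
```

Even this small example shows why the approach differs from a relational load: the structure has to be imposed at read time, and malformed lines must be tolerated rather than rejected by a schema.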

Unstructured Data

While commerce is a great example of multi-structured data and its inherent challenges, unstructured data fits even less into the traditional BI data warehouse model.

A prime example is the data resulting from our interactions on social media, like Twitter and Facebook. Comments, likes, and trending hashtags are all different forms of unstructured data that are growing every day. Add machine and sensor data, log files created by servers, and other data points captured by the Internet of Things (IoT), and the scope of unstructured data available to analyze is mind-boggling. These types of data are not stored in traditional databases. In fact, they are different file types altogether.

Data stored on the web, weather data, research data, and consumer data created by market research firms like Nielsen and IRI are all examples of unstructured data. Combining these data sets can be a very powerful way to perform predictive analytics.

The variety and volume of data that the C-suite is challenged to manage call for a different approach to storing, cleansing, and processing the data. The end goal of performing real-time analytics for data-driven decisions demands a new way of thinking. Big data is the modern approach to storing petabytes, exabytes, and, very soon, zettabytes of data.

If your unstructured data is growing exponentially, you need big data platforms to support your organization’s analytics needs.

[1] Gartner
