The data warehouse has been part of the EIM vernacular for nearly 20 years. Its two founding objectives, a single source of truth and a single repository for reporting and analysis, have turned into a never-ending journey. The data warehouse has never held enough data, and the quality required for a single version of the truth demands an investment that only rare business cases can support. Further, the data warehouse has generally struggled in the role of analytical database: ad hoc analysis on large sets of complex data has been a significant challenge for the traditional design. Historically, companies have addressed this with appliances, analytical data marts, or a varying set of database features and compromises (bitmap indexing, a variety of hardware and software caching techniques, and index-organized storage, to name a few), all requiring significant investment and usually adding significant overhead.
However, there is hope: Big Data, that is, Hadoop on commodity-class hardware. This lower-cost, highly scalable environment can help solve a number of these challenges. First, it can store almost 100 times more data for a similar hardware investment. Next, analytical processing can be offloaded from the data warehouse. The massively scalable hardware truly enables ad hoc big data analytics, a role that traditional data warehouses have struggled to fill.
Lastly, if we move both the storage of atomic-level data and the analysis overhead out of the data warehouse, do we still need a data warehouse? We still need something to handle the complexities of reporting, structured analysis (e.g. OLAP), and traditional BI. Clearly, the data warehouse as we know it will change significantly. With its traditional roles of storing atomic data and serving up analytics transitioning to Hadoop environments, does the data warehouse become a set of optimized data and reporting marts? Time will tell, but one thing is clear: Hadoop does not augment data warehouses; data warehouses will now augment Hadoop.