Oracle’s last Data Warehouse reference architecture was released in 2010, and since then the industry has seen many changes in how data is handled. This is a two-part series covering the changes made to the data warehouse reference architecture to incorporate Big Data needs.
As Oracle puts it:
What has changed in the last few years is the emergence of “Big Data”, both as a means of managing the vast volumes of unstructured and semi-structured data stored but not exploited in many organizations, as well as the potential to tap into new sources of insight such as social-media web sites to gain a market edge.
This post covers the background on Information Management and looks at the new demands placed on DW and BI solutions to exploit new information sources (social media, sensors, logs, etc.) for better competitive advantage.
The earlier information management reference architecture dealt with data that could be readily analysed using standard BI tools. It was defined around the technical and commercial limitations of 2008-09. Some, if not most, of those limitations have since disappeared with advances in technologies such as Hadoop and NoSQL and with hardware improvements in Oracle Exadata. This gives more flexibility in designing IM solutions, with far fewer constraints imposed by hardware capability.
Increasing the Scope of the Information Management Reference Architecture
The reference architecture paper argues that many social media organizations are not focusing on accommodating their existing IM solution, and that Big Data is no different from other aspects of Information Management. The same common questions apply to Big Data as well:
How can the new data or analysis scope enhance your existing set of capabilities?
What additional opportunities for intervention or process optimization does it present?
Information Management Reference Architecture
The difference between the old and the new reference architecture is noticeable. The new version of the IM architecture adds Knowledge Discovery and improved analytics tools used by data scientists.
Old Reference Architecture
New Reference Architecture
Here is a brief description of the different layers in the architecture.
Extract from Oracle’s Information Management White Paper:
Staging Data Layer. Abstracts the rate at which data is received onto the platform from the rate at which it is prepared and then made available to the general community. It facilitates a ‘right-time’ flow of information through the system.
Foundation Data Layer. Abstracts the atomic data from the business process. For relational technologies the data is represented in close to third normal form and in a business process neutral fashion to make it resilient to change over time. For non-relational data this layer contains the original pool of invariant data.
Access and Performance Layer. Facilitates access and navigation of the data, allowing for the current business view to be represented in the data. For relational technologies data may be logical or physically structured in simple relational, longitudinal, dimensional or OLAP forms. For nonrelational data this layer contains one or more pools of data, optimised for a specific analytical task or the output from an analytical process. e.g., In Hadoop it may contain the data resulting from a series of Map-Reduce jobs which will be consumed by a further analysis process.
Knowledge Discovery Layer. Facilitates the addition of new reporting areas through agile development approaches and data exploration (strongly and weakly typed data) through advanced analysis and Data Science tools (e.g. Data Mining).
BI Abstraction & Query Federation. Abstracts the logical business definition from the location of the data, presenting the logical view of the data to the consumers of BI. This abstraction facilitates Rapid Application Development (RAD), migration to the target architecture and the provision of a single reporting layer from multiple federated sources.
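To make the layer descriptions above a little more concrete, here is a minimal toy sketch in Python of how data might move from staging to foundation to access, and how a logical name can hide the physical location of the data. It is an illustration only, not Oracle's implementation: every name in it (receive, load_foundation, refresh_access, LOGICAL_VIEWS, query, and the sample records) is hypothetical and does not come from the white paper.

```python
from collections import deque

# All names below are hypothetical and only illustrate the layer roles.
staging = deque()   # Staging Data Layer: buffers data as it arrives
foundation = []     # Foundation Data Layer: atomic, process-neutral records
access = {}         # Access and Performance Layer: business-facing summary


def receive(event):
    """Accept data at whatever rate the source produces it."""
    staging.append(event)


def load_foundation(batch_size):
    """Move up to batch_size events onward at the platform's own pace."""
    moved = 0
    while staging and moved < batch_size:
        foundation.append(staging.popleft())
        moved += 1
    return moved


def refresh_access():
    """Rebuild a summary (total amount per customer) from atomic records."""
    access.clear()
    for record in foundation:
        key = record["customer"]
        access[key] = access.get(key, 0) + record["amount"]


# BI abstraction: consumers ask for a logical name, not a physical store.
LOGICAL_VIEWS = {"sales_by_customer": lambda: dict(access)}


def query(logical_name):
    return LOGICAL_VIEWS[logical_name]()


# Three events arrive, but only two are prepared in the first batch.
for event in [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]:
    receive(event)

load_foundation(batch_size=2)
refresh_access()
print(query("sales_by_customer"))  # {'a': 10, 'b': 5}
```

The third event stays in staging until the next batch runs, which is the decoupling of arrival rate from preparation rate that the Staging Data Layer description refers to. Swapping the lambda in LOGICAL_VIEWS for a call to a different store would not change what the consumer asks for, which is the idea behind the abstraction layer.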
The next post in the series will discuss implementation methodologies for Big Data in the context of the Information Management framework.