This post is a continuation of my previous post on how Information Architecture has changed with recent advances in Big Data management.
This post discusses two common approaches to accommodating Big Data within an existing Information Management framework: Knowledge Stripping and Knowledge Pooling. Both approaches have pros and cons, and the choice between them depends on the company's information needs and the risk it is willing to take.
Knowledge Stripping
This is the more conservative approach, aimed at a quick return on the business investment. It is also the approach taken by companies that remain unconvinced of the value of Big Data until they see the returns. The idea is to identify the data sources relevant to the specific business problem being addressed and then load them into a discovery sandbox, where further analysis and manipulation can be performed, as the sketch below illustrates.
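To make the flow concrete, here is a minimal sketch of the stripping step, assuming a PySpark environment; the paths, the customer-churn use case, and the sandbox location are all hypothetical:

```python
# Minimal sketch of the Knowledge Stripping flow: pull only the data
# relevant to one business problem into a discovery sandbox.
# Paths, column names, and the churn use case are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("discovery-sandbox").getOrCreate()

# 1. Identify the sources tied to the specific business problem
#    (here: customer churn), not the whole data estate.
calls = spark.read.json("/raw/call_center_logs/")          # weakly typed
customers = spark.read.parquet("/warehouse/dim_customer/") # strongly typed

# 2. Load a joined, problem-specific slice into the sandbox,
#    where analysts can explore and manipulate it freely.
slice_df = calls.join(customers, on="customer_id", how="inner")
slice_df.write.mode("overwrite").parquet("/sandbox/churn_discovery/")
```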
Knowledge Pooling
This is a more radical approach to integrating Big Data into the existing IM framework. It is sometimes referred to as "Build it and they will come": you build the Hadoop cluster and populate it with all the data you have. Most business problems can then be addressed from the cluster, since the data needed is likely already there. The remaining tasks of analyzing the data, building a model of some type, and deploying the knowledge to inbound channels as appropriate are much the same as in the Knowledge Stripping method, though there are some differences in the subsequent deployment steps.
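Here is a minimal sketch of the initial pooling step, assuming the standard `hdfs dfs` CLI is available on the machine running it; the source directories and the lake layout are hypothetical:

```python
# Minimal sketch of the "build it and they will come" pooling step:
# land every available source in the Hadoop cluster up front.
# Source paths and the HDFS layout are hypothetical.
import subprocess

sources = {
    "/exports/crm_dump":        "/datalake/raw/crm",
    "/exports/web_clickstream": "/datalake/raw/clickstream",
    "/exports/sensor_feeds":    "/datalake/raw/sensors",
}

for local_path, hdfs_path in sources.items():
    # Create the target directory and copy the raw data as-is;
    # no problem-specific filtering happens at this stage.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_path], check=True)
    subprocess.run(["hdfs", "dfs", "-put", "-f", local_path, hdfs_path],
                   check=True)
```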
Though the data is deployed on a different technology (Hadoop), we should consider this pool of data to be part of the Foundation Layer of the Data Warehouse, since logically it complements the strongly typed data with weakly typed data.
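One way to picture this logical unification is a single query that spans both stores. The sketch below assumes PySpark with a JDBC-accessible warehouse; the connection URL, table names, and paths are hypothetical:

```python
# Sketch of treating the Hadoop pool as part of the Foundation Layer:
# strongly typed warehouse data and weakly typed pooled data are
# queried as one logical layer. Names and URLs are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("foundation-layer").getOrCreate()

# Strongly typed data from the relational warehouse (via JDBC)...
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://dwhost:5432/dw")
          .option("dbtable", "foundation.orders")
          .load())

# ...complemented by weakly typed data from the Hadoop pool.
reviews = spark.read.json("/datalake/raw/product_reviews/")

orders.createOrReplaceTempView("orders")
reviews.createOrReplaceTempView("reviews")

# One logical Foundation Layer query across both stores.
spark.sql("""
    SELECT o.product_id, COUNT(r.review_id) AS review_count
    FROM orders o LEFT JOIN reviews r ON o.product_id = r.product_id
    GROUP BY o.product_id
""").show()
```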
This is followed by adding any new data used in the analysis either to the relational store, if the data is strongly typed, or to the Hadoop cluster, if it is weakly typed. The data is then optimized for the production environment and fed into the Warehouse through standard ETL. Finally, the data becomes part of the Access and Performance Layer, where it can be used for reporting as usual.
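A simplified sketch of that routing decision, again assuming PySpark; the targets and the strongly/weakly typed rule are illustrative assumptions, not a definitive implementation:

```python
# Sketch of the deployment step: route new data discovered during
# analysis to the relational store or the Hadoop pool by its typing,
# then let standard ETL feed the warehouse. Targets are hypothetical.
from pyspark.sql import DataFrame

def deploy(df: DataFrame, strongly_typed: bool) -> None:
    if strongly_typed:
        # Strongly typed: persist to the relational store, from which
        # the standard ETL loads the warehouse's Foundation Layer.
        (df.write.format("jdbc")
           .option("url", "jdbc:postgresql://dwhost:5432/dw")
           .option("dbtable", "staging.new_attributes")
           .mode("append")
           .save())
    else:
        # Weakly typed: keep it in the Hadoop pool alongside the
        # rest of the raw data.
        df.write.mode("append").parquet("/datalake/raw/new_attributes/")
```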
Conclusions
The new reference architecture for Information Management is a valuable reference for future Warehouse implementations and Big Data needs: it captures the evolution of DW systems and provides a way to avoid failures in DW and Big Data implementations.