Will EDW Vanish Because of Hadoop?

When people talk about IT trends these days, big data dominates the conversation as we enter a new technology era. I know that big data has even been promoted to a national strategy in many countries. A few days ago I came across an interesting debate over whether the traditional enterprise data warehouse (EDW) will be replaced by big data technology, especially Hadoop.

The person arguing for replacement made his case under the title “The EDW Is A Relic”. His main argument is that the traditional EDW falls short of expectations in flexibility and scalability. He describes one of Hadoop's outstanding advantages this way: “The brilliance of what Hadoop does differently is that it doesn’t ask for any of these decisions up front. You can land raw data, in any format and at any size, in Hadoop with virtually no friction.” He also makes a very optimistic prediction: “There is no longer a need for a traditional data warehouse. It is an inflexible, expensive relic of a bygone age. It is time to leave the dark ages”. His conclusion is that the EDW will disappear, just as old life dies and new life is born.

The other participant, who comes from a well-known traditional EDW vendor, believes that Hadoop cannot take over the role the traditional EDW has played for decades. He thinks that with today's designs and best practices people can avoid the rigidity problem: “Rigid structures are not an inherent problem in today’s best data warehouse architectures that are designed for analytics.” He goes on: “Can Hadoop provide and implement the same functionality in a couple of years? The answer is obviously, no, and it would be a real shame to waste the community’s efforts to rebuild existing functionality vs. inventing newer and more extraordinary use cases.”

Although the argument is something of a sword fight, each side actually concedes part of the other side's evidence. Hadoop is a new technology, widely recognized as a valuable solution that can bring tremendous value to the enterprise. However, it is not yet mature and still takes great effort to implement as expected. Personally I side with the second author, who holds that the EDW won't vanish for decades. My reasons are:

In the current IT marketplace, many enterprises have invested heavily in a traditional EDW and are still increasing that investment. In terms of ROI, they don't expect a large return within a few years; it may be five years or longer before the data and analysis create value for the company. Moving the EDW to a new architecture based on emerging technology means risk, more investment and uncertainty.

The EDW itself is also continually being developed and evolved. The data warehouse was born and became well known through Bill Inmon's style decades ago. Since then it has grown additional styles, such as the Kimball style and the data vault model, which I will introduce in future posts. All of this indicates that the EDW won't die but will be practiced and adopted more widely in system engineering.

I dislike saying that EDW defeats Hadoop or vice versa. I prefer combining the two models, which will create new power in data integration and analysis. Hadoop by nature has advantages in storage, distributed computing and flexibility for remodeling, but it can't fully handle a complex business model on its own, so we still need a data mart or EDW to hold the dimension, fact and bridge tables.

The following architecture comes from Chris Harris of Hortonworks. The Hadoop core components (here, Hortonworks HDP) can act as a key part of the system, extracting and loading data from structured and unstructured sources.

[Figure: architecture diagram from Chris Harris, Hortonworks (image not available)]

The data mart is more meaningful to the business and serves dashboarding and business analytics, so it needs to pull datasets from Hadoop storage (HDFS) and transform them dynamically. With appropriate configuration and node clustering, the Hadoop MapReduce engine can transform and aggregate raw structured data, and convert unstructured sources such as pictures, web pages and social media into a form that end applications can handle.
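To make the transform-and-aggregate step a little more concrete, here is a minimal in-memory sketch of the map, shuffle and reduce pattern in plain Python. It does not use any Hadoop API; the record layout, the sample data and the function names (`map_record`, `shuffle`, `reduce_group`, `run_job`) are all assumptions made up for illustration. It only shows how raw lines landed in storage could be rolled up into rows shaped like a data mart fact table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str, str]    # (day, city, product): hypothetical dimension keys
Value = Tuple[int, float]     # (quantity, amount): hypothetical measures

# Hypothetical raw sales lines as they might land in HDFS, including one bad line.
RAW_LINES = [
    "2013-05-01,hangzhou,widget,3,9.50",
    "2013-05-01,hangzhou,widget,1,9.50",
    "2013-05-01,shanghai,gadget,2,20.00",
    "bad line",
    "2013-05-02,hangzhou,gadget,1,20.00",
]


def map_record(line: str) -> Iterator[Tuple[Key, Value]]:
    """Map step: parse one raw line and emit a (key, value) pair, skipping bad input."""
    parts = line.split(",")
    if len(parts) != 5:
        return
    day, city, product, qty, price = parts
    try:
        quantity = int(qty)
        unit_price = float(price)
    except ValueError:
        return
    yield (day, city, product), (quantity, quantity * unit_price)


def shuffle(pairs: Iterable[Tuple[Key, Value]]) -> Dict[Key, List[Value]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[Key, List[Value]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key: Key, values: List[Value]) -> dict:
    """Reduce step: aggregate one group into a single fact-style row."""
    return {
        "day": key[0],
        "city": key[1],
        "product": key[2],
        "quantity": sum(q for q, _ in values),
        "amount": round(sum(a for _, a in values), 2),
    }


def run_job(lines: Iterable[str]) -> List[dict]:
    """Run map, shuffle and reduce in sequence and return rows sorted by key."""
    mapped = (pair for line in lines for pair in map_record(line))
    grouped = shuffle(mapped)
    return [reduce_group(key, grouped[key]) for key in sorted(grouped)]


FACT_ROWS = run_job(RAW_LINES)
```

In a real cluster the map and reduce steps would run in parallel across nodes and the shuffle would be handled by the framework; the sketch keeps everything in one process so the data flow is easy to follow.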

Hadoop and related technologies are growing fast, but they are still somewhere between infancy and youth, and the EDW is not yet old. The two can be integrated to bring value and reduce cost for customers.


Kent Jiang

I currently work at the Perficient China GDC in Hangzhou as a Lead Technical Consultant. I have 8 years of experience in the IT industry across Java, CRM and BI technologies. My technical interests include business analytics, project planning, MDM, quality assurance, etc.
