YARN... Yes, Hadoop may be changing everything, but when YARN was released, the accelerator was pushed to the floor. Putting the technical details aside, the bottom line is that multiple concurrent workloads can now be executed and managed on Hadoop clusters. This “pluggable” service layer separates data processing from cluster resource management. The result is that we no longer depend on MapReduce to access and process HDFS data.
Most companies with products that access HDFS data do so without MapReduce. Oracle, SAS, IBM and many niche providers run their own software components on the data nodes. This will change how we build clusters: more memory and more CPU will be needed to support the additional processing. It is too early to tell whether we should beef up our existing nodes or add more nodes. Short of running your own POC and tests, keep an eye on the “all-in-one” appliance vendors as they bring out their new appliances this year. How they move will be a good indicator.
Does any vendor have a “silver bullet”? Until these solutions get into production and mature, there will be challenges. However, they will still create exceptional value, even with the associated headaches. Do not shy away. Do your due diligence and choose tools that leverage your current capabilities. Big Data is here to stay, and you need to move forward or be left behind. The accelerator has been pushed. Are you stuck in neutral, or are you in the race to develop a competitive advantage from Big Data?
If you want to learn how to quickly gain value from your Big Data, contact Perficient!