This is part 3 of the Big Data trends series I have been covering over the past couple of weeks. I have already covered the changes to traditional ETL solutions that are increasingly being adopted in enterprises (stream processing). This part covers yet another trend in Big Data: open source statistical programming and crowdsourced data science.
There are two approaches that businesses usually take to gain insights into their business. The first is the most prevalent: start with a business case, then dig through the data and organize it for better decision making. The second, more often ignored, is to dig into the mountains of information in a company and build a useful business case from what you find, a.k.a. data mining.
Enterprises face huge battles because they have to sort, process and analyse high volumes of data, and the current breed of tools doesn't address all of their needs. This is where R does its magic.
R ( Open Source Statistical Tool )
Beyond statistical programming, R also offers data mining and predictive analysis features. It is an open source statistical programming tool that has already been integrated with many BI tools, and it is becoming the new standard for statisticians and data mining experts.
Technologies like Hadoop, Drill and stream processing are the back-end technologies that help us store large, web-scale data and process it effectively. But the more creative process kicks in when you dive into the data and find useful information and insights for the business. The real value in having volumes of data is using it to understand the present and to predict the future. This is where tools like R and Weka come into play.
R integrates well with Hadoop and is the tool you need to unlock the hidden value underneath Big Data. On top of these strengths, there is a large body of publicly available analysis, some of which may be very close to what your organization needs.
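To make "using data to predict the future" a little more concrete, here is a minimal sketch of the simplest kind of predictive analysis: fitting a straight-line trend to past values and extrapolating one step ahead. It is written in plain Python purely for illustration (in R, the built-in `lm` function covers the same ground). The function names `fit_line` and `predict` and the monthly sales figures are assumptions made up for this example, not part of any tool mentioned in this post.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical monthly sales figures, invented for this sketch.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

model = fit_line(months, sales)
forecast = predict(model, 7)  # extrapolate to month 7
print(forecast)
```

Real business data is rarely this tidy, which is exactly why dedicated statistical tools with richer models and diagnostics are worth the effort.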
Revolution Analytics ( Tool to use R more effectively )
The software from Revolution Analytics is a layer built on top of R. It helps with
- Big Data analysis
- High-performance computing
- Analytics in production
Kaggle ( Crowd Sourced Data Science )
A company does not always have the resources to perform extensive data mining. If it has a problem at hand and wants a third eye to look at the data, Kaggle steps in with data mining and predictive analysis solutions. Kaggle is a platform that hosts open data science competitions. Instead of a single third eye, however, teams from all over the world attack the problem and try to get the most out of your data.
For example,
- The Netflix Prize, an open competition that predates Kaggle, helped Netflix build a better recommendation engine.
- Kagglers are helping NASA identify new Earth-like planets and black holes based on the huge volumes of pictures NASA has published online.
Solutions such as this should not be overlooked because
- You get access to a wealth of expertise for a small amount of prize money.
- You get many competing solutions to your problem and can pick the best-performing one.
- And finally, the benefits often outweigh the trouble of opening up your company data.
R, Weka and Kaggle are some of the Big Data trends that help businesses analyse and gain insights from the information they have collected and managed using Big Data technologies.
My next and last post on Big Data trends will cover in-memory analytics and its impact on BI and other consumer technologies.