Data Lakes have been around since the early part of this decade, and most Fortune 500 companies now either have a Data Lake or are building one. The push to land data in a lake has been driven predominantly by analytical use cases, in which Data Scientists wrangle and prepare data for their studies or model building.
However, Perficient is seeing a significant shift from deploying Hadoop solely to support analytics use cases to using Data Lakes for operational data processing. Companies are now able to move processing off expensive legacy mainframes and MPP data warehouses and onto Hadoop and Cloud-based systems. Although operational data processing has always been possible on Hadoop, the momentum has accelerated significantly because of a number of advances in the past few years.
These advances include:
- SQL-based transformations in Spark, which have made Big Data ETL accessible to many firms that do not wish to invest in expensive ETL tools;
- Cloud-based Data Warehouses, which offer scale and ease of use similar to traditional EDW systems at a fraction of the cost; and
- Security advancements in Hadoop and Cloud-based Big Data offerings, which have reassured companies that their data assets are protected in the new data ecosystem.
The move toward performing more operational data processing on Data Lakes brings its own set of challenges that companies need to address. In my next blog post, I will examine the challenges companies face as Big Data becomes operational.