In the Big Data world, the Lambda architecture, created by Nathan Marz, is a standard technique applied to many predictive analytics problems. It combines batch data with streaming data so that past information is merged with current changes, producing a comprehensive data platform for a predictive framework.
Lambda Architecture
At a very high level, the architecture has three components.
- Batch layer, which holds all the processed batch data from the past.
- Speed layer, a real-time feed of the same or similar information.
- Serving layer, which holds the batch views relevant to the queries needed by the predictive analytics.
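The three layers can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the page-view events, the function names, and the choice to merge the two views at query time are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical data: (timestamp, page) events. The historical events have
# already been through a batch run; the recent ones have not.
master_dataset = ((1, "home"), (2, "pricing"), (3, "home"))
recent_events = [(4, "home"), (5, "docs")]


def build_batch_view(events):
    """Batch layer: compute page-view counts over all historical events."""
    return Counter(page for _, page in events)


def build_realtime_view(events):
    """Speed layer: count only events not yet covered by a batch run."""
    return Counter(page for _, page in events)


def query(page, batch_view, realtime_view):
    """Serving side: answer a query by combining both views."""
    return batch_view[page] + realtime_view[page]


batch_view = build_batch_view(master_dataset)
realtime_view = build_realtime_view(recent_events)
home_views = query("home", batch_view, realtime_view)
```

Here `home_views` combines two historical views with one recent view. Once the next batch run absorbs the recent events, the real-time view for that period can be discarded.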
Lambda architecture addresses the problem that the intended output can change whenever the processing code changes. Because the original input data is kept intact and read-only, improved code can simply be rerun over it to produce better results. The claim that Lambda architecture is an exception to the CAP theorem, however, is debatable.
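The value of read-only input can be shown with a small sketch. The raw readings and both parser versions below are hypothetical; the point is only that the improved code is rerun over the same untouched input to rebuild the view.

```python
# Hypothetical raw input, stored as an immutable tuple of strings.
RAW_READINGS = ("12.5", "n/a", " 7.5 ", "3")


def parse_v1(raw):
    """Original logic: only accepts plain non-negative integers."""
    return float(raw) if raw.isdigit() else None


def parse_v2(raw):
    """Enhanced logic: also accepts decimals and surrounding whitespace."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def recompute_view(raw_records, parse):
    """Rebuild the view from scratch, skipping unparseable records."""
    values = (parse(record) for record in raw_records)
    return sum(value for value in values if value is not None)


view_v1 = recompute_view(RAW_READINGS, parse_v1)
view_v2 = recompute_view(RAW_READINGS, parse_v2)
```

The first view only sees the reading `"3"`, while the second also picks up the two decimal readings. Nothing in `RAW_READINGS` was modified, so a future `parse_v3` could be applied the same way.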
In practice, programming for batch and for streaming typically requires two different codebases. This is a problem because business logic and other enhancements have to be implemented in two places. Creating a single API for both batch and real-time data is one way to hide the complexity from higher-level code, but the fact remains that there are two separate processing branches at the lower level.
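One way to picture the single-API idea is to keep the business rule in one function and call it from both branches. This is a sketch under simplifying assumptions: the `enrich` rule, the `amount` field, and the threshold are invented for illustration, and in a real deployment the two branches would usually run on different engines, so sharing code is rarely this simple.

```python
from typing import Dict, Iterable, Iterator, List


def enrich(event: Dict) -> Dict:
    """Shared business logic: the only place the rule is defined."""
    return {**event, "is_large": event["amount"] >= 100}


def run_batch(events: List[Dict]) -> List[Dict]:
    """Batch branch: process a complete historical dataset at once."""
    return [enrich(event) for event in events]


def run_stream(events: Iterable[Dict]) -> Iterator[Dict]:
    """Streaming branch: process events one at a time as they arrive."""
    for event in events:
        yield enrich(event)


history = [{"amount": 50}, {"amount": 150}]
batch_out = run_batch(history)
stream_out = list(run_stream(iter(history)))
```

Both branches produce the same enriched records because they share `enrich`, yet `run_batch` and `run_stream` remain two separate code paths that must each be maintained.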
Extended Lambda Architecture
Assuming you can live with the limitations of the Lambda architecture, most predictive analytics needs past data along with the data captured within the enterprise. Including this key data enhances the overall quality and gives the predictive engine the most complete data available.
As the industry matures, these techniques will become more robust and will deliver the best available data faster than ever. Just as we now take star schemas and their variations as a given for data warehousing, the Lambda architecture and its variations will be prevalent in the near future.