Facebook has been one of the software pioneers when it comes to Big Data. It has a continuously growing amount of data that constantly needs to be indexed and queried quickly and efficiently. Facebook originally developed Hive, a distributed data warehouse for querying and managing large data sets. Facebook released Hive as open source in 2009, and development has continued under Apache. As technology has progressed, developers have wanted to query massive data stores and get results back faster. Hive handled distributed data management well, but it was never intended to be an on-demand data store that users would interact with directly. Its purpose was to manage large data stores and let them be queried in batch rather than interactively. As a result, it sacrificed speed: its queries ran as MapReduce jobs, which were slow.
Enter Presto. Presto is a distributed SQL query engine developed by Facebook that can scale to petabytes of data! Like Hive, Presto can query extremely large data sets, although it does not store the data itself. Instead, it reads data where it already lives, such as tables managed by Hive. Unlike Hive, Presto does not depend on MapReduce jobs, which write intermediate results to disk between stages and then read them back for the next stage. Presto processes queries in memory and passes data from one stage to the next without that round trip. Eliminating the disk I/O significantly improves performance, and as a result Presto can reportedly return query results up to ten times faster than Hive on MapReduce. To top it off, because Presto is a SQL engine, it can perform many of the same operations as a traditional database using standard SQL syntax.
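To make the disk-versus-memory difference concrete, here is a minimal toy sketch in plain Python. It does not use any Presto or Hive code. The data set, stage functions and runner names are all hypothetical. The same two-stage query (a filter followed by a per-page count) runs twice. The first run materializes the intermediate result to a temporary file, loosely mimicking a MapReduce stage boundary. The second run streams rows straight from one stage to the next in memory, loosely mimicking the pipelined style described above.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical toy data set: (page, country) view events.
EVENTS = [
    ("home", "US"), ("home", "DE"), ("photos", "US"),
    ("home", "US"), ("groups", "FR"), ("photos", "US"),
]


def filter_stage(rows: Iterable[tuple]) -> Iterator[tuple]:
    # Stage 1: keep only US views (roughly a WHERE country = 'US').
    return (row for row in rows if row[1] == "US")


def aggregate_stage(rows: Iterable[tuple]) -> dict:
    # Stage 2: count views per page (roughly GROUP BY page with COUNT(*)).
    return dict(Counter(page for page, _ in rows))


def run_with_disk_materialization(rows: Iterable[tuple]) -> dict:
    """MapReduce-like: stage 1 output is written to disk, then read back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage1.jsonl")
        with open(path, "w") as f:
            for row in filter_stage(rows):
                f.write(json.dumps(row) + "\n")
        with open(path) as f:
            restored = (tuple(json.loads(line)) for line in f)
            # The count consumes every row before the file is closed.
            return aggregate_stage(restored)


def run_pipelined_in_memory(rows: Iterable[tuple]) -> dict:
    """Pipelined (simplified): rows flow between stages without touching disk."""
    return aggregate_stage(filter_stage(rows))


print(run_with_disk_materialization(EVENTS))  # {'home': 2, 'photos': 2}
print(run_pipelined_in_memory(EVENTS))        # {'home': 2, 'photos': 2}
```

Both runners return the same answer. The only difference is the write-then-read round trip at the stage boundary, which is the I/O the paragraph above credits Presto with avoiding. At this toy size the gap is negligible. The point is the shape of the work, not a benchmark.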
If the reported benchmarks hold up, Presto represents a huge leap in on-demand querying of large data sets. Traditionally, large companies running MapReduce jobs have had to spend thousands of hours of compute time, distributed across thousands of nodes every 24 hours, to keep their sites responsive and their users engaged. An engine that returns results significantly faster brings SQL to the forefront of Big Data and could make it the go-to technology for Big Data in the future.