Almost every year I attend the Hangzhou Spark meetup, where we get to meet Spark and Hadoop fans and experts from the local community. I love it because it is an open meetup with no commercial agenda. This year there was also a lot of knowledge and experience sharing on Flink.
We already know Spark core and Spark Streaming, so what is Flink and what does it do? It is another Apache project, one dedicated to stream processing. Apache Flink is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications. It has become more and more widely used in recent years.
The first talk, “Spark SQL: Past, Present and Future”, was delivered by a Spark committer and code reviewer from Databricks. Since 2009, Spark's core, SQL and mining pillars have evolved through several generations. So why Spark SQL? The intention is to make Spark simpler to use. With SQL syntax, more people can use Spark easily without diving into Spark core and its related techniques. Catalyst, an extensible optimizer, was introduced in detail, along with the way it currently optimizes queries. The Volcano iterator model and its integration into Spark were also explained.
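For readers who have not met the Volcano iterator model before, here is a minimal sketch of the general idea in plain Python. It is not Spark's code and not taken from the talk; the class names (`Scan`, `Filter`, `Project`) and the `run` helper are made up for illustration. Each operator exposes `open`, `next` and `close`, and pulls one row at a time from its child.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


class Operator:
    """Hypothetical base class: every operator exposes open/next/close."""

    def open(self) -> None:
        pass

    def next(self) -> Optional[Row]:
        raise NotImplementedError

    def close(self) -> None:
        pass


class Scan(Operator):
    """Leaf operator that reads rows from an in-memory list."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.pos = 0

    def open(self) -> None:
        self.pos = 0

    def next(self) -> Optional[Row]:
        if self.pos >= len(self.rows):
            return None  # None signals that the input is exhausted
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter(Operator):
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child: Operator, predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self) -> None:
        self.child.close()


class Project(Operator):
    """Keeps only the requested columns of each row from its child."""

    def __init__(self, child: Operator, columns: List[str]) -> None:
        self.child = child
        self.columns = columns

    def open(self) -> None:
        self.child.open()

    def next(self) -> Optional[Row]:
        row = self.child.next()
        if row is None:
            return None
        return {c: row[c] for c in self.columns}

    def close(self) -> None:
        self.child.close()


def run(plan: Operator) -> List[Row]:
    """Drives the plan from the root, pulling one row per next() call."""
    plan.open()
    out: List[Row] = []
    while True:
        row = plan.next()
        if row is None:
            break
        out.append(row)
    plan.close()
    return out


riders = [
    {"name": "ann", "age": 30},
    {"name": "bob", "age": 15},
    {"name": "cat", "age": 22},
]
# Roughly: SELECT name FROM riders WHERE age >= 18
plan = Project(Filter(Scan(riders), lambda r: r["age"] >= 18), ["name"])
result = run(plan)
```

The point of the sketch is only that operators compose like building blocks, with the root pulling rows through the tree one `next()` call at a time.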
The second speaker was from a popular Chinese ride-sharing company that provides transportation services. Their daily operations involve many real-time use cases and demands, so a cluster consisting of thousands of nodes has been built to provide streaming data and computing services. It employs HDFS, Spark Streaming, Flink streaming, druid.io and other tools for different purposes. As for resource management, some business cases require dedicated, exclusive resources, while others only need shared resources. Node labels and the NodeManager recovery mechanism are used to handle and separate these different requirements at the resource level.
Cloud technologies and platforms are evolving fast, so many companies have integrated Flink streaming into their cloud platforms and now offer it as an external service to public users.
Other talks covered Spark on HBase, as well as a custom streaming platform and a code generation framework built on Flink streaming. There are many streaming use cases in Hangzhou, such as e-commerce product promotion, ride-sharing, bike-sharing and food ordering, where Spark and Flink streaming have been used and customized.
It was a great meetup for technical folks, as every speaker and audience member was open to sharing their knowledge, good practices and lessons learned!
Spark Streaming: http://spark.apache.org/streaming/
Flink Streaming: https://flink.apache.org/introduction.html#flink-and-other-frameworks