Heron: Is it Apache Storm 2.0?

When it comes to real-time messaging and event analytics across distributed nodes, Storm and Kafka, both well-known Apache projects, come to mind first. Storm has run in production at Twitter for many years, so it is well proven. On June 4, Twitter announced that it had developed a new system called Heron, which handles billions of events each day and is fully API-compatible with Storm. Here, we preview Heron's new features and how it differs from Storm:

Storm Components

When we use Storm and write custom code to process messages, there are a few terms we need to be familiar with:

Topology. A topology is the top-level unit of a Storm application: a graph of computation whose nodes are spouts and bolts connected by streams. It groups related processing logic, much as a package groups code in an application.

Stream. A stream is an unbounded sequence of tuples. Typical examples are a stream of tweets or a stream of topics.

Spout. A spout is a source of streams. For example, a spout may read tuples off of a Kestrel queue and emit them as a stream.

Bolt. A bolt consumes any number of input streams, does some processing, and possibly emits new streams. Bolts can do anything: run functions, filter tuples, perform streaming aggregations and streaming joins, talk to databases, and more.

Storm users do most of their stream transformation work in spouts and bolts. Both have interfaces that you implement to run your application-specific logic.
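To make these terms concrete, here is a minimal sketch in plain Python. It does not use the Storm API: the names `SentenceSpout`, `SplitBolt`, `CountBolt`, and `run_topology` are hypothetical and only model how tuples flow from a spout through bolts. A real stream is unbounded and a real topology can be an arbitrary graph; this sketch assumes a short fixed input and a linear chain to stay small.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


class SentenceSpout:
    """Hypothetical spout: the source of a stream (here, a finite list of sentences)."""

    def __init__(self, sentences: List[str]) -> None:
        self.sentences = sentences

    def emit(self) -> Iterator[Tuple[str]]:
        for sentence in self.sentences:
            yield (sentence,)  # each tuple carries a single field


class SplitBolt:
    """Hypothetical bolt: consumes sentence tuples and emits one tuple per word."""

    def process(self, stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
        for (sentence,) in stream:
            for word in sentence.lower().split():
                yield (word,)


class CountBolt:
    """Hypothetical bolt: a streaming aggregation that emits a running count per word."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
        for (word,) in stream:
            self.counts[word] += 1
            yield (word, self.counts[word])


def run_topology(spout: SentenceSpout, bolts: list) -> list:
    """Wire the spout to the bolts in order and drain the resulting stream."""
    stream = spout.emit()
    for bolt in bolts:
        stream = bolt.process(stream)
    return list(stream)


spout = SentenceSpout(["storm and heron", "heron runs storm topologies"])
counter = CountBolt()
results = run_topology(spout, [SplitBolt(), counter])
print(results[-1])           # last tuple emitted by the counting bolt
print(dict(counter.counts))  # final word counts
```

Each bolt only sees the tuples handed to it and keeps its own state, which is what lets a real system run many copies of a spout or bolt in parallel.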

Heron Architecture and Features

Heron not only extends the Storm architecture but is also compatible with the Storm API. Its main goals are to increase performance predictability, improve developer productivity, and ease manageability. Here is the architecture diagram and description from Twitter’s blog:

“The overall architecture for Heron is shown here in the following figure. Users employ the Storm API to create and submit topologies to a scheduler. The scheduler runs each topology as a job consisting of several containers. One of the containers runs the topology master, responsible for managing the topology. The remaining containers each run a stream manager responsible for data routing, a metrics manager that collects and reports various metrics and a number of processes called Heron instances which run the user-defined spout/bolt code. These containers are allocated and scheduled by scheduler based on resource availability across the nodes in the cluster. The metadata for the topology, such as physical plan and execution details, are kept in Zookeeper.”

[Figure: Heron architecture diagram]
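The container layout in the quoted description can be sketched as a small data model. This is an illustration only: the `Container` class, the `plan_containers` function, the process labels, and the round-robin placement are all assumptions made for this example, not Heron's actual packing algorithm or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    """One container of a topology job (hypothetical model)."""
    container_id: int
    processes: List[str] = field(default_factory=list)


def plan_containers(instances: List[str], num_worker_containers: int) -> List[Container]:
    """Lay out a topology job as described in the quote.

    Container 0 runs the topology master. Every other container runs a
    stream manager, a metrics manager, and a share of the Heron instances.
    Round-robin placement is an assumption made for this sketch.
    """
    if num_worker_containers < 1:
        raise ValueError("need at least one worker container")

    master = Container(0, ["topology-master"])
    workers = [
        Container(i, ["stream-manager", "metrics-manager"])
        for i in range(1, num_worker_containers + 1)
    ]
    # Spread the user-defined spout/bolt instances across the worker containers.
    for index, instance in enumerate(instances):
        workers[index % num_worker_containers].processes.append(instance)
    return [master] + workers


plan = plan_containers(["spout-1", "bolt-1", "bolt-2"], num_worker_containers=2)
for container in plan:
    print(container.container_id, container.processes)
```

In the real system the scheduler decides where containers run based on resource availability, and the resulting physical plan is kept in Zookeeper; the sketch leaves both out.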

Heron provides several enhancements compared with Storm:

  • Easy to deploy on a scheduler, with support for Mesos, YARN, or a custom environment
  • Easy to debug
  • Compatible with Storm, but with faster performance

Heron is NOT Storm 2.0, because it makes significant changes to the core framework. It has also not yet been open-sourced. However, its design ideas and lessons learned have been shared with the Storm community and other real-time processing communities. We look forward to seeing more use cases and practices shared around Heron.

Kent Jiang

I currently work at the Perficient China GDC in Hangzhou as a Lead Technical Consultant. I have 8 years of experience in the IT industry across Java, CRM, and BI technologies. My technical interests include business analytics, project planning, MDM, and quality assurance.