The Internet produces roughly 2.5 exabytes of data every day, and data management systems were projected to accumulate 44 zettabytes of data by the end of 2020. To put things in perspective, 1 zettabyte is equivalent to 1 trillion gigabytes. That is a lot of data, and most of it contains relevant information. Automated workflows, social media, government agencies, and IoT devices have contributed significantly to this accumulation. As big data continues to grow, so do its importance and relevance. Structured or unstructured, this data is paramount to existing operations; more so, it has been a vital source of descriptive and predictive analysis. Therefore, organizations that generate and govern these data streams are interested in sharing them with value-adding consumers.
Streaming APIs are a great way to share this data with both external and internal consumers. Moreover, leveraging cloud-based managed services or cloud IaaS helps with scaling and global outreach. However, there are a few key challenges that organizations often face when sharing data streams. To name a few:
- Security (inflight and at rest)
- Usage
- Latency
- Global outreach
- Transfer protocols (focused on real-time processing)
- Availability, and
- Cost
While API gateways can address many of the above challenges, building APIs that can secure, monitor, and stream data simultaneously is difficult. Listed below are a few streaming API designs that one can evaluate for such use cases.
Data Streaming Networks (DSN) – DSNs are fully managed streaming services that stream real-time data globally. They have built-in networking intelligence and, to some extent, cloud IaaS redundancy to securely transfer streaming data to a global audience. PubNub and Pusher are two large DSNs that stream trillions of messages a year with minimal latency. Moreover, they support serverless edge computing, which filters and processes data right before it is consumed. Costs to use a DSN can run high for large streaming volumes and can pose an economic challenge for smaller organizations with large streaming requirements.
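As an illustration, here is a minimal sketch of publishing an event to a DSN channel using the PubNub Python SDK. The keys, channel name, and payload are placeholders for illustration only, not values from any specific deployment.

```python
# Minimal sketch: publish one event to a DSN channel (PubNub Python SDK).
# The keys, channel name, and payload below are placeholders.
from pubnub.pnconfiguration import PNConfiguration
from pubnub.pubnub import PubNub

pnconfig = PNConfiguration()
pnconfig.publish_key = "pub-c-your-publish-key"      # placeholder key
pnconfig.subscribe_key = "sub-c-your-subscribe-key"  # placeholder key
pnconfig.uuid = "streaming-api-demo"
pubnub = PubNub(pnconfig)

# Publish a single reading; subscribers to the channel receive it globally.
envelope = pubnub.publish().channel("sensor-readings").message({"temp": 21.4}).sync()
print(envelope.status.is_error())
```

Edge functions on the DSN side can then filter or enrich messages on the channel before consumers receive them.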
API Gateway with WebSocket server – API management tools like CA API Gateway support WebSocket connections. The gateway proxies a REST API call to a backend WebSocket server, and the WebSocket server can be configured to connect to a streaming source. The API gateway secures the call, monitors the data, and handles the WebSocket protocol. It can be deployed both on premises and in the cloud; the solution team is responsible for building the infrastructure and deploying the gateway. For global outreach with minimal latency, deploy the gateway in the cloud.
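For reference, below is a minimal sketch of such a backend WebSocket server, written with the Python websockets library. The event source here is a stand-in generator; how the server connects to the real streaming source depends on the deployment.

```python
# Minimal sketch of a backend WebSocket relay sitting behind an API gateway.
# Uses the third-party "websockets" library; the event source is simulated.
import asyncio
import json
import websockets

async def stream_source():
    """Placeholder generator standing in for the real streaming source."""
    seq = 0
    while True:
        await asyncio.sleep(1)
        seq += 1
        yield {"seq": seq, "payload": "sample event"}

async def handler(websocket, path=None):
    # Push each event from the (hypothetical) source to the connected client.
    async for event in stream_source():
        await websocket.send(json.dumps(event))

async def main():
    # The API gateway proxies client calls to this host and port.
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```

In this pattern the gateway terminates authentication and monitoring, while the relay only forwards events, which keeps the streaming path simple.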
Cloud-based streaming services – Cloud-based streaming services like AWS Kinesis, Kinesis Data Firehose, and Azure Stream Analytics are another way to stream data globally. These are fully managed services with service-specific configurations to retrieve data from the streaming source. While these services are region specific, they can be configured to distribute streaming data globally, which reduces latency for region-specific consumers. Moreover, these services can push data straight to a managed big data service like Amazon Redshift or Azure SQL Data Warehouse to perform runtime analytics.
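As a sketch, pushing a record into a Kinesis data stream with boto3 looks roughly like this; the stream name, region, and payload are assumptions for illustration.

```python
# Minimal sketch: write one record to an AWS Kinesis data stream with boto3.
# Stream name, region, and payload are illustrative placeholders.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

kinesis.put_record(
    StreamName="clickstream-events",  # placeholder stream name
    Data=json.dumps({"user": "u-123", "action": "page_view"}).encode("utf-8"),
    PartitionKey="u-123",             # controls shard assignment
)
```

Downstream, Kinesis Data Firehose or Stream Analytics jobs can fan this data out to consumers or load it into the warehouse for runtime analytics.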
Custom broker solutions – A cheaper but more controlled way of building a streaming API solution is to use open source broker solutions. These are time-tested solutions with strong open source community support. Stable releases can be deployed in the cloud and configured behind an API gateway (preferably cloud-hosted) for security and monitoring purposes. Mosquitto and RabbitMQ are two such broker solutions. While Mosquitto supports only MQTT (Message Queuing Telemetry Transport), RabbitMQ supports MQTT, WebSockets, and AMQP (Advanced Message Queuing Protocol) 1.0 and 0.x. The brokers can be deployed in a scalable environment, with producer libraries acting as a bridge between them and the streaming source; a sketch of such a producer bridge is shown below. They can be put behind an API gateway to address networking and access challenges. A simpler alternative is to use the broker's API endpoints directly, relying on its built-in security and monitoring features.
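For example, a producer bridge that reads from a source and publishes into RabbitMQ could be sketched with the pika client roughly as follows; the host, queue name, and event payload are illustrative assumptions.

```python
# Minimal sketch: a producer "bridge" publishing events into RabbitMQ via pika.
# Host and queue name are placeholders; events here are hard-coded samples.
import json
import pika

connection = pika.BlockingConnection(
    pika.ConnectionParameters(host="rabbitmq.example.com")  # placeholder host
)
channel = connection.channel()
channel.queue_declare(queue="stream.events", durable=True)

# In a real bridge this loop would read from the actual streaming source.
for event in [{"seq": 1, "payload": "sample"}, {"seq": 2, "payload": "sample"}]:
    channel.basic_publish(
        exchange="",                    # default exchange routes by queue name
        routing_key="stream.events",
        body=json.dumps(event),
        properties=pika.BasicProperties(delivery_mode=2),  # persist messages
    )

connection.close()
```

Consumers can then subscribe over AMQP, MQTT, or WebSockets depending on the broker and plugins in use, with the API gateway handling access control in front.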
IoT will continue to generate more data, and AI and analytics engines will continue to consume it in ways that are yet to be uncovered. Between generation and consumption lies an opportunity to control, standardize, and enrich this data. If done right, managing the data flow can generate value for both the data producer and the data consumer. Above is a brief overview of some streaming API design patterns and services that you can evaluate for governing your data streams. For a more detailed explanation and help with building streaming API solutions, please reach out to one of our sales representatives at sales@perficient.com.