Streaming Data Stores
Our list of, and information on, commercial, open source and cloud-based streaming data stores, including Kafka, Confluent, MapR-ES and alternatives to these.
Category Definition
Technologies for the persistent storage of continuous streams of data, with data access based on a publish/subscribe model. Technologies in this category should support multiple independent publishers and subscribers, the ability to add new subscribers and replay the history of a stream, horizontal scalability and load balancing, durable writes, ordered streams (data is always read in the order it was written), high throughput and low latency characteristics, handling of updates and deletes to source records, and the ability to secure the data.
The technologies are listed below in three groups: open source technologies, commercial technologies, and technologies available as a managed service in the cloud. Note that Apache Kafka is bundled with a number of Hadoop distributions.
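Three of the properties in the category definition (multiple independent subscribers, replay of a stream's history, and reads in write order) can be illustrated with a minimal in-memory sketch. The `StreamStore` class and its `publish`, `poll` and `seek` methods are hypothetical names chosen for illustration; they are not the API of Kafka or of any other product listed on this page, and the sketch deliberately ignores durability, horizontal scaling and security.

```python
from collections import defaultdict


class StreamStore:
    """Toy in-memory stand-in for a streaming data store (hypothetical API)."""

    def __init__(self):
        self._log = []                    # append-only records, in write order
        self._offsets = defaultdict(int)  # next offset to read, per subscriber

    def publish(self, record):
        """Append a record to the stream and return its offset."""
        self._log.append(record)
        return len(self._log) - 1

    def poll(self, subscriber, max_records=10):
        """Return this subscriber's next unread records, in write order."""
        start = self._offsets[subscriber]
        batch = self._log[start:start + max_records]
        self._offsets[subscriber] = start + len(batch)
        return batch

    def seek(self, subscriber, offset=0):
        """Move a subscriber's position so it can replay history."""
        if not 0 <= offset <= len(self._log):
            raise ValueError("offset out of range")
        self._offsets[subscriber] = offset


store = StreamStore()
for event in ["a", "b", "c"]:
    store.publish(event)

print(store.poll("billing"))   # all three records, in write order
print(store.poll("audit", 2))  # a new subscriber starts from the beginning
store.seek("billing", 1)       # rewind to replay part of the history
print(store.poll("billing"))
```

Each subscriber keeps its own read position, so adding a new subscriber or rewinding an existing one does not affect the others. Real streaming data stores persist both the log and the positions; this sketch keeps them in memory only.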
Open Source Technologies
Apache Kafka: Technology for buffering and storing real-time streams of data between publishers and subscribers, with a focus on high throughput at low latency.
Confluent Open Source: A package of open source projects built around Apache Kafka, with the addition of the Confluent Schema Registry, Kafka REST Proxy, a number of connectors for Kafka Connect and a number of Kafka clients (language SDKs).
Pravega: Technology for the buffering and long term storage of streaming data, designed for low latency and high throughput, with support for exactly once semantics, durable writes, strict ordering, dynamic scaling, transactions and long term storage backed by HDFS.
Apache BookKeeper: Distributed log storage service from Yahoo - http://bookkeeper.apache.org/
Apache DistributedLog: Distributed log service from Twitter supporting durability, replication and strong consistency, built over Apache BookKeeper - http://bookkeeper.apache.org/distributedlog/
Apache Pulsar: Distributed pub-sub messaging from Yahoo, with persistent message storage based on Apache BookKeeper - http://pulsar.incubator.apache.org/
LogDevice: Open source distributed data store for sequential data from Facebook - https://logdevice.io/
Commercial Technologies
Confluent Enterprise: A commercial version of the Confluent Open Source product, with the addition of a number of commercial closed source products including a JMS client, Control Center (for managing Kafka clusters), Multi DC Replication (active-active replication between Kafka clusters) and Auto Data Balancing.
MapR-ES: Part of the MapR Converged Data Platform; supports streaming data storage capabilities and a Kafka compatible API.
AMQ Streams: Kafka distribution from Red Hat that runs on OpenShift - https://access.redhat.com/products/red-hat-amq-streams
Technologies Available as a Service
Confluent Cloud: Confluent Enterprise as a service - https://www.confluent.io/confluent-cloud/
Amazon Kinesis Streams: Streaming data storage and publish service - https://aws.amazon.com/kinesis/streams/
Amazon Managed Streaming for Kafka (MSK) (public preview): Fully managed, highly available, and secure Apache Kafka service - https://aws.amazon.com/msk/
Azure Event Hubs: Elastic service for the buffering and publishing of streaming event data with a Kafka compatible end point - https://azure.microsoft.com/en-us/services/event-hubs/
Google Cloud Pub/Sub: Real time message and streaming data service with "at least once" delivery - https://cloud.google.com/pubsub/
Blog Posts