StreamSets Data Collector

General-purpose technology for moving data between systems, including the ingestion of batch and streaming data into an analytical platform. Pipelines are configured in a graphical user interface and consist of a single origin, one or more processor stages and then one or more destinations, with support for a wide range of source/destination technologies and processor transformations. Other features include support for a wide range of data formats, executors (tasks that can be triggered by events from pipelines, e.g. to send e-mails or run a shell script), handling of erroneous records, support for CDC CRUD records, previewing of data within the editor UI, real-time reporting and alerting on a range of execution and data quality metrics, the ability to dynamically handle changes to schemas and the semantic meaning of data, and a full Python SDK. Can run in standalone mode (as a single process, either single or multi-threaded), as a Spark Streaming or MapReduce job on a cluster, or in an ultralight agent (StreamSets Data Collector Edge). Java based and open source under the Apache 2.0 licence, hosted on GitHub, with development led by StreamSets, who also provide commercial support and a number of commercial add-ons, including Control Hub (cloud service for developing and managing pipelines), Dataflow Performance Manager (for managing data metrics) and Data Protector (for managing sensitive data). Started in October 2014, with a v1.0 release in September 2015.
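
The origin, processor and destination structure, together with the handling of erroneous records, can be illustrated with a toy model in plain Python. This is a conceptual sketch only: it does not use the StreamSets Python SDK or any Data Collector API, and every name in it (`Pipeline`, `read_orders`, `parse_amount`, `add_tax`, `warehouse`, `audit_log`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class Pipeline:
    """Toy model (not the StreamSets API): one origin, ordered processors,
    one or more destinations, and a side channel for failed records."""
    origin: Callable[[], Iterable[Record]]
    processors: List[Callable[[Record], Record]]
    destinations: List[Callable[[Record], None]]
    errors: List[Record] = field(default_factory=list)

    def run(self) -> None:
        for record in self.origin():
            try:
                # Apply each processor stage in order.
                for process in self.processors:
                    record = process(record)
            except Exception as exc:
                # Set the failing record aside instead of stopping the run.
                self.errors.append({"record": record, "error": str(exc)})
                continue
            # Fan the processed record out to every destination.
            for write in self.destinations:
                write(record)


def read_orders() -> Iterable[Record]:
    # Hypothetical origin producing three records, one of them malformed.
    yield {"id": 1, "amount": "10.50"}
    yield {"id": 2, "amount": "not-a-number"}
    yield {"id": 3, "amount": "7"}


def parse_amount(record: Record) -> Record:
    # Raises ValueError for the malformed record.
    return {**record, "amount": float(record["amount"])}


def add_tax(record: Record) -> Record:
    return {**record, "total": round(record["amount"] * 1.2, 2)}


warehouse: List[Record] = []
audit_log: List[Record] = []

pipeline = Pipeline(
    origin=read_orders,
    processors=[parse_amount, add_tax],
    destinations=[warehouse.append, audit_log.append],
)
pipeline.run()

print(len(warehouse), "records written,", len(pipeline.errors), "sent to error handling")
```

In this sketch the malformed record is diverted to `pipeline.errors` while the other two reach both destinations. That mirrors the idea of handling erroneous records described above, not the product's actual mechanism.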

Technology Information

Vendors: StreamSets
Type: Commercial Open Source
Last Updated: August 2019 - v3.10

Release History

| Version | Release date | Release links | Release comment |
|---------|--------------|---------------|-----------------|
| 3.0 | 2017-12-15 | See 3.0 notes on documentation and release page; blog post | |
| 3.1 | 2018-03-30 | See 3.1 notes on documentation and release page | |
| 3.2 | 2018-05-11 | See 3.2 notes on documentation and release page | |
| 3.3 | 2018-05-24 | See 3.3 notes on documentation and release page | |
| 3.4 | 2018-08-10 | See 3.4 notes on documentation and release page; blog post | |
| 3.5 | 2018-10-01 | See 3.5 notes on documentation and release page; blog post | |
| 3.6 | 2018-11-26 | See 3.6 notes on documentation and release page | |
| 3.7 | 2019-01-08 | See 3.7 notes on documentation and release page | |
| 3.8 | 2019-03-14 | See 3.8 notes on documentation and release page | |
| 3.9 | 2019-06-06 | See 3.9 notes on documentation and release page; blog post | |
| 3.10 | 2019-08-01 | See 3.10 notes on documentation and release page; blog post | |

News