Structured Streaming

An extension to the Spark SQL DataFrame API that allows Spark SQL queries to be executed over streams of data, with the engine continuously updating and maintaining the result as new data arrives. It uses the full Spark SQL engine (including the Catalyst optimiser) and supports end-to-end exactly-once semantics via checkpointing when sources have sequential, replayable offsets and sinks are idempotent. It supports aggregations over sliding event-time windows, with watermarking to bound how long late-arriving data is still accepted. Introduced in Spark 2.0, with a production release in Spark 2.2.
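The sliding event-time windows and watermarking mentioned above can be illustrated with a small, standard-library-only Python model. This is a conceptual sketch, not Spark's API or implementation: the names `window_starts` and `SlidingWindowCounter` are hypothetical, timestamps are plain integers, and lateness is checked per event against the watermark held at the start of each batch, which is a simplification of what the engine does.

```python
from collections import defaultdict


def window_starts(ts, size, slide):
    """Start times of every sliding window [start, start + size) that contains ts."""
    start = ts - (ts % slide)      # latest window start at or before ts
    starts = []
    while start > ts - size:       # window still covers ts while start + size > ts
        starts.append(start)
        start -= slide
    return sorted(starts)


class SlidingWindowCounter:
    """Hypothetical model: counts events per sliding window, dropping late events."""

    def __init__(self, size, slide, delay):
        self.size = size           # window length
        self.slide = slide         # distance between consecutive window starts
        self.delay = delay         # how far behind the newest event the watermark trails
        self.max_event_time = None
        self.counts = defaultdict(int)   # window start -> number of events
        self.dropped = 0

    @property
    def watermark(self):
        """Newest event time seen so far minus the allowed delay (None before any data)."""
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.delay

    def add_batch(self, event_times):
        """Process one micro-batch of event timestamps."""
        wm = self.watermark        # assumption: watermark is fixed for the whole batch
        for ts in event_times:
            if wm is not None and ts < wm:
                self.dropped += 1  # too late: older than the watermark
                continue
            for start in window_starts(ts, self.size, self.slide):
                self.counts[start] += 1
        # Advance the newest event time only after the batch has been processed.
        for ts in event_times:
            if self.max_event_time is None or ts > self.max_event_time:
                self.max_event_time = ts


counter = SlidingWindowCounter(size=10, slide=5, delay=10)
counter.add_batch([12, 14])   # both events fall in windows starting at 5 and 10
counter.add_batch([30, 3])    # 3 is older than the watermark (14 - 10 = 4), so it is dropped
```

Each event contributes to `size / slide` overlapping windows, and the watermark only ever moves forward as newer event times arrive, so an event that arrives out of order is still counted as long as it is not older than the watermark.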

Technology Information

Type	Sub-Project
Parent Project	Apache Spark
Last Updated	August 2017

Blog Posts

The Week That Was - 25/08/2017	2017-08-25	Kylin, Beam, REX-Ray, Zenko, Structured Streaming	Peter
