Structured Streaming edit  

Extension to the Spark SQL DataFrame API to allow Spark SQL queries to be executed over streams of data, with the engine continuously updating and maintaining the result as new data arrives. Uses the full Spark SQL engine (including the Catalyst optimiser), and supports end-to-end exactly-once semantics via checkpointing when sources have sequential offsets. Supports aggregations over sliding event-time windows, including support for late data and watermarking. Introduced in Spark 2.0 with a production release in Spark 2.2.

Technology Information

TypeSub-Project
Parent ProjectApache Spark
Last UpdatedAugust 2017

Blog Posts