Spark SQL

Spark library for processing structured data, using either SQL statements or a DataFrame API. Supports reading from and writing to file-based datasets (including JSON, Parquet, Avro, ORC and CSV) as well as external data sources (including Hive and JDBC), and can join data across sources in a single query. Includes Catalyst, a query optimiser that applies rule-based and cost-based optimisations to turn high-level operations into low-level Spark DAGs for execution. Also includes a Hive-compatible Thrift JDBC/ODBC server that works with Beeline and the Hive JDBC and ODBC drivers, and a REPL CLI for interactive queries. Introduced in Spark 1.0, with a production release in Spark 1.3 and substantially improved SQL functionality in Spark 2.0.
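
As a rough illustration of the two interfaces described above, the following PySpark sketch runs the same aggregation once as a SQL statement and once through the DataFrame API. The `orders` view, the column names, the sample rows and the helper function names are all hypothetical, chosen only for this example; it assumes Spark 2.0 or later with the `pyspark` package installed.

```python
# Minimal sketch: the "orders" view, its columns, and the sample rows are
# illustrative assumptions, not part of Spark itself.
SAMPLE_ROWS = [
    ("alice", "books", 12.0),
    ("bob", "books", 8.0),
    ("alice", "games", 30.0),
]
COLUMNS = ["customer", "category", "amount"]

SQL_QUERY = """
    SELECT category, SUM(amount) AS total
    FROM orders
    GROUP BY category
    ORDER BY category
"""


def totals_via_sql(spark):
    """Aggregate with a SQL statement run against a temporary view."""
    df = spark.createDataFrame(SAMPLE_ROWS, COLUMNS)
    df.createOrReplaceTempView("orders")
    return [(r["category"], r["total"]) for r in spark.sql(SQL_QUERY).collect()]


def totals_via_dataframe(spark):
    """Express the same aggregation with the DataFrame API."""
    from pyspark.sql import functions as F

    df = spark.createDataFrame(SAMPLE_ROWS, COLUMNS)
    result = (
        df.groupBy("category")
        .agg(F.sum("amount").alias("total"))
        .orderBy("category")
    )
    return [(r["category"], r["total"]) for r in result.collect()]


def main():
    # Imported here so the module can be loaded without a Spark installation.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("spark-sql-sketch")
        .getOrCreate()
    )
    try:
        print(totals_via_sql(spark))
        print(totals_via_dataframe(spark))
    finally:
        spark.stop()
```

Calling `main()` starts a local single-threaded session and prints both results, which should contain the same per-category totals. The same session could also load the file formats listed above through `spark.read` (for example `spark.read.parquet(...)` or `spark.read.json(...)`) and register them as views in the same way.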

Technology Information

Type: Sub-Project
Parent Project: Apache Spark
Last Updated: August 2017

Blog Posts