Apache Parquet edit
Data serialisation framework that supports a columnar storage format to enable efficient querying of data. Built using Apache Thrift, and supports complex nested data structures, using techniques from the Google Dremel paper. Consists of three sub-projects, the specification and Thrift definitions (Parquet Format), the Java and Hadoop libraries (Parquet MR) and the C++ implementation (Parquet CPP). Created as a joint effort between Twitter and Cloudera based on work started as part of Avro Trevni, with an initial v1.0 release in July 2013. Donated to the Apache Foundation in May 2014, graduating in April 2015. Has seen significant adoption in the Hadoop ecosystem. Technology Information
Other Names Parquet Vendors The Apache Software Foundation Type Commercial Open Source Last Updated December 2018 - v1.11 (Parquet MR), v2.3 (Parquet Format), v1.5 (Parquet C++) Related Technologies
Is packaged by Cloudera CDH Release History
version release date release links release comment 1.0 (Parquet C++) 2017-03-17 announcement 1.1 (Parquet C++) 2017-05-22 1.2 (Parquet C++) 2017-08-04 announcement 1.3 (Parquet C++) 2017-09-25 announcement 1.4 (Parquet C++) 2018-03-07 announcement 1.10 (Parquet MR) 2018-03-31 changelog 1.5 (Parquet C++) 2018-09-22 announcement 1.11 (Parquet MR) 2018-11-22 changelog Links
News
Blog Posts