Apache Parquet

Data serialisation framework that supports a columnar storage format to enable efficient querying of data. Built using Apache Thrift, with support for complex nested data structures based on techniques from the Google Dremel paper. Consists of three sub-projects: the specification and Thrift definitions (Parquet Format), the Java and Hadoop libraries (Parquet MR), and the C++ implementation (Parquet CPP). Created as a joint effort between Twitter and Cloudera, building on work started as part of Avro Trevni, with an initial v1.0 release in July 2013. Donated to the Apache Software Foundation in May 2014, graduating in April 2015. Has seen significant adoption in the Hadoop ecosystem.
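To illustrate why a columnar layout helps with querying, the following is a minimal sketch in plain Python. It is not Parquet's on-disk format and does not use any Parquet library: it ignores encodings, compression, and nested data. The function name `rows_to_columns` and the sample records are hypothetical, chosen only for illustration. It shows the basic idea that once the values of each field are stored together, a query over one field reads only that field's values.

```python
from typing import Any


def rows_to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list of values per field.

    Hypothetical helper for illustration only; fields missing from a
    record are stored as None so every column has the same length.
    """
    columns: dict[str, list[Any]] = {}
    for i, row in enumerate(rows):
        # Back-fill any field first seen in this row with None for earlier rows.
        for name in row:
            columns.setdefault(name, [None] * i)
        # Append this row's value (or None) to every known column.
        for name, values in columns.items():
            values.append(row.get(name))
    return columns


# Sample row-oriented records (made-up data).
rows = [
    {"user": "alice", "country": "GB", "clicks": 3},
    {"user": "bob", "country": "US", "clicks": 7},
    {"user": "carol", "country": "GB", "clicks": 5},
]

columns = rows_to_columns(rows)

# An aggregate over one field touches only that field's list,
# not the "user" or "country" values.
total_clicks = sum(columns["clicks"])
```

In a row-oriented layout the same aggregate would have to step through every full record. Storing similar values side by side is also what makes per-column encoding and compression practical in real columnar formats.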

Technology Information

Other Names: Parquet
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: December 2018 - v1.11 (Parquet MR), v2.3 (Parquet Format), v1.5 (Parquet C++)

Related Technologies

Is packaged by: Cloudera CDH

Release History

version             release date   release links   release comment
1.0 (Parquet C++)   2017-03-17     announcement
1.1 (Parquet C++)   2017-05-22
1.2 (Parquet C++)   2017-08-04     announcement
1.3 (Parquet C++)   2017-09-25     announcement
1.4 (Parquet C++)   2018-03-07     announcement
1.10 (Parquet MR)   2018-03-31     changelog
1.5 (Parquet C++)   2018-09-22     announcement
1.11 (Parquet MR)   2018-11-22     changelog

News

Blog Posts