Apache CarbonData

Apache CarbonData is a unified storage solution for Hadoop built on an indexed columnar data format, focused on serving disparate data access patterns efficiently from a single copy of the data. Data is loaded in batches, encoded, indexed using multiple strategies, compressed, and written to HDFS in a columnar file format.

The format provides a number of highly configurable indexes (a multi-dimensional key, min/max indexes, and inverted indexes), global dictionary encoding, and column grouping. Together these support interactive OLAP-style queries, high-throughput scan queries, low-latency point queries, and individual record lookups. Batch updates and deletes are supported through delta bitmap files and compaction. CarbonData is written in Java, with its file format specified using Apache Thrift, and supports all common primitive data types as well as complex nested types such as arrays and structures.

The project consists of several modules: the format specification and core implementation (columnar storage, indexing, compression, and encoding), a Hadoop input/output format interface, deep integration with Spark via Spark SQL and the DataFrame API, and connectors for Hive and Presto. Development started in 2013 at Huawei's India R&D center; the project was donated to the Apache Software Foundation in 2015, graduated to a top-level project in April 2017, shipped its first stable release (1.1.0) in May 2017, and remains under active development.
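To make the Spark integration concrete, below is a minimal sketch following the CarbonSession API documented for the 1.x line. The store path, table name, columns, and input file are invented for illustration, and load options (delimiter, header handling, etc.) are omitted; exact syntax varies between releases.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._  // adds getOrCreateCarbonSession to the builder

object CarbonDataExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical store location; in production this would be an HDFS path.
    val storeLocation = "hdfs://localhost:9000/carbon/store"

    // CarbonData 1.x attaches to Spark through a CarbonSession, which
    // registers the carbondata format and its SQL extensions.
    val carbon = SparkSession
      .builder()
      .master("local[*]")
      .appName("CarbonDataExample")
      .getOrCreateCarbonSession(storeLocation)

    // Create a table backed by the CarbonData columnar format.
    carbon.sql(
      """CREATE TABLE IF NOT EXISTS sales (
        |  id INT,
        |  product STRING,
        |  amount DOUBLE
        |) STORED BY 'carbondata'""".stripMargin)

    // Batch-load data; each load becomes an encoded, indexed,
    // compressed columnar segment on HDFS.
    carbon.sql(
      "LOAD DATA INPATH 'hdfs://localhost:9000/input/sales.csv' INTO TABLE sales")

    // Filter and aggregation queries can be pruned via the min/max
    // and inverted indexes instead of requiring full scans.
    carbon.sql("SELECT product, SUM(amount) FROM sales GROUP BY product").show()

    // Batch updates and deletes are recorded as delta bitmap files,
    // later reconciled by compaction.
    carbon.sql("UPDATE sales SET (amount) = (amount * 1.1) WHERE product = 'book'")
    carbon.sql("DELETE FROM sales WHERE amount < 0")

    carbon.stop()
  }
}
```

The DataFrame API integration mentioned above follows the same pattern: per the project's Spark documentation, tables are readable and writable through `spark.read`/`df.write` with the `carbondata` format.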

Technology Information

Other Names: CarbonData
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: September 2019 (version 1.6)

Release History

Version | Release Date | Release Links | Release Comment
1.3     | 2018-02-03   | release notes |
1.4     | 2018-06-04   | release notes |
1.5     | 2018-10-23   | release notes |
1.6     | 2019-08-19   | release notes |

News