Apache Arrow

In-memory data structure specification for building columnar data systems. Provides a standard interchange format that allows data to be shared between processes on a node without the overhead of copying or transforming it, permits O(1) random access, and can represent both flat relational structures and complex hierarchical nested data.

Data is organised in a columnar memory layout, with the values of each column stored contiguously. This makes it cache efficient for analytical workloads, which typically scan a few columns across many rows, and allows execution engines to take advantage of modern CPU SIMD (Single Instruction Multiple Data) instructions, which apply one operation to multiple data values at once.

Supports Java, C, C++, JavaScript, Python, Go, Ruby and Rust. Seeded from the Apache Drill project and promoted directly to a top-level Apache project in February 2016, followed by an initial 0.1 release in October 2016. Used in a range of other projects including Drill, Spark, Impala, Kudu and Pandas. Has not yet reached a v1.0 milestone, but is under active development with contributors from a number of other Apache and non-Apache data projects.
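To make the columnar layout and the O(1) random-access claim concrete, here is a toy, pure-Python sketch of two column kinds. It is not the pyarrow API: `build_validity`, `is_valid`, `Int32Column` and `StringColumn` are hypothetical names chosen for illustration. The sketch assumes the general shape of Arrow's columnar format: a validity bitmap with one bit per slot (least-significant bit first, 1 meaning valid), a contiguous values buffer for fixed-width types, and an offsets buffer plus a single data buffer for variable-length strings. It deliberately omits the buffer alignment and padding, nested types and IPC framing that the real specification defines.

```python
from array import array


def build_validity(values):
    """Hypothetical helper: one bit per slot, least-significant bit first; 1 = valid, 0 = null."""
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bitmap


def is_valid(bitmap, i):
    # Test bit i of the bitmap.
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


class Int32Column:
    """Toy fixed-width column: validity bitmap + contiguous 32-bit values buffer."""

    def __init__(self, values):
        self.validity = build_validity(values)
        # Null slots still occupy a value slot; their content is unspecified (0 here).
        # Typecode "i" is a C int, 4 bytes on common platforms.
        self.values = array("i", (0 if v is None else v for v in values))

    def __len__(self):
        return len(self.values)

    def __getitem__(self, i):
        # O(1) random access: one bit test plus one indexed read.
        return self.values[i] if is_valid(self.validity, i) else None

    def sum(self):
        # A column operation scans one contiguous buffer, skipping nulls.
        return sum(v for i, v in enumerate(self.values) if is_valid(self.validity, i))


class StringColumn:
    """Toy variable-length column: validity bitmap + n+1 offsets + one data buffer."""

    def __init__(self, values):
        self.validity = build_validity(values)
        self.offsets = array("i", [0])
        data = bytearray()
        for v in values:
            if v is not None:
                data += v.encode("utf-8")
            self.offsets.append(len(data))  # a null slot gets a zero-length range
        self.data = bytes(data)

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        if not is_valid(self.validity, i):
            return None
        # O(1): two offset reads locate the slot without scanning earlier strings.
        start, end = self.offsets[i], self.offsets[i + 1]
        return self.data[start:end].decode("utf-8")


ids = Int32Column([10, None, 30, 40])
names = StringColumn(["arrow", None, "drill", ""])
print(ids[2], ids[1], ids.sum())  # 30 None 80
print(names[2], names[1], repr(names[3]))  # drill None ''
```

Note that an empty string and a null are distinguished only by the validity bit: both occupy a zero-length byte range in the data buffer, so the offsets for this example are `[0, 5, 5, 10, 10]`.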

Technology Information

Other Names	Arrow
Vendors	The Apache Software Foundation
Type	Commercial Open Source
Last Updated	July 2019 - v0.14

Release History

version	release date	release links	release comment
0.8	2017-12-18	blog post; release notes
0.9	2018-03-21	blog post; release notes
0.10	2018-08-07	blog post; release notes
0.11	2018-10-09	blog post; release notes
0.12	2019-01-21	blog post; release notes
0.13	2019-04-02	blog post; release notes
0.14	2019-07-02	blog post
