Apache Crunch

An abstraction layer over MapReduce (and now Spark) that provides a high-level Java API for building data transformation pipelines. Based on Google's FlumeJava paper, it was originally designed to make working with MapReduce easier. It also includes connectors for HBase, Hive and Kafka, Java 8 lambda support, an experimental Scala wrapper for the API (Scrunch), support for in-memory pipelines, and helper classes to support testing. Crunch was open sourced by Cloudera in October 2011 and donated to the Apache Software Foundation in May 2012. It graduated to a top-level Apache project in February 2013. Support for Spark was added in v0.10 in June 2014. It is still being maintained and appears to have been adopted at a number of large companies, but sees limited new development.
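In Crunch, a pipeline is built by transforming `PCollection`s with `DoFn`s. Following the FlumeJava model, the work is deferred until the pipeline runs, so the library can plan it into MapReduce or Spark jobs. The stdlib-only Java sketch below is a hypothetical toy model of that deferred, in-memory style and is not the Crunch API. `ToyPipeline`, `ToyCollection`, `parallelDo`, `materialize` and `count` are illustrative names that only loosely echo Crunch's. It shows a word count in which each transformation is recorded as a plan and executed only when results are requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical toy model, NOT the Apache Crunch API: an in-memory collection
// whose transformations are recorded lazily and run only when results are requested.
public class ToyPipeline {

    static final class ToyCollection<T> {
        // The deferred "plan": nothing is computed until plan.get() is called.
        private final Supplier<List<T>> plan;

        private ToyCollection(Supplier<List<T>> plan) {
            this.plan = plan;
        }

        // Wraps existing in-memory data as the source of a pipeline.
        static <T> ToyCollection<T> of(List<T> data) {
            List<T> copy = new ArrayList<>(data);
            return new ToyCollection<>(() -> new ArrayList<>(copy));
        }

        // Loosely like a DoFn: each input may emit zero or more outputs.
        // Only records the step; fn is not invoked here.
        <R> ToyCollection<R> parallelDo(BiConsumer<T, Consumer<R>> fn) {
            return new ToyCollection<>(() -> {
                List<R> out = new ArrayList<>();
                for (T item : plan.get()) {
                    fn.accept(item, out::add);
                }
                return out;
            });
        }

        // Executes the recorded plan and returns its elements.
        List<T> materialize() {
            return plan.get();
        }

        // Executes the plan and counts occurrences of each distinct element.
        Map<T, Long> count() {
            Map<T, Long> counts = new HashMap<>();
            for (T item : plan.get()) {
                counts.merge(item, 1L, Long::sum);
            }
            return counts;
        }
    }

    public static void main(String[] args) {
        ToyCollection<String> lines = ToyCollection.of(Arrays.asList("a b a", "b c"));
        // Split each line into words; recorded, not yet executed.
        ToyCollection<String> words = lines.<String>parallelDo((line, emit) -> {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) {
                    emit.accept(w);
                }
            }
        });
        Map<String, Long> counts = words.count(); // execution happens here
        System.out.println(counts.get("a") + " " + counts.get("b") + " " + counts.get("c"));
    }
}
```

Running `main` should print `2 2 1`. The real library differs in important ways. It works with typed serialization (`PType`s) and can run the same pipeline on MapReduce, Spark, or its in-memory implementation. The toy version simply recomputes its plan on every call.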

Technology Information

Other Names: Crunch
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: April 2017 - 0.15

Related Technologies

Is packaged by: Apache Bigtop
Is packaged by (but deprecated): Cloudera CDH

Release History

Version | Release Date | Release Links | Release Comment
0.15 | 2017-02-26 | GitHub release page |

Blog Posts