Apache DataFu
A set of libraries for working with data in Hadoop. Consists of two sub-projects: DataFu Pig (a set of Pig user defined functions) and DataFu Hourglass (a framework for incremental processing using MapReduce). Originally created at LinkedIn; the Pig UDFs were open sourced as DataFu in January 2012, with a v1.0 release in September 2013. Split into sub-projects in October 2013, when LinkedIn open sourced DataFu Hourglass and added it to the project. Donated to the Apache Software Foundation in January 2014, graduating in February 2018. The last major release was v1.3 in November 2015; there have been a handful of bug fix releases but little development activity since then.
Technology Information
Other Names: DataFu
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: January 2019 - v1.5
Sub-projects
Apache DataFu > DataFu Hourglass
A framework over MapReduce that supports the efficient generation of statistics over dated data by incrementally updating the previous day's output. Supports both fixed-length and fixed-start-point windows, and the generation of statistics either per input partition or as a total over all input data.
Apache DataFu > DataFu Pig
A set of user defined functions for Apache Pig, including support for statistical calculations, bag and set operations, sessionisation of streams of data, cardinality estimation, sampling, hashing, PageRank and others.
Related Technologies
Is packaged by: Apache Bigtop, Hortonworks Data Platform
Release History
version   release date   release links   release comment
1.0       2013-09-04     summary
1.3       2015-11-18     summary         First Apache (Incubating) release
1.4       2018-03-25     summary         Release to mark Apache graduation; includes 1.3.x patches
1.5       2019-01-07     summary         Java 8 compatibility; two new macros
Links
News
Blog Posts