Hadoop Distributions

A list of commercial, open source and cloud-based Hadoop distributions, with information on each, including Cloudera, Hortonworks, MapR, Amazon EMR, Azure HDInsight and Google Cloud Dataproc, plus alternatives to these.

Category Definition

Products or services built around Hadoop (or a Hadoop-compatible core) combined with a number of Hadoop-compatible products. Hadoop compatibility here means the use of YARN (for resource management of multiple jobs running on the same infrastructure) and HDFS (for local storage of data, with support for co-locating processing with the data).

Further Information

See also our Hadoop (HDFS and YARN) ecosystem diagrams

We also have a summary of the ODPi organisation that’s trying to drive compatibility between Hadoop distributions.

Merv Adrian from Gartner maintains a tracker of the different versions of each Hadoop component in the major distributions - https://blogs.gartner.com/merv-adrian/2018/01/03/january-2018-hadoop-tracker/

The following analyst material covers a number of technologies in this category:

Commercial Distributions

The following are distributions from commercial vendors for installation on pre-provisioned infrastructure, with many also including tooling for programmatically provisioning infrastructure when installing in cloud environments.

Cloudera CDH: A distribution of Hadoop that adds a number of closed source products, including Cloudera Manager (for installing and managing clusters), Cloudera Director (for installing in cloud environments) and Cloudera Navigator (for managing metadata and the encryption of data). Available in free and commercial editions.
Hortonworks Data Platform: A distribution of Hadoop based on a commitment to the Apache open source ecosystem, utilising only open source products with minimal extra patching. Uses Ambari for installing and managing clusters, and Cloudbreak for installing in cloud environments. Free to use, with commercial support available.
MapR Converged Data Platform: A data platform that provides Hadoop compatibility (via YARN and the MapR-FS HDFS-compatible API), NoSQL and streaming data storage via MapR-DB and MapR-ES, and a bundle of open source Hadoop projects via the MapR Ecosystem Pack. Available in free and commercial editions.
Syncfusion Big Data Platform: Distribution for Windows, Linux and Azure - https://www.syncfusion.com/products/big-data

See also our comparison of the major commercial Hadoop distributions.

Hadoop Cloud Offerings

The following are cloud based Hadoop service offerings, supporting the programmatic provisioning and management of Hadoop clusters. Many also provide higher level APIs that allow for submission and management of individual Hadoop jobs, with some services allowing clusters to be automatically provisioned to execute a job and then terminated afterwards.

Amazon EMR: Hadoop as a service, with support for a wide range of Hadoop technologies and the ability to programmatically execute Hadoop jobs and dynamically provision clusters to execute them.
Azure HDInsight: Hadoop service based on Hortonworks Data Platform (HDP).
Google Cloud Dataproc: Hadoop service, with support for MapReduce, Spark, Pig and Hive, and the ability to programmatically submit and manage jobs.
Qubole Data Service: Hadoop managed service running on AWS, Azure and Oracle Cloud.
Cloudera Altus: Platform for accessing individual CDH capabilities as services. The first capabilities supported are the execution of Spark, MapReduce or Hive (over MapReduce or Spark) jobs using managed CDH clusters on AWS infrastructure, over data in Amazon S3.
IBM BigInsights on Cloud: https://www.ibm.com/analytics/in/en/technology/cloud-data-services/biginsights-on-cloud/
Oracle Big Data Cloud Service: Based on Cloudera - https://cloud.oracle.com/bigdata
SAP Cloud Platform Big Data Services (previously Altiscale): https://cloudplatform.sap.com/capabilities/data-storage/big-data.html
Rackspace: Based on Hortonworks HDP - https://www.rackspace.com/big-data
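
As an illustration of the "provision a cluster, run a job, then terminate" pattern described above, the following is a minimal sketch that builds the request for a transient Amazon EMR cluster. It assumes the boto3 `run_job_flow` API; the cluster name, release label, instance type, S3 path and the helper `transient_cluster_request` are hypothetical values chosen for the example, and the IAM role names are the EMR defaults, which may not exist in every account. The other services listed expose their own APIs and are not covered here.

```python
from typing import Any, Dict, List


def transient_cluster_request(
    name: str,
    job_args: List[str],
    release_label: str = "emr-5.30.0",   # assumed release; pick one your account supports
    instance_type: str = "m5.xlarge",    # assumed instance type
    instance_count: int = 3,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR `run_job_flow` call (hypothetical helper)."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # Install Spark so that the spark-submit step below can run.
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # False means the cluster shuts itself down once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": f"{name}-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": job_args,
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = transient_cluster_request(
    "nightly-wordcount",
    ["spark-submit", "s3://example-bucket/jobs/wordcount.py"],
)
# With AWS credentials configured, the request could then be submitted with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Building the request as plain data keeps the sketch runnable without AWS credentials; only the commented-out call at the end would actually provision infrastructure.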

Hadoop Hardware Appliances

Teradata Appliance for Hadoop: http://www.teradata.com/products-and-services/appliance-for-hadoop
Oracle Big Data Appliance: https://www.oracle.com/engineered-systems/big-data-appliance/index.html

Non Commercial Options

Apache Bigtop: An Apache open source distribution of Hadoop. Bundles a number of Apache Hadoop components, certifies their interoperability using an automated integration test suite, and packages them as RPM/DEB packages for most flavours of Linux.
OpenStack Sahara: Allows provisioning of Hadoop on OpenStack - https://docs.openstack.org/developer/sahara/
Hops: A distribution based on Hops HDFS and Hops YARN, which use a distributed MySQL database for metadata to increase performance and scalability. Available as a cloud or on-premises offering - http://www.hops.io

Historical / Legacy Options

The following are either no longer available, or are now simply re-badged versions of other distributions:

Intel Distribution for Apache Hadoop: Focused on optimisations for Intel processors, SSDs and networking kit; ceased when Intel invested in Cloudera - see announcement
Pivotal HD: Pivotal has now partnered with Hortonworks - see announcement
IBM InfoSphere BigInsights: IBM has now partnered with Hortonworks - see announcement

Blog Posts