Apache Hive

Technology that exposes data in Hadoop as structured tables and supports the execution of analytical SQL queries over them. Consists of a number of distinct components (treated here as sub-projects), including Hive Metastore (stores the definitions of the structured tables), Hive Server (executes analytical SQL queries as MapReduce, Spark or Tez jobs) and HCatalog (allows MapReduce and Pig jobs to read and write Hive tables). First released by Facebook as a Hadoop contrib module in September 2008, becoming a Hadoop sub-project in November 2008 and a top-level Apache project in September 2010, following a first official stable release (0.3) in April 2009. Java based, under active development by a number of large commercial sponsors, with commercial support available as part of most Hadoop distributions.

Technology Information

Other Names	Hive
Vendors	The Apache Software Foundation
Type	Commercial Open Source
Last Updated	August 2018 - v3.1

Sub-projects

Apache Hive > HCatalog	Libraries that allow MapReduce and Pig jobs to read and write data in Hive tables, albeit with some limitations. Also provides a CLI for querying and updating the Hive Metastore, although this doesn't support the full range of Hive DDL commands. Includes WebHCat, a REST API over the HCatalog CLI that also supports the execution of MapReduce, Pig, Hive and Sqoop jobs. Donated to the Apache Software Foundation by Yahoo in March 2011, had WebHCat merged into it in July 2012, and graduated as a top-level project in February 2013, but was then almost immediately folded into Hive in March 2013 as part of the Hive 0.11 release. Has seen limited development since then.
Apache Hive > Hive Metastore	A metadata service that allows structured tables to be defined over files in HDFS (and also over HBase or Accumulo), providing an API that allows the metadata to be queried and updated by other tools including Impala, Spark SQL and RecordService. Supports partitioned and clustered tables, as well as complex field types such as arrays, maps and structs. Backed by a relational database (MySQL, Postgres or Oracle). Part of the original Hive code base.
Apache Hive > Hive Server	Supports the execution of SQL queries over data in HDFS based on tables defined in the Hive Metastore, as well as DDL to query and update the Hive Metastore. Focus is on analytical (OLAP) use cases, with some support for batch updates to data. Originally executed queries as MapReduce jobs, but significant investment has since added support for executing queries as Spark and as Tez jobs, with work underway to support sub-second query times using Tez (Hive LLAP). Recent changes have also seen it achieve significant SQL compliance, with support for SQL:2011 analytical functions ongoing. Accepts queries over an API with JDBC and ODBC drivers available, and includes Beeline, a command line JDBC client. Technically referred to as Hive Server 2, which was introduced in Hive 0.11 as a replacement for the original Hive Server to address a number of concurrency and security issues.
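
As a rough illustration of how the Metastore and Hive Server descriptions above fit together, the Python sketch below builds a HiveQL DDL statement for a partitioned, clustered table with complex field types, then assembles a Beeline command line that would submit it to Hive Server over JDBC. The helper names (create_table_ddl, beeline_command), the table, the columns, the host name and the HDFS path are all hypothetical. Port 10000 and the default database are the usual Hive Server 2 defaults but may differ in a given cluster, and the execution engine is assumed to be selected via the hive.execution.engine property (mr, tez or spark).

```python
def create_table_ddl(table, columns, partition_cols, bucket_col, buckets, location):
    """Return a HiveQL CREATE EXTERNAL TABLE statement as a single line.

    columns and partition_cols are lists of (name, hive_type) pairs.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    parts = ", ".join(f"{name} {hive_type}" for name, hive_type in partition_cols)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        f"PARTITIONED BY ({parts}) "
        f"CLUSTERED BY ({bucket_col}) INTO {buckets} BUCKETS "
        f"STORED AS ORC LOCATION '{location}'"
    )


def beeline_command(host, statement, port=10000, database="default", engine="tez"):
    """Return an argv list that runs one statement through Beeline.

    Nothing is executed here; the list could be passed to subprocess.run.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    return [
        "beeline",
        "-u", url,                                         # JDBC connection URL
        "--hiveconf", f"hive.execution.engine={engine}",   # assumed engine choice
        "-e", statement,                                   # statement to execute
    ]


# Hypothetical table using array, map and struct field types.
EXAMPLE_DDL = create_table_ddl(
    table="page_views",
    columns=[
        ("user_id", "BIGINT"),
        ("tags", "ARRAY<STRING>"),
        ("props", "MAP<STRING,STRING>"),
        ("geo", "STRUCT<country:STRING,city:STRING>"),
    ],
    partition_cols=[("view_date", "STRING")],
    bucket_col="user_id",
    buckets=32,
    location="/data/page_views",
)

if __name__ == "__main__":
    print(" ".join(beeline_command("hs2.example.com", EXAMPLE_DDL)))
```

The printed command is for reading only and is not shell-quoted; to run it, the argv list would need to be passed to a process launcher as-is, against a cluster where Beeline is installed and Hive Server is reachable.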

Related Technologies

Is packaged by

Apache Bigtop, Hortonworks Data Platform, Cloudera CDH, MapR Expansion Pack, Cloudera Altus Data Engineering, Amazon EMR, Google Cloud DataProc, Qubole Data Service

Release History

version	release date	release links	release comment
2.2	2017-07-25	announcement
2.3	2017-07-17	announcement
3.0	2018-05-21	announcement	Support for Hadoop 3; materialized views
3.1	2018-07-30	announcement

News

http://hive.apache.org/downloads.html - details of new releases
http://blog.cloudera.com/blog/category/hive/ - Cloudera Hive News
http://hortonworks.com/blog/category/hive/ - Hortonworks Hive News

Blog Posts

Core Hadoop Technologies (pt1) 2017-01-06 Technologies Flume HBase Hive Apache HCatalog Hive Metastore Hive Server Peter