Apache Sqoop

Specialist technology for moving bulk data between Hadoop and structured (relational) databases. Command-line based, with the ability to import and export data between a range of databases (including mainframe partitioned datasets) and HDFS, Hive, HBase and Accumulo. Executes as MapReduce jobs and supports parallel partitioned unloads, writing to Avro, SequenceFile, Parquet and text files, incremental imports, and saved jobs that can be shared via a simple metadata store. An Apache project, started in May 2009 as a Hadoop contrib module, migrating to a Cloudera GitHub project in April 2010 (with a v1.0 release shortly after), before being donated to the Apache Software Foundation in June 2011 and graduating in March 2012. The last major release (v1.4) was in November 2011, with only minor releases since then. However, in January 2012 a significant rewrite was announced as part of a proposed v2.0 release to address a number of usability, security and architectural issues. This will introduce a new Sqoop Server and Metadata Repository, supporting both a CLI and web UI, centralising job definitions, database connections and credentials, as well as enabling support for a wider range of connectors including NoSQL databases, Kafka and (S)FTP folders. Java based, with commercial support available as part of most Hadoop distributions.
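
As an illustration of the parallel, incremental import described above, the following is a minimal sketch that only assembles a Sqoop 1.4 `import` command line and does not run it. The helper name `build_sqoop_import`, the JDBC URL, the table and column names and the target directory are all hypothetical placeholders; the flags are those of the Sqoop 1 `import` tool, and connection credentials are deliberately left out.

```python
import shlex
from typing import List, Optional


def build_sqoop_import(
    jdbc_url: str,
    table: str,
    target_dir: str,
    split_by: str,
    num_mappers: int = 4,
    check_column: Optional[str] = None,
    last_value: Optional[int] = None,
) -> List[str]:
    """Assemble (but do not execute) a Sqoop 1.4 import command.

    Hypothetical helper for illustration only; every argument value
    passed in the example below is a placeholder.
    """
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        # Parallel partitioned unload: rows are divided between the
        # map tasks by ranges of the --split-by column.
        "--split-by", split_by,
        "--num-mappers", str(num_mappers),
        # Write Avro data files rather than the default text files.
        "--as-avrodatafile",
    ]
    # Incremental import: only fetch rows whose check column is
    # greater than the last value already imported.
    if check_column is not None and last_value is not None:
        argv += [
            "--incremental", "append",
            "--check-column", check_column,
            "--last-value", str(last_value),
        ]
    return argv


if __name__ == "__main__":
    cmd = build_sqoop_import(
        jdbc_url="jdbc:mysql://db.example.com/sales",  # placeholder URL
        table="orders",                                # placeholder table
        target_dir="/data/raw/orders",                 # placeholder HDFS path
        split_by="order_id",
        check_column="order_id",
        last_value=1000,
    )
    # Print a shell-quoted form that could be pasted into a terminal.
    print(shlex.join(cmd))
```

Building the command as an argument list rather than a single string keeps quoting explicit, and the same list could be passed to `subprocess.run` on a host where the `sqoop` client is installed.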

Technology Information

Other Names: Sqoop
Vendors: The Apache Software Foundation
Type: Commercial Open Source
Last Updated: January 2017 - v1.4

Related Technologies

Is packaged by: Apache Bigtop, Hortonworks Data Platform, Cloudera CDH, MapR Expansion Pack, Amazon EMR, Qubole Data Service
