Apache Sqoop

Specialist technology for moving bulk data between Hadoop and structured (relational) databases. Command-line based, with the ability to import and export data between a range of databases (including mainframe partitioned datasets) and HDFS, Hive, HBase and Accumulo. Executes as MapReduce jobs, and supports parallel partitioned unloads, writing to Avro, SequenceFile, Parquet and text files, incremental imports, and saved jobs that can be shared via a simple metadata store. An Apache project, started in May 2009 as a Hadoop contrib module, migrating to a Cloudera GitHub project in April 2010 (with a v1.0 release shortly after), before being donated to the Apache Software Foundation in June 2011 and graduating in March 2012. The last major release (v1.4) was in November 2011, with only minor releases since then. However, in January 2012 a significant rewrite was announced as part of a proposed v2.0 release to address a number of usability, security and architectural issues. This will introduce a new Sqoop Server and Metadata Repository, supporting both a CLI and web UI, centralising job definitions, database connections and credentials, as well as enabling support for a wider range of connectors including NoSQL databases, Kafka and (S)FTP folders. Java-based, with commercial support available as part of most Hadoop distributions.
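As a rough illustration of the import features described above (parallel partitioned unloads, Avro output and incremental imports), the sketch below assembles a Sqoop 1.4 `import` command line in Python without running it. The helper name `build_sqoop_import`, the JDBC URL, the table, the column names and the HDFS path are all hypothetical placeholders, and credentials are deliberately left out; treat it as a minimal sketch rather than a recommended invocation.

```python
import shlex


def build_sqoop_import(jdbc_url, table, target_dir, split_by,
                       num_mappers=4, check_column=None, last_value=0):
    """Assemble the argument list for a Sqoop 1.4 `import` into HDFS.

    Hypothetical helper for illustration only: it builds the command
    but does not execute it.
    """
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC connection string of the source database
        "--table", table,                   # source table to unload
        "--target-dir", target_dir,         # HDFS directory to write into
        "--split-by", split_by,             # column used to partition the unload
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
        "--as-avrodatafile",                # write Avro rather than text files
    ]
    if check_column is not None:
        # Incremental import: only fetch rows whose check column
        # is greater than the last imported value.
        argv += [
            "--incremental", "append",
            "--check-column", check_column,
            "--last-value", str(last_value),
        ]
    return argv


# Placeholder values; none of these come from a real system.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    table="orders",
    target_dir="/data/raw/orders",
    split_by="order_id",
    check_column="order_id",
    last_value=1000,
)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to a process launcher without shell quoting problems; `shlex.join` is used only to display it.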

Technology Information

Other Names	Sqoop
Vendors	The Apache Software Foundation
Type	Commercial Open Source
Last Updated	January 2017 - v1.4

Related Technologies

Is packaged by

Apache Bigtop, Hortonworks Data Platform, Cloudera CDH, MapR Expansion Pack, Amazon EMR, Qubole Data Service

News

http://sqoop.apache.org/ - details latest release, and hosts release notes for v1.4.0 onwards
https://blogs.apache.org/sqoop/ - project blog
