Google Cloud DataProc
Service for dynamically provisioning Hadoop clusters on Google Compute Engine, based on a single standard set of Hadoop services. Supports selection of virtual machines (including custom machine types and machines with GPUs), use of custom VM images, a claimed cluster startup time of less than 90 seconds, local storage and the HDFS filesystem, programmatic execution of jobs, workflows (parameterisable operations that create a cluster, run jobs and then delete the cluster), manual and automatic scaling, initialisation actions (to install extra services or run scripts, with a set of open source actions available), optional components (automatic addition of extra services), automatic deletion of clusters (based on time, usage or idleness), integration with Stackdriver Logging and Monitoring, and encryption of data in HDFS and Cloud Storage. Manageable via the Google Cloud Console web UI and SDK, plus an RPC and REST API. Priced at an hourly rate (charged per second) based on the specification of the VMs being used; this is in addition to any Compute Engine or Persistent Disk charges. See Google Cloud Platform updates.
Technology Information
Other Names: Google DataProc, DataProc
Type: Commercial
Last Updated: October 2018 - v1.3
Related Technologies
Packages: Apache Hadoop, Apache Hive, Apache Pig, Apache Spark, Apache Tez
Links
News
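Worked example: the pricing model in the summary (an hourly rate charged per second, on top of Compute Engine and Persistent Disk charges) can be sketched as simple arithmetic. This is an illustrative sketch only. The function names, the hourly rate of 0.04 and the other charge figures are hypothetical placeholders, not published prices, and the sketch ignores any minimum billing period that may apply.

```python
def dataproc_charge(hourly_rate_per_vm: float, vm_count: int, runtime_seconds: int) -> float:
    """Service charge for a cluster billed per second at an hourly rate.

    hourly_rate_per_vm is a placeholder; the real rate depends on the
    specification of the VMs being used.
    """
    if hourly_rate_per_vm < 0 or vm_count < 0 or runtime_seconds < 0:
        raise ValueError("rate, VM count and runtime must be non-negative")
    per_second_rate = hourly_rate_per_vm / 3600  # convert hourly rate to per-second
    return per_second_rate * vm_count * runtime_seconds


def total_charge(dataproc: float, compute_engine: float, persistent_disk: float) -> float:
    """The service charge is added to the underlying infrastructure charges."""
    return dataproc + compute_engine + persistent_disk


if __name__ == "__main__":
    # Hypothetical short-lived cluster: 3 VMs running for 15 minutes (900 seconds).
    service = dataproc_charge(hourly_rate_per_vm=0.04, vm_count=3, runtime_seconds=900)
    # Compute Engine and Persistent Disk figures are made-up inputs for illustration.
    total = total_charge(service, compute_engine=0.15, persistent_disk=0.01)
    print(f"service charge: {service:.4f}")
    print(f"total charge:   {total:.4f}")
```

Because billing is per second, a cluster that is created for a job and deleted straight afterwards (as with workflows or automatic deletion) is charged only for the time it actually ran, rather than for a whole hour.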