Graph Analytics
Our list of, and information on, commercial and open source graph analytics engines and databases, including Giraph, Hama, GraphX, Flink, JanusGraph, Stardog, TinkerPop and alternatives to these.

Technologies that support the execution of analytics over graph data held in an external underlying store (often HDFS), generally over the entire graph database, to generate aggregated results, identify data of interest, or enrich the graph.

Processing is often based on the BSP (Bulk Synchronous Parallel) model made famous by Pregel, the model created by Google to run their PageRank algorithm. The Morning Paper blog from Adrian Colyer has a good introduction to Pregel, and the original paper is also available online.

The following technologies all implement a graph analytics engine over external data, generally using a BSP execution model.

Apache TinkerPop provides support for running analytics over graphs from Gremlin using an external query engine (or GraphComputer) - see http://tinkerpop.apache.org/docs/3.3.0/reference/#graphcomputer for further information. TinkerPop includes GraphComputer adapters for Spark and Giraph out of the box, with the analytics generally running on an external cluster that reads the data from the source graph database via TinkerPop on job startup. Not all graph databases that support TinkerPop support the execution of graph analytics - those that do are listed as OLAP databases at http://tinkerpop.apache.org/#graph-systems.

See also our Analytical Databases page for databases that support graph analytics, including specialist graph analytical databases.

Category Definition
Pregel
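To make the BSP model mentioned above more concrete, the following is a minimal single-process sketch of vertex-centric PageRank in the Pregel style. It is an illustration only, not code from Pregel or any of the engines listed on this page: the function name `pregel_pagerank`, its parameters, and the way dangling vertices are handled are all assumptions made for the example. Each loop iteration stands in for one superstep, in which every vertex sends messages along its outgoing edges and the new ranks are computed only after all messages have been collected (the synchronisation barrier).

```python
from collections import defaultdict


def pregel_pagerank(edges, num_supersteps=20, damping=0.85):
    """Hypothetical BSP-style PageRank; edges is a list of (src, dst) pairs."""
    vertices = sorted({v for edge in edges for v in edge})
    out_neighbors = defaultdict(list)
    for src, dst in edges:
        out_neighbors[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(num_supersteps):
        # Compute phase: each vertex sends rank / out_degree to its neighbours.
        inbox = defaultdict(float)
        dangling = 0.0
        for v in vertices:
            targets = out_neighbors[v]
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
            else:
                # Assumption: rank held by vertices with no outgoing edges
                # is spread evenly over all vertices.
                dangling += rank[v]

        # Barrier: ranks are replaced only once every message has been sent,
        # so no vertex sees a partially updated state within a superstep.
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in vertices
        }
    return rank


if __name__ == "__main__":
    print(pregel_pagerank([("a", "c"), ("b", "c"), ("c", "a")]))
```

A real engine distributes the vertices across workers, exchanges the messages over the network, and typically lets vertices vote to halt instead of running a fixed number of supersteps; the sketch keeps only the compute, message and barier structure in a single process so the shape of a superstep is easy to see.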
Analytics Engines
Giraph: An iterative, highly scalable graph processing system based on Pregel and built over MapReduce.
Hama: A general purpose BSP (Bulk Synchronous Parallel) processing engine inspired by Pregel and DistBelief that runs over Mesos or YARN.
Spark/GraphX: Spark library for processing graphs and running graph algorithms.
Flink Gelly: Graph processing API and library on top of Apache Flink.
Gaffer: Open source project for running analytics over very large graphs in HDFS, Accumulo or HBase - https://github.com/gchq/Gaffer
Apache TinkerPop
Analytical Databases
Blog Posts