Technology Categories
A catalogue of data transformation, data platform and other technologies used within the Data Engineering space, organised by category.

Databases that primarily support transactional use cases (creating, finding, updating and deleting records, often referred to as OLTP or online transaction processing), which underpin business or operational systems and generally support limited analytical use cases. NOTE: These are of interest to us as potential sources of data. Although we have lists of these technologies on this site, they are not of primary interest, so we’re unlikely to have technology pages for any of the technologies that fall under these categories.

Databases that primarily support analytical use cases (aggregations, machine learning and other algorithms that run over large volumes of data, often referred to as OLAP or online analytical processing), and which provide integrated storage and query capabilities in a single technology.

Technologies that support the storage of data, but with no (or limited) capabilities to analyse or exploit the data being stored.

Technologies that support the execution of queries or analytics over data in one or more external database or data storage technologies. Originally targeted at exploiting unprepared data at enormous scale, they are now starting to support capabilities that allow them to compete with analytical databases. The separation of storage and query engine provides flexibility, for example to exploit data in a much rawer state, or to exploit prepared data using multiple tools.

Technologies that support the integration of data from multiple sources without data movement or transformation.

Technologies that support the acquisition, ingest and processing of data.

Operational Databases
Relational Databases: Databases that focus on operational (OLTP) use cases
Key-Value Databases: NoSQL databases for storing data values indexed by a single key
Document Databases: NoSQL databases for storing data as structured documents (e.g. JSON / XML)
Graph Databases: Graph databases that focus on operational use cases
RDF Databases: RDF (Resource Description Framework) databases
Multi Model Databases: Databases that support multiple use cases (e.g. relational, document and graph)

Analytical Databases
Analytical Databases: Databases that focus on analytical (OLAP) use cases, including relational, graph and machine learning capabilities
Analytical Search: Search technologies that also support analytical capabilities such as aggregations, graph analytics and machine learning
NoSQL Wide Column Stores: Sparse multi-dimensional key-value stores that support scan/iterate operators
Time Series Databases: Databases optimised for storing very large numbers of metrics and allowing these to be aggregated and analysed

Hadoop
Hadoop Distributions: Options for deploying an Apache Hadoop ecosystem
Hadoop Supporting Components: Technologies for managing and monitoring an Apache Hadoop installation

Data Storage
Hadoop Compatible Filesystems: Parallel distributed filesystems that implement the Hadoop FileSystem API and conform to the Hadoop Compatible Filesystem specification, allowing them to be used in place of HDFS
Object Stores: Storage solutions in which data is stored without any concept of folders or organisational structure, and is instead referenced by a unique identifier, allowing for massively parallel and scalable solutions
Streaming Data Stores: Technologies for the persistent storage of continuous streams of data, with data access based on a publish/subscribe model
Data Storage Formats: Libraries that support the storage of data on disk, whether for plain data storage or for real-time or batch analytics
Schema Registries: Tools that support the definition, management and serving of data schemas for use in the serialisation and de-serialisation of data, primarily for Streaming Data Stores

Query / Analytics Engines
Query Engines: Engines that allow queries expressed in a high level language (often SQL) to be run over one or more underlying data stores or databases, often including Hadoop (aka SQL on Hadoop)
Graph Analytics: Engines that allow graph analytics to be run over data in an underlying data store (generally HDFS)

Data Integration Engines
Data Virtualization: Technologies that allow data in multiple source databases to be accessed as a single integrated virtual database
Enterprise Semantic Graphs: Technologies that allow data in multiple sources to be accessed as a single integrated RDF graph model

Data Processing
Data Ingestion: Specialist tools designed to acquire and ingest data into an analytical platform ready for analysis or for further transformation to support analysis

Infrastructure
Compute Cluster Managers: Technologies for managing the execution of jobs across a general purpose compute cluster

Blog Posts