2017

2017-12-06 Peter

Right - time for your weekly updates on new software releases and interesting new information and posts, with a big dump from AWS re:Invent this week…

Technology updates (details are on the relevant technology pages):

Apache Beam has hit 2.2
Druid has hit 0.11

Other technology news:

After the Azure product dump a few weeks ago, it’s Amazon’s turn via AWS re:Invent:
- Amazon Neptune - a graph/RDF database as a service with support for TinkerPop Gremlin and RDF SPARQL - announcement and blog
- Amazon SageMaker - service for building, training and deploying machine learning at scale - announcement and blog
- AWS Fargate - provisioning of containers on AWS without managing servers or clusters - announcement and blog
- Elastic Kubernetes Service (EKS) - Kubernetes as a service - announcement and blog
- S3 Select and Glacier Select - retrieve subsets of stored objects by running select queries server side - S3 announcement, Glacier announcement and blog
- See also summaries from The Register, from InfoQ, and the motherlist of blog posts relating to re:Invent from Amazon
From Cloudera, infrastructure considerations for deploying CDH - link
MapR have posted their thoughts on Apache Drill as part of the MapR Converged Data Platform, and their view of it as “a unified SQL access layer across files, tables and streams”, along (of course) with some new benchmarks - link
An interesting post on MariaDB AX, the data warehouse solution from MariaDB that’s built on MariaDB ColumnStore, covering bulk and streaming ingestion of data - link
AtScale now runs over Amazon Redshift - link
Confluent have a new blog post on Confluent Platform 4.0 (Confluent Open Source and Confluent Enterprise) - link
From ZDNet, an interview on Apache Flink and thoughts on the wider ecosystem - link
From Google, another post on the separation of storage and compute with BigQuery - link
Crail has been accepted to the Apache Incubator - we last saw this in October when it was submitted, so that’s a pretty quick turnaround. As a recap, this looks like a high-performance distributed and tiered (in memory, flash and disk) storage layer for temporary data that provides memory, storage and network access that bypasses the JVM and OS, with integration to Spark (as a custom Spark shuffler that improves sort performance by a factor of five) and Hadoop (via an HDFS adaptor).