The Mid Week News - 06/12/2017
Right - time for your weekly updates on new software releases and interesting new information and posts, with a big dump from AWS re:Invent this week…
Technology updates (details are on the relevant technology pages):
- Apache Beam has hit 2.2
- Druid has hit 0.11
Other technology news:
- After the Azure product dump a few weeks ago, it’s Amazon’s turn via AWS re:Invent:
  - Amazon Neptune - a graph/RDF database as a service with support for TinkerPop Gremlin and RDF SPARQL - announcement and blog
  - Amazon SageMaker - a service for building, training and deploying machine learning models at scale - announcement and blog
  - AWS Fargate - provisioning of containers on AWS without managing servers or clusters - announcement and blog
  - Amazon Elastic Container Service for Kubernetes (EKS) - Kubernetes as a service - announcement and blog
  - S3 Select and Glacier Select - retrieve subsets of stored objects by running SQL select queries server side - S3 announcement, Glacier announcement and blog
  - See also summaries from The Register and from InfoQ, and the motherlist of blog posts relating to re:Invent from Amazon
- From Cloudera, infrastructure considerations for deploying CDH - link
- MapR have posted their thoughts on Apache Drill as part of the MapR Converged Data Platform, and their view of it as “a unified SQL access layer across files, tables and streams”, along (of course) with some new benchmarks - link
- An interesting post on MariaDB AX, the data warehouse solution from MariaDB that’s built on MariaDB ColumnStore, covering bulk and streaming ingestion of data - link
- AtScale now runs over Amazon Redshift - link
- Confluent have a new blog post on Confluent Platform 4.0 (Confluent Open Source and Confluent Enterprise) - link
- From ZDNet, an interview on Apache Flink and thoughts on the wider ecosystem - link
- From Google, another post on the separation of storage and compute with BigQuery - link
- Crail has been accepted into the Apache Incubator - we last saw this in October when it was submitted, so that’s a pretty quick turnaround. As a recap, it looks like a high performance, distributed and tiered (memory, flash and disk) storage layer for temporary data, providing memory, storage and network access that bypasses the JVM and OS, with integration into Spark (as a custom Spark shuffler that improves sort performance by a factor of five) and Hadoop (via an HDFS adaptor).
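To give a feel for the S3 Select item in the re:Invent list, here's a minimal sketch of how a request might be put together with boto3's `select_object_content` call. It assumes a CSV object with a header row; the bucket name, key and query are hypothetical, and the helpers only build the request and stitch the returned record chunks together, so they run without AWS access.

```python
def build_s3_select_request(bucket: str, key: str, expression: str) -> dict:
    """Build keyword arguments for boto3's S3 client select_object_content.

    Assumes a CSV object whose first line holds column names. The SQL
    expression is evaluated by S3 itself, so only matching rows come back.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Use the CSV header row so columns can be referenced by name in SQL.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
        # Ask for the matching rows back as CSV.
        "OutputSerialization": {"CSV": {}},
    }


def collect_records(event_stream) -> bytes:
    """Concatenate the 'Records' payloads from an S3 Select event stream.

    Other event types (e.g. 'Stats', 'Progress', 'End') are skipped.
    """
    chunks = []
    for event in event_stream:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks)
```

With boto3 installed and credentials configured, usage would be roughly `resp = boto3.client("s3").select_object_content(**build_s3_select_request("example-bucket", "people.csv", "SELECT s.name FROM S3Object s WHERE s.city = 'London'"))` followed by `collect_records(resp["Payload"])`. The point of the design is that the filtering happens inside S3, so only the selected subset crosses the network.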
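The Crail item above describes a storage layer tiered across memory, flash and disk. As a rough illustration of what tiering means here, the sketch below places each block in the fastest tier that still has room and spills to slower tiers when it fills. This is a hypothetical, generic placement policy (the `Tier` and `TieredStore` names are invented), not Crail's actual API or allocation logic.

```python
from dataclasses import dataclass, field


class StoreFullError(Exception):
    """Raised when no tier has room for a block."""


@dataclass
class Tier:
    name: str
    capacity: int  # bytes available in this tier
    used: int = 0
    blocks: dict = field(default_factory=dict)


class TieredStore:
    """Hypothetical tiered store: put each block in the fastest tier with room."""

    def __init__(self, tiers):
        # Tiers are assumed to be ordered fastest first, e.g. memory, flash, disk.
        self.tiers = tiers

    def put(self, key: str, data: bytes) -> str:
        if any(key in tier.blocks for tier in self.tiers):
            raise ValueError(f"key already stored: {key}")
        for tier in self.tiers:
            if tier.used + len(data) <= tier.capacity:
                tier.blocks[key] = data
                tier.used += len(data)
                return tier.name  # report where the block landed
        raise StoreFullError(key)

    def get(self, key: str) -> bytes:
        # Search from the fastest tier down.
        for tier in self.tiers:
            if key in tier.blocks:
                return tier.blocks[key]
        raise KeyError(key)
```

For example, with `TieredStore([Tier("memory", 4), Tier("flash", 8), Tier("disk", 100)])`, a 3-byte block lands in memory, a following 4-byte block spills to flash, and a 10-byte block goes to disk. Since the data is temporary, a real system would also need deletion and eviction, which are left out here.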