The Mid Week News - 07/03/2018
It’s time for the news again…
Technology updates (details are on the relevant technology pages):
- Apache Kylin has hit 2.3
- Apache Spark has also hit 2.3
Other technology news:
- Druid has been donated to the Apache Incubator - proposal; incubator page
- Elastic have announced that they’ll be open sourcing their Elastic X-Pack as of Elastic 6.3. The code will be moved into the public repos for their other products (but under the Elastic EULA), and the free elements will be pre-bundled with those products rather than requiring a separate download - announcement; details; Datanami view
- An excellent article on Data Warehouse Automation - we’ll get to talking more about this soon - link
- MapR have announced “MapR Data Fabric for Kubernetes” - persistent storage for containers running on Kubernetes - announcement; homepage; Datanami view
- Hortonworks have blogged about what’s new in Cloudbreak 2.4 - link
- The latest Hortonworks blog post on HDF 3.1 is up, this time on the MiNiFi C++ agent - link
- AWS have published their best practice for running Kafka on AWS - link
- Datanami have covered Cloudera’s announcement that Altus Data Science (R and Python-based machine learning workloads, based on their Data Science Workbench) is coming to beta soon, with an operational database built on HBase to follow as the fourth package in the future - link
- Again from Datanami, a report that Streamlio is claiming up to a 150% performance advantage for Apache Pulsar over Apache Kafka as a Streaming Data Store - link
- From ZDNet, an article that’s dense with information and well worth a read if you have an interest in Graph Databases or RDF Databases - link
  - Cypher (the open source Graph query language from Neo4j) now has adapters to allow Cypher jobs to be run over Spark and TinkerPop Gremlin-compatible databases
  - There’s a SPARQL Gremlin bridge, allowing you to run SPARQL queries over TinkerPop Gremlin-compatible databases
  - Amazon Neptune (which supports both Gremlin and SPARQL) is apparently built on BlazeGraph
  - There’s a new massively parallel distributed graph database from Cambridge Semantics (CS) called AnzoGraph, which they compare to TigerGraph
- Looks like I missed the donation of this to the Apache Foundation, but Apache Hivemall is a scalable machine learning library implemented as Hive UDFs/UDAFs/UDTFs - home page
- LinkedIn have proposed DrElephant to the Apache Foundation - their performance monitoring and tuning service for jobs and workflows that run on Apache Hadoop and Apache Spark - proposal