The Mid Week News - 07/03/2018

It’s time for the news again…

Technology updates (details are on the relevant technology pages):

Other technology news:

  • Druid has been donated to the Apache Incubator - proposal; incubator page
  • Elastic have announced that they’ll be open sourcing their Elastic X-Pack as of Elastic 6.3. The code will be moved into the public repos for their other products (but under the Elastic EULA), and the free elements will be pre-bundled with those products rather than requiring a separate download - announcement; details; Datanami view
  • An excellent article on Data Warehouse Automation - we’ll get to talking more about this soon - link
  • MapR have announced “MapR Data Fabric for Kubernetes” - persistent storage for containers running on Kubernetes - announcement; homepage; Datanami view
  • Hortonworks have blogged about what’s new in Cloudbreak 2.4 - link
  • The latest Hortonworks blog post on HDF 3.1 is up, this time on the MiNiFi C++ agent - link
  • AWS have published their best practice for running Kafka on AWS - link
  • Datanami have covered Cloudera’s announcement of Altus Data Science (R and Python-based machine learning workloads based on their Data Science Workbench) coming to beta soon, with an operational database built on HBase coming as the fourth package in the future - link
  • Again from Datanami, a report that Streamlio is claiming a performance advantage of up to 150% for Apache Pulsar over Apache Kafka as a Streaming Data Store - link
  • From ZDNet, an article that’s dense with information and well worth a read if you have an interest in Graph Databases or RDF Databases - link
    • Cypher (the open source Graph query language from Neo4j) now has adapters to allow Cypher jobs to be run over Spark and over TinkerPop Gremlin compatible databases
    • There’s a SPARQL Gremlin bridge, allowing you to run SPARQL queries over TinkerPop Gremlin compatible databases
    • Amazon Neptune (which supports both Gremlin and SPARQL) is apparently built on BlazeGraph
    • There’s a new massively parallel distributed graph database from Cambridge Semantics (CS) called AnzoGraph, which they compare to TigerGraph
  • Looks like I missed the donation of this to the Apache Foundation, but Apache Hivemall is a scalable machine learning library implemented as Hive UDFs/UDAFs/UDTFs - home page
  • LinkedIn have proposed DrElephant to the Apache Foundation - it’s their performance monitoring and tuning service for jobs and workflows that run on Apache Hadoop and Apache Spark - proposal