The Mid Week News 17/07/2019

News news news again. Remember, you can get daily news updates from our twitter feed (@OnDataEng)…

Technology updates (details are on the relevant technology pages):

  • Apache Kudu 1.10 is out, with table backup/restore, metadata sync with the Hive Metastore, and native fine-grained authorization via Apache Sentry
  • Data Lifecycle Manager (part of the DataPlane Platform) is up to 1.5 if you’re looking for a tool to replicate HDFS and Hive data between clusters
  • Data Analytics Studio (part of the DataPlane Platform) is up to 1.3 if you’re looking for a tool to run Hive queries and diagnose their performance issues

Other technology news:

  • Cloudera have announced the licensing model for the new company - TL;DR, they’re sticking with Apache and AGPL licences, sticking with the Apache Software Foundation, and the ex-Cloudera commercial components will all be open sourced - link
  • From the Starburst blog, part sales pitch, but it makes a good case for separating storage and compute and keeping your architecture open - link
  • LinkedIn have open sourced Brooklin, their tool for replicating streaming data between streaming data stores and/or databases - we’ve added this to our list of Data Ingestion technologies - link
  • Databricks Runtime 5.5 is out - link
  • This looks really interesting - from Datanami, an intro to Dagster, an open source tool for creating data applications using a functional paradigm, with support for a range of languages and integrations out of the box - link