The Mid Week News 17/07/2019
News, news, news again. Remember, you can get daily news updates from our Twitter feed (@OnDataEng)…
Technology updates (details are on the relevant technology pages):
- Apache Kudu 1.10 is out, with table backup/restore, metadata sync with Hive Metastore, and native fine-grained authentication via Apache Sentry
- Data Lifecycle Manager (part of the DataPlane Platform) is up to 1.5 if you’re looking for a tool to replicate HDFS and Hive data between clusters
- Data Analytics Studio (part of the DataPlane Platform) is up to 1.3 if you’re looking for a tool to run and diagnose performance issues with Hive queries
Other technology news:
- Cloudera have announced the licensing model for the new company - TLDR, they’re sticking with Apache and AGPL licences, sticking with the Apache Software Foundation, and the ex-Cloudera commercial components will all be open sourced - link
- From the Starburst blog, part sales pitch, but a good case for separation of storage and compute and keeping your architecture open - link
- LinkedIn have open sourced Brooklin, their tool for replicating streaming data between streaming data stores and/or databases - we’ve added this to our list of Data Ingestion technologies - link
- Databricks Runtime 5.5 is out - link
- This looks really interesting - from Datanami, an intro to Dagster, an open source tool for creating data applications using a functional paradigm, with support for a range of languages and integrations out of the box - link