The Mid Week News 11/09/2019 edit
Apologies - we’ve been off on holiday again, hence the radio silence. But we’re back, and with a big old news bump.
Remember, you can get daily news updates from our Twitter feed (@OnDataEng)…
Technology updates (details are on the relevant technology pages):
- Amazon EMR release 5.26 is out, with even better Spark performance
- And Amazon have also announced Amazon EMR 6.0, with support for Hadoop 3.1 and running Spark jobs in Docker containers
- Apache ORC 1.6 is out if you’re looking for columnar data storage on HDFS
- Greenplum 6.0 is finally out if you’re looking for a mature shared-nothing MPP database
- Apache CarbonData 1.6 is out if you’re looking for indexed storage of data on HDFS with support for batch inserts and updates
- Version 0.5 of the NiFi Registry is out if you’re looking to bring your NiFi flows under configuration management
- Version 0.4 of Apache Myriad is out
- Zenko CloudServer has just released version 8.2
Other technology news:
- Are you running an Apache Solr version prior to 5.0? If so, there’s an XML bomb attack you should know about - link
- Apache IoTDB - the Apache open source time series database focusing on IoT use cases - has its first official release at 0.8 - link
- From the ever excellent The Morning Paper, a review of a paper that used “the TPC-H benchmark to assess Redshift, Redshift Spectrum, Athena, Presto, Hive, and Vertica to find out what works best and the trade-offs involved” - link
- Elastic Cloud is now available on Azure - link
- Confluent Schema Registry is now available as a cloud service in Confluent Cloud - link
- Looking for an open source object store? Datanami have the latest on MinIO - link
- From Datanami, Cloudera’s Q2 results are better than expected - link
- StreamSets have announced StreamSets Transformer - a graphical tool for creating Apache Spark pipelines that’s part of their DataOps Platform - link
- Using Google Cloud Storage with Hadoop - Google have a new version of their Cloud Storage Connector for Hadoop out with a bunch of performance improvements and locking for directory modifications - link
- Apache DolphinScheduler has just been accepted into the Apache Incubator - originally called Easy Scheduler and donated by Analysys, it’s a tool for distributed ETL scheduling - link