The Mid Week News - 10/01/2018
It’s the first edition back after the Christmas break, so brace yourself - it’s a massive, jam-packed bumper edition…
Technology updates (details are on the relevant technology pages):
- The big one this week is Apache Hadoop 3.0 - there are links to the release notes on our Hadoop page, and some links to commentary below
- Elasticsearch has hit 6.1, along with X-Pack and Elasticsearch Hadoop
- Apache Solr 7.2 is out
- Apache HBase 1.4 is out
- Apache Drill is up to 1.12 - the Kafka support is interesting
- Apache Knox has hit 0.14
- Apache Arrow has hit 0.8
- Pravega - the Kafka challenger - has hit 0.2
- MiNiFi has seen the 0.3 release of its Java version
- Cloudbreak has seen its second 2.x technology preview release - 2.2
Other technology news:
- Both ZDNet and Datanami have posts on Hadoop 3.0 and what the roadmap beyond it looks like - ZDNet; Datanami
- Blog posts for Kudu 1.6 and Greenplum 5.3 have been added to their technology pages. Greenplum is looking to move to a fully containerised deployment model, which is interesting.
- Azure HDInsight has seen a big price reduction and a bunch of new announcements - link; ZDNet commentary
- An excellent article from Ehud Kaldor and SwiftStack on the differences between NFS and Object Storage - link
- Hortonworks have published a set of pre-canned streaming analytics projects using HDP and HDF, including Ad Serving, Clickstream Analysis and Predictive Maintenance - link
- A couple of older Databricks announcements that we didn’t cover at the time:
  - Databricks Unified Analytics Platform - Databricks runtime + interactive collaborative notebooks and dashboards + production job / notebook scheduling + enterprise security - homepage; blog
  - Databricks Delta - a service over cloud blob stores like S3 that adds ACID transactions and support for automatic data indexing - homepage; blog
  - And some thoughts from ZDNet - Spark in the cloud; Databricks strategy
- Merv Adrian’s latest Hadoop tracker is up, detailing the component versions used by the major Hadoop vendors - link
- If you’ve got some time for reading, AtScale have a list of their top 10 posts and articles from 2017 - link
- From ZDNet, their thoughts on big data in 2018 and the move to the cloud - link
- The excellent DB-Engines site has announced its database of the year - link
- DZone have published a Refcard for Kafka covering a whole pile of useful getting-started information - link
- A good write-up of the features in Elasticsearch 6.0 from Logz.io - link
- Are you running Kafka? We have a couple of posts this week from New Relic and Confluent on monitoring it - New Relic; Confluent
- Azure Blob Storage now supports an archive access tier - link
- A deep dive into the YARN capacity scheduler from Hortonworks - link
- From Apache Flink - 2017 in review and plans for 2018 - link
- data Artisans have responded to the Databricks Spark Streaming vs Flink benchmark - link
- Apache Mnemonic and Trafodion have graduated from the Apache Incubator - link; link
- The Apache NiFi project has released the first (0.1) version of the NiFi Registry, for configuration management of flows - link
- A write-up from ZDNet on StreamSets - link
- It’s an old article, but still interesting - ZDNet looked at graph vs RDF databases - link
- By comparison this one is ancient (from 2015), but it looks like a really good intro to the HBase architecture from MapR - link
- At the risk of this becoming a ZDNet fest - their views on big data in 2017 and 2018 - link
- An update from the Pravega blog on their architecture and design principles - link
- For the deeply technical - how to build a distributed log (streaming data store) - link
- And last but not least, from Sonra - dimensional modelling on Hadoop - link