The Mid Week News - 10/01/2018

It’s the first news back after the Christmas break, so brace yourself - it’s a massive, bumper, jam-packed edition…

Technology updates (details are on the relevant technology pages):

Other technology news:

  • Both ZDNet and Datanami have posts on Hadoop 3.0 and what the roadmap past this looks like - ZDNet; Datanami
  • Blog posts for Kudu 1.6 and Greenplum 5.3 have appeared and been added to their technology pages. Greenplum is looking to move to a fully containerised deployment model, which is interesting.
  • Azure HDInsight has seen a big price reduction and a bunch of new announcements - link; ZDNet commentary
  • An excellent article from Ehud Kaldor and SwiftStack on the differences between NFS and Object Storage - link
  • Hortonworks have published a set of pre-canned streaming analytics projects using HDP and HDF, including Ad Serving, Clickstream Analysis and Predictive Maintenance - link
  • A couple of old Databricks announcements we didn’t cover at the time for some reason:
    • Databricks Unified Analytics Platform - Databricks runtime + interactive collaborative notebooks and dashboards + production job / notebook scheduling + enterprise security - homepage; blog
    • Databricks Delta - a service over cloud blob stores like S3 that adds ACID transactions and support for automatic data indexing - homepage; blog
    • And some thoughts from ZDNet - Spark in the cloud; Databricks strategy
  • Merv Adrian’s latest Hadoop tracker is up detailing the component versions used by the major Hadoop vendors - link
  • If you’ve got some time for reading, AtScale have a list of their top 10 posts and articles from 2017 - link
  • From ZDNet, their thoughts on big data in 2018 and the move to the cloud - link
  • The excellent db-engines site have announced their database of the year - link
  • DZone have published a Refcard for Kafka covering a whole pile of useful getting-started information - link
  • A good write up of the features in Elasticsearch 6.0 from Logz.io - link
  • Are you running Kafka? We have a couple of posts this week from NewRelic and Confluent on monitoring it - NewRelic; Confluent
  • Azure Blob Storage now supports an archive level tier - link
  • A deep dive into the YARN capacity scheduler from Hortonworks - link
  • From Apache Flink - 2017 in review and plans for 2018 - link
  • dataArtisans have responded to the Databricks Spark Streaming vs Flink benchmark - link
  • Apache Mnemonic and Trafodion have graduated from the Apache Incubator - link; link
  • The Apache NiFi project has released the first (0.1) version of the NiFi Registry for the configuration management of flows - link
  • A write-up from ZDNet on StreamSets - link
  • It’s an old article, but still interesting - ZDNet looked at graph vs RDF databases - link
  • By comparison this is ancient (from 2015), but it looks like a really good intro to the HBase architecture from MapR - link
  • At the risk of this becoming a ZDNet fest - their views on big data in 2017 and 2018 - link
  • An update from the Pravega blog on their architecture and design principles - link
  • For the deeply technical - how to build a distributed log (streaming data store) - link
  • And last but not least, from Sonra - dimensional modelling on Hadoop - link