The Mid Week News - 03/05/2017
Technologies: Apache Hadoop, Apache HBase, Apache Flink, Hortonworks SmartSense, Hortonworks Data Cloud for AWS, Hortonworks Data Platform for Windows, Apache Kudu, Cloudera Manager, Cloudera Navigator, Cloudera Altus Director, Elasticsearch
Peter
So, I failed at the first hurdle in trying to do this weekly; however, let’s carry on regardless.
This week - new products from Cloudera and Hortonworks, a bunch of Hortonworks and Cloudera releases that got missed last time, plus a set of blog posts I’ve been collecting for a while.
In terms of the new products from Cloudera and Hortonworks, we’ve seen Cloudera Data Science Workbench and Apache Metron formally released recently. I’m aiming to do tech summaries for both this week, where we’ll look at them a bit more closely.
Some Hortonworks updates that we missed last time - Hortonworks Data Cloud for AWS has a new 1.14 release, Hortonworks SmartSense got a bump to 1.4, and it looks like HDP for Windows got discontinued whilst I wasn’t looking - 2.4 was the final version!
And on the Cloudera front, Cloudera Manager, Cloudera Navigator and Cloudera Director have all seen version bumps as part of the CDH 5.11 release.
And finally, some assorted blog posts that have caught my attention recently:
- Cloudera have released their latest blog post on how much faster Impala is than anything else. Expect one from Hortonworks shortly that shows that Hive LLAP is actually the fastest.
- From the ever excellent “The Morning Paper”, a summary of a research paper on HopsFS, a version of HDFS where the in-memory metadata store in the NameNode is replaced with a distributed database, allowing it to scale to much larger numbers of files and dramatically increase throughput.
- An interesting interview with the CTO of Cloudera
- Merv Adrian’s latest Hadoop tracker is out. I’m not sure you can directly compare the component versions in Hadoop distributions given how much each vendor pulls patches forward, but it’s an interesting analysis nevertheless.
- The Flink blog has been busy with a summary of the Flink ecosystem and a comparison of Flink to Spark and MapReduce
- Hadoop has failed us
- Some analysis on Cloudera’s strategy
- Tech deep dives into HBase In-Memory Compaction and Kudu read and write paths
- And Elasticsearch is coming to Google’s Cloud Platform
- And last but not least - Matt Turck’s monster 2017 Big Data Landscape - essential reading