The Mid Week News - 16/08/2017 edit
Bumper week this week given we’ve been off for a while - let’s crack on…
Technology updates (details are on the relevant technology pages):
- The Apache Parquet C++ library has hit 1.2
- Confluent Open Source and Enterprise have hit 3.3
- Apache Ignite is up to 2.1
- Apache Drill is up to 1.11
- Apache Tez has hit 0.9
- Apache Hive has seen 2.2 and 2.3 releases, with 2.3 coming first. No idea what’s going on here - if you can enlighten me please do!
- There are also new links added to the CDH 5.12, Cloudera Director 2.5, Cloudera Data Science Workbench 1.1, Hue 4, Kafka 0.11, Ignite 2.0 and Ambari 2.5 release entries, taken from recent blog posts exploring the new functionality
Technology news:
- Apache Pulsar (homepage) has entered Apache incubation - looks like another potential Kafka alternative, this time from Yahoo. We’ll try to take a look at this next week.
- It looks like there are plans to split the Hive Metastore off into its own top level Apache project
- Cloudera’s thoughts (part 1) on the role Analytical Search capabilities play in big data analytics
- From DB-Engines, thoughts on time series databases
- The Confluent blog has been busy, with a bunch of interesting posts:
- An introduction to creating flows using Kafka Streams
- An excellent introduction to the architectural principles behind Kafka
- Thoughts on the use of Kafka as the single source of truth (including history) of your data
- From The Morning Paper, an article on how machine learning can optimise database tuning, which probably speaks to the complexity of tuning databases as much as anything else
- AWS Glue is now generally available
- Another useful introduction to Kafka Streams
- From Cloudera, a post on monitoring Solr with Cloudera Manager
- And another from Cloudera, querying Impala from R
- And one more from Cloudera, detailing S3Guard, providing consistency when running HDFS over Amazon S3