In this edition of our Big Data Roundup, we've got news from LinkedIn, AtScale, Hortonworks, and more. Plus, we have a look at a new project that could help you apply artificial intelligence (AI) to find the movie that you've been looking for -- even if you can't remember its title.
First, let's look at news from Hortonworks, one of the three big Hadoop distribution companies. On March 1 the company announced what it says is a comprehensive strategy that includes a new distribution approach for Apache Hadoop and updates to other Apache initiatives for data streaming. These advancements span the company's Connected Data Platforms, including Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF).
Starting with the release of Hortonworks Data Platform 2.4 on March 1 and including all future releases, the company is committing to two new release cadences. Core Apache Hadoop components -- including HDFS, MapReduce, and YARN, plus Apache ZooKeeper -- will be updated annually and aligned with the ODPi consortium, the company said.
In addition, extended services, including Spark, Hive, HBase, Ambari, and others, which run on top of the Core, will be released continually throughout the year to "match the pace of innovation occurring within each project team in the community," Hortonworks said in a statement.
The company also announced advancements to Hortonworks DataFlow 1.2, its platform for real-time streaming analytics. HDF 1.2 now integrates streaming analytics engines Apache Kafka and Apache Storm, according to Hortonworks.
Finally, the company announced a collaboration with Hewlett Packard Enterprise to optimize enterprise Spark performance.
LinkedIn's Newest Open Source Contribution
Meanwhile, career social media network LinkedIn has made another one of its data projects open source. The company has announced that its WhereHows technology, which it developed for internal use as a way to collect and manage the metadata for its gigantic network of data, will now be available under the Apache open source license.
LinkedIn is no stranger to contributing projects to the Apache Software Foundation. Apache Kafka is another important project developed by LinkedIn and contributed to open source.
Gartner Magic Quadrant For Data Warehouse
Speaking of the Apache Software Foundation, the big three Apache Hadoop distribution companies for the first time all made it into the Gartner Magic Quadrant for Data Warehouse and Data Management Solutions for Analytics. Cloudera, Hortonworks, and MapR were all clustered in the Visionaries Quadrant. The data warehouse report for 2016 was the first that included Hortonworks. Last year, Cloudera and MapR appeared in the Challengers Quadrant.
BI Meets Big Data
Big data has offered the exciting promise of new data sets driving new insights, but traditional business intelligence tools have not been able to connect directly to big data infrastructure systems such as Hadoop. Startup AtScale set out to change that a few years ago by creating a way for business analysts to use their traditional tools to query Hadoop-based data. Having pioneered this approach, the company has now released a benchmark that shows which big data engines are best suited to which business intelligence tasks. Not all workloads are created equal, the report finds: some engines excel at small data sets and fast queries, while others offer greater stability for a large number of concurrent users.
Finally, this week Engadget reports on a new AI project that can help you find that movie or video that you've been looking for, even if you've forgotten the title. A project out of Finland called Val.ai can identify over a thousand qualities of a movie from any video stream automatically, including emotions, locations, and specific objects, the site reports. And there's a Web-based version of it, too, although Engadget reports imperfect results.