Microsoft updated its R statistical modeling language product lineup, Yahoo released a massive machine learning data set to the academic community, Baidu released some of its machine learning developments around speech recognition to open source, and IBM acquired a real-time fraud detection and analytics company. We've got those stories and more in this week's big data roundup.
Let's start with Microsoft. It's been a year since the company put a big stake in the ground by acquiring Revolution Analytics, a distributor of the open source R statistical modeling language. Back then, the move was viewed as a way for Microsoft to supplement its growing big data and analytics toolbox as well as to show that it understands the importance of open source. This week, the company announced that it is rebranding its R servers and development tools under the Microsoft name while continuing to offer many of those tools free to the development community.
Meanwhile, another tech company showed that it cares about the development community, too. Yahoo released a massive machine learning data set to the academic research community. This data set includes the surfing and search habits of 20 million anonymous users.
The release is intended for researchers working on context-aware learning, large-scale learning algorithms, user behavior modeling, and content enrichment. Yahoo said the information includes data about how users interacted with the Yahoo home page, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Movies, and Yahoo Real Estate. The data set is available as part of the Yahoo Labs Webscope data-sharing program, a reference library of data sets composed of anonymous user data for non-commercial use.
The research arm of Baidu, which has sometimes been described as the Google of China, has released a piece of its machine learning software, called Warp-CTC, under an open source Apache license and posted it on GitHub. Warp-CTC builds on previous algorithms and was developed as Baidu worked on Deep Speech, its speech recognition system, which has been shown to work for English and Mandarin. The company said in an FAQ that it is open-sourcing the code because "we want to make end-to-end deep learning easier and faster so researchers can make more rapid progress. … We want to start contributing to the machine learning community by sharing an important piece of code that we created." Baidu said that it expects to release additional open source AI tools in the future.
IBM announced Jan. 15 that it has acquired IRIS Analytics, a privately held company specializing in real-time analytics for combating payment fraud. IRIS Analytics focuses on detecting fraud as it is attempted rather than after it has happened. IRIS provides a real-time fraud analytics engine that uses machine learning to generate anti-fraud models rapidly while supporting the creation and modification of ad-hoc models, IBM said. Financial terms of the deal were not disclosed.
Databricks, the company whose founders developed the popular big data platform Apache Spark, has announced a series of top management changes. Ion Stoica is leaving his job as CEO and will assume the role of executive chairman. Current VP of engineering and product Ali Ghodsi has been named CEO. Patrick Wendell will move into the role of VP of engineering, and Ron Gabrisko has joined the company as SVP of worldwide sales.
Databricks sells and services an implementation of Apache Spark, and these executive moves reflect the 2-year-old company's efforts to get serious about the commercial market and enterprise customers. "As the creators and drivers of the Spark engine, Databricks is at an inflection point where the pace of innovation coming from the community positions us for tremendous growth and opportunity in 2016," Stoica said in a prepared statement. "Ali [Ghodsi] is positioned to enable both Databricks and Spark to seek widespread enterprise adoption, momentum, and customer acquisition."
Data platform analytics company Looker this week announced it has closed a $48 million Series C funding round led by Kleiner Perkins Caufield & Byers, with participation from previous investors. The company said it will use the new capital to accelerate its growth through investments in sales, marketing, engineering, and international expansion.
Lastly, digital crowd-sourced encyclopedia Wikipedia is marking its 15th anniversary. To help celebrate the occasion, the folks over at FiveThirtyEight.com have collected the three most edited entries for each year since Wikipedia launched in 2001. Spoilers: Many of the highly edited entries are related to big news events for each year, particularly if those events were in any way controversial. For instance, in 2008 the most edited entry was for then-US vice presidential candidate Sarah Palin. Wikipedians are also obsessed with tracking deaths, major weather events and systems, popular culture, politics, and "the esoteric and arcane."