1/27/2016
09:06 AM

Microsoft Open Sources Deep Learning, AI Toolkit On GitHub

Previously available to academic researchers, Microsoft's Computational Network Toolkit (CNTK) now has a friendlier open source license.

On Monday Microsoft joined its peers, including Google, Facebook, and Yahoo, in offering a deep learning framework to support artificial intelligence applications.

The company released its Computational Network Toolkit (CNTK) as an open source project on GitHub, thus providing computer scientists and developers with another option for building the deep learning networks that power capabilities like speech and image recognition.

CNTK has been available to academic researchers since last April under a more restrictive license.

There are already several dozen deep learning toolkits and modules available. But the pace at which this technology is appearing has quickened. According to artist and developer Kyle McDonald, the average interval between deep learning framework releases was 47 days in the 2010-2014 period. Last year, he claimed in a tweet, that interval shrank to 22 days.

That may be because AI has become a major focus at leading technology companies. In early 2015, Facebook open sourced modules for the Torch deep learning toolkit. Then in November, Google released TensorFlow. In January this year, Baidu released Warp-CTC. Even Yahoo joined in, releasing a dataset derived from the Yahoo News Feed to fuel machine learning systems.

(Image: Microsoft)

Microsoft attributes the surge in interest to the growing number of researchers running machine learning algorithms supported by deep neural networks -- systems modelled on the processes in the human brain. Microsoft says that many researchers believe such systems can enhance artificial intelligence applications.

The rapid improvements over the past few years in the speech recognition capabilities of applications like Apple's Siri and Google Translate, and in the image recognition capabilities of Google Photos, suggest that belief is well-founded. As mobile and Internet-connected devices proliferate, AI can be expected to become even more important as a way to facilitate function without traditional keyboard-based interaction.

But corporate interest in releasing such toolkits isn't entirely altruistic. By making software used internally available as open source code, these companies benefit from contributions that improve their code. By encouraging external research talent to become familiar with internal toolsets, they make the path by which these people could become employees a bit easier to traverse.

Xuedong Huang, Microsoft's chief speech scientist, extolled the speed of CNTK in a blog post. "The CNTK toolkit is just insanely more efficient than anything we have ever seen," he said.

CNTK can take advantage of the number-crunching power of GPUs on single computers (Windows or Linux) or computing clusters.

TensorFlow can utilize distributed GPUs too, but only on Linux machines. TensorFlow runs on OS X without CUDA parallel GPU support (perhaps not for long). It also can be run on Windows through Docker, which likewise limits GPU usage. Windows support through Bazel appears to be planned.

One disadvantage of CNTK is that it requires C++. TensorFlow supports Python as well as C++. However, Microsoft is planning to add support for Python and C#. It's also developing an Azure cloud service, referred to as Project Philly, that will provide the ability to run CNTK, among other applications, across multiple virtual GPUs.

In a Facebook post expressing support for an assessment of deep learning frameworks conducted by Microsoft researcher Kenneth Tran, Yann LeCun, director of AI at Facebook, contends that Torch has the fewest deficiencies among deep learning frameworks. "Torch has an almost perfect rating on all counts," he notes. "Theano and TensorFlow lack speed, Tensorflow and Caffe lack flexibility."

Ultimately, however, these toolkits depend upon data, and none of the companies providing deep learning tools is offering third parties access to the massive datasets they use to train their models. To use that data, start with a job application.

Thomas Claburn has been writing about business and technology since 1996, for publications such as New Architect, PC Computing, InformationWeek, Salon, Wired, and Ziff Davis Smart Business. Before that, he worked in film and television, having earned a not particularly useful ...

Comments
nasimson,
User Rank: Ninja
1/28/2016 | 10:33:03 PM
Toolset is still valuable
Even without data, the toolkits are still of use. It's not just these tech companies: utility companies, government, media, hospitals, and telecoms have a lot of data to which the toolkit can be applied.