Microsoft Open Sources Deep Learning, AI Toolkit On GitHub - InformationWeek

Previously available to academic researchers, Microsoft's Computational Network Toolkit (CNTK) now has a friendlier open source license.

On Monday Microsoft joined its peers, including Google, Facebook, and Yahoo, in offering a deep learning framework to support artificial intelligence applications.

The company released its Computational Network Toolkit (CNTK) as an open source project on GitHub, thus providing computer scientists and developers with another option for building the deep learning networks that power capabilities like speech and image recognition.

CNTK has been available to academic researchers since last April under a more restrictive license.

There are already several dozen deep learning toolkits and modules available. But the pace at which this technology is appearing has quickened. According to artist and developer Kyle McDonald, the average interval between deep learning framework releases was 47 days in the 2010-2014 period. Last year, he claimed in a tweet, that interval shrank to 22 days.

That may be because AI has become a major focus at leading technology companies. In early 2015, Facebook open sourced modules for the Torch deep learning toolkit. Then in November, Google released TensorFlow. In January this year, Baidu released Warp-CTC. Even Yahoo joined in, releasing a dataset derived from the Yahoo News Feed to fuel machine learning systems.

(Image: Microsoft)

Microsoft attributes the surge in interest to the growing number of researchers running machine learning algorithms supported by deep neural networks -- systems modelled on the processes in the human brain. Microsoft says that many researchers believe such systems can enhance artificial intelligence applications.

The rapid improvements over the past few years in the speech recognition capabilities of applications like Apple's Siri and Google Translate, and in the image recognition capabilities of Google Photos, suggest that belief is well-founded. As mobile and Internet-connected devices proliferate, AI can be expected to become even more important as a way to facilitate function without traditional keyboard-based interaction.

But corporate interest in releasing such toolkits isn't entirely altruistic. By making software used internally available as open source code, these companies benefit from contributions that improve their code. By encouraging external research talent to become familiar with internal toolsets, they make the path by which these people could become employees a bit easier to traverse.

Xuedong Huang, Microsoft's chief speech scientist, extolled the speed of CNTK in a blog post. "The CNTK toolkit is just insanely more efficient than anything we have ever seen," he said.

CNTK can take advantage of the number-crunching power of GPUs on single computers (Windows or Linux) or on computing clusters.

TensorFlow can utilize distributed GPUs too, but only on Linux machines. TensorFlow runs on OS X without CUDA parallel GPU support (perhaps not for long). It also can be run on Windows through Docker, which likewise limits GPU usage. Windows support through Bazel appears to be planned.

One disadvantage of CNTK is that it requires C++. TensorFlow supports Python as well as C++. However, Microsoft is planning to add support for Python and C#. It's also developing an Azure cloud service, referred to as Project Philly, that will provide the ability to run CNTK, among other applications, across multiple virtual GPUs.

In a Facebook post expressing support for an assessment of deep learning frameworks conducted by Microsoft researcher Kenneth Tran, Yann LeCun, director of AI at Facebook, contends that Torch has the fewest deficiencies among deep learning frameworks. "Torch has an almost perfect rating on all counts," he notes. "Theano and TensorFlow lack speed, Tensorflow and Caffe lack flexibility."

Ultimately, however, these toolkits depend upon data, and none of the companies providing deep learning tools is offering third parties access to the massive datasets they use to train their models. To use that data, start with a job application.

Thomas Claburn has been writing about business and technology since 1996, for publications such as New Architect, PC Computing, InformationWeek, Salon, Wired, and Ziff Davis Smart Business. Before that, he worked in film and television, having earned a not particularly useful ...

1/28/2016 | 10:33:03 PM
Toolset is still valuable
Even without the data, the toolkits are still of use. It is not just these tech companies: utility companies, government, media, hospitals, and telecoms also have a lot of data to which the toolkits can be applied.