IBM Cognitive Colloquium Spotlights Uncovering Dark Data

This IBM cognitive computing event champions advances in image recognition, captioning, and deep learning to reveal insights from mushrooming IoT and machine learning data.

Charles Babcock, Editor at Large, Cloud

October 14, 2015


(Image: DKart/iStockphoto)


Data is being generated at a frenetic pace, with the Internet of Things about to multiply the Internet's current production several times over. Despite our passion to collect it, much of this data will remain unknown, unsampled, and useless. In other words, it is "dark" data.

That was the note on which John Kelly, IBM's "father of Watson" and the senior VP who pushed for Watson's contest against human champions on the Jeopardy quiz show, opened IBM Research's third Cognitive Colloquium, a San Francisco event that explores the shift from linear, von Neumann computing to a compute architecture that better mimics the workings of the human brain.

Today, "80% of all data is dark and unstructured. We can't read it or use [it] in our computing systems. By 2020, that number will be 93%," he told more than 300 neuroscience researchers, computer scientists, hopeful startup executives, and academics interested in cognitive computing.


At the same time, we are generating a million GB of health data for each person during a lifetime, and there are 7 billion people on Earth. By 2020, cars will be generating 350MB of data per second "and [that data] will need to be assessed," he noted.


Cognitive computing seeks to understand natural language, analyze pictures the way the human eye can, and perform other complex tasks. But Kelly said, "This is not a journey to reproduce what the human mind can do," discounting worries that intelligent machines will first match, then surpass, the human brain. The idea, rather, is to have computers do what they do well in order to augment human intelligence.

"It's to understand what's out there, to get insights into all that data," he added. The price of not moving ahead with cognitive computing is high. More sophisticated compute power is needed to avoid costly mistakes such as building a $1 billion oil-drilling platform in the wrong place, or doctors and their healthcare imaging systems arriving at the wrong diagnosis.

There's a trillion dollars in waste in the healthcare system, as hit-or-miss treatments and therapies are pursued without anyone knowing for sure whether they will work. In some cases, they do not. "Think of a radiologist who sits in a room and looks at thousands of images each day. Eventually fatigue sets in." A cognitive system like IBM's Watson can analyze thousands of images tirelessly. Watson is being trained to examine X-ray images from a library of 30 million well-defined medical images. When the training is finished, some organizations may use Watson to supplement or inform human radiologists.

When people object that it's expensive to capture and process all that data, Kelly said he asks, "What is the price of not knowing? What is the price of not being able to cure cancer?"

Cognitive computing will usher in a new era, as different from its predecessor as today's computerized banking is from its paper-ledger predecessor. In the last two years, neural cognitive computing networks have transformed natural language processing and language translation from iffy processes into ones in which not only words, but also meanings are captured with much greater fidelity.

"There's not an industry or discipline that's not going to be transformed by this technology over the next decade," Kelly said.

Research into cognitive computing is proceeding along several promising lines. Terrence Sejnowski, a pioneer of computational neuroscience at the Salk Institute and the University of California, San Diego, illustrated the difference between today's cognitive computing skills and linear logic with an anecdote. He was once on his way to a computer science event at a university when his escort warned him that the faculty members he was about to meet disliked the idea of cognitive computing and considered it a threat to their work.

As he began speaking to the group, he showed a picture of a honey bee and laid down this challenge: With only a million neurons, the honey bee "can see, fly, and mate. Your supercomputer, with billions of neurons, despite the best efforts of the NSA, can't do that. Why not?" The question stumped his would-be questioners.

The answer, of course, was that the bee's brain had been wired by evolution to accomplish those tasks through its cognitive computing; the supercomputer had not.

Today's cognitive computing systems can look at an image of a person doing something in an outdoor setting and write an accurate caption, such as "Woman throwing a Frisbee in a park." That's the result of analyzing the image in a fashion that more closely resembles how the human brain works. In the past, image analysis proceeded frame by frame, with the computer discarding each frame before analyzing the next. Cognitive computing retains that data for further examination. Retaining it lets the system focus on an area of interest, make comparisons, and draw a conclusion that better matches the brain's perceptions, which occur as a series of spikes in its electrical activity.

Sejnowski said the brain isn't analogous to an electronic computer. Rather, it might at best be compared to hundreds of computers wired in different ways to do different things, but able to coordinate their results. Cognitive computing will rely on neural networks, which can do pattern recognition and associative thinking; recurrent networks, which hold their data for repeated consultation by processors; and "deep, multilayered networks" that might aid in processing an image in many ways simultaneously.
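The contrast between frame-by-frame analysis and a recurrent network that holds onto earlier data can be illustrated with a toy Python sketch. It is not drawn from IBM's or Sejnowski's work: the single-number "frames" and the weights `w`, `w_in`, and `w_rec` are hypothetical values chosen only for illustration, and a real network would use vectors, weight matrices, and learned parameters. The stateless function forgets each frame as soon as it moves on, while the Elman-style recurrent loop folds every new frame into a hidden state carried forward from the earlier ones.

```python
import math

def feedforward(x, w=0.5):
    """Stateless step: the output depends only on the current frame.
    The weight w is a hypothetical, hand-picked value."""
    return math.tanh(w * x)

def recurrent(xs, w_in=0.5, w_rec=0.9, h0=0.0):
    """Elman-style recurrence: each step mixes the current frame with the
    hidden state left by earlier frames. Weights are hypothetical."""
    h = h0
    states = []
    for x in xs:
        # h carries a fading trace of every frame seen so far
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

frames = [1.0, 0.0, 0.0]
print(feedforward(frames[-1]))   # the last frame alone yields 0.0
print(recurrent(frames)[-1])     # still positive: the first frame is remembered
```

With the frames 1.0, 0.0, 0.0, the stateless function returns 0.0 for the last frame, while the recurrent version still carries a positive trace of the first frame in its final state.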

Cognitive computing researchers are also working on computerized "deep learning." Over the next several decades, he said, neuroscience, cognitive computing, and nanoscience together will bring computing closer to how the brain captures, accesses, and processes data. That advance will transform computing as we know it.

"Today's big data is important. The future is going to be even bigger," he said.

But Yoshua Bengio, a professor of computer science at the University of Montreal and an expert in machine learning, said one of cognitive computing's latest efforts, the two-year-old Adam Net, has cognitive powers that rank somewhere between Sejnowski's honey bee and a frog. In other words, getting cognitive computing to resemble the processing power of the human brain is still a long way off.

Bengio said machine learning based on "deep learning" techniques was another area of rapid advance in cognitive computing. Deep learning collects and analyzes machine data in different ways over a period of time until it recognizes what the data says about the state of the machine. Image-based machine data, for example, could be analyzed for pixel light intensity, color, edges, and other factors to see what they reveal about the machine's state. When machine data is examined in multiple ways, more can be learned about what it means and how it reflects a machine's operation and its environment.
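To make the idea of examining image data in multiple ways more concrete, the Python sketch below computes two hand-coded views of a tiny grayscale image: its average intensity and how often neighboring pixels jump sharply in brightness, a crude edge measure. This is an illustration under stated assumptions, not Bengio's method; deep networks typically learn edge-like filters in their early layers rather than having them written by hand, and the helper names and the threshold of 50 are hypothetical. Color is left out to keep the example short.

```python
def mean_intensity(img):
    """Average brightness of a grayscale image given as rows of 0-255 values."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)

def horizontal_edges(img):
    """Absolute brightness change between horizontally adjacent pixels.
    Large values mark vertical boundaries in the image."""
    return [[abs(row[c + 1] - row[c]) for c in range(len(row) - 1)] for row in img]

def edge_fraction(img, threshold=50):
    """Share of adjacent-pixel pairs whose jump exceeds a hypothetical threshold.
    Assumes the image is at least two pixels wide."""
    edges = [e for row in horizontal_edges(img) for e in row]
    return sum(1 for e in edges if e > threshold) / len(edges)

# Hypothetical 2x4 image: dark left half, bright right half
img = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
print(mean_intensity(img))    # 127.5
print(horizontal_edges(img))  # [[0, 255, 0], [0, 255, 0]]
print(edge_fraction(img))     # one pair in three crosses the edge
```

Each function looks at the same pixels from a different angle; a learning system would combine many such views, and learn which ones matter, rather than rely on two fixed measures.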

During a panel discussion at the event, Fei-Fei Li, director of the AI and Vision Lab at Stanford University, said machine learning is making strides but is still in its infancy. "In my opinion, the quest toward artificial intelligence goes from perception to cognition and reasoning. We're doing very well with perception" of objects and images and "just beginning to see (computerized) captioning" -- identification of what's in the image. But much work remains to be done, she said, in the areas of cognition, understanding what the data means, and reasoning with the conclusions gained from the data.

That will take different sets of algorithms working on machine data, along with some higher intelligence able to tie their many different outcomes together. Later in the discussion, Li added: "If you think about the evolution of the brain, you realize nature doesn't just patch up parts."


Among the attendees at the Cognitive Colloquium was Will Barkis, technology analyst for Orange Fab, the Silicon Valley program of telecom company Orange. He said cognitive computing might one day help Orange address business customers' needs around business practices and telecommunications use, but that the science is still in the research stage.

Ron Mak, a computer science professor at San Jose State University (who spent two hours on Route 101 getting to a conference only 55 miles away), asked whether the image analysis being applied to single images could also be applied to video. By the end of the morning, he had his answer: It could.

Jim Shaw, cofounder and CTO of BergenShaw International, a small San Francisco software development firm, was an early implementer of machine learning software. With a partner, he built a system that spotted friction in a manufacturer's process for ramping up disk-drive production, and he was later able to sell a variation of the system to companies doing human gene sequencing. He attended, he said, to keep an eye out for the next breakthrough in machine learning that a small firm might pick up quickly and apply in a new product.

About the Author(s)

Charles Babcock

Editor at Large, Cloud

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld, and former technology editor of Interactive Week. He is a graduate of Syracuse University, where he obtained a bachelor's degree in journalism. He joined the publication in 2003.
