Advances in assisted Q&A could find their way into medical, legal, tech support and compliance applications.
From a computing perspective, this grand challenge is very different from Deep Blue's chess matches. Watson must be able to decipher English-language clues it has never seen before. It must quickly search for possible answers in a fixed knowledge store and then apply myriad analytics to determine how confident it is in an answer -- posed, of course, in the form of a question.
Language is often ambiguous and highly contextual, full of multiple meanings, idioms, slang and regional dialects. The domains of knowledge are completely unpredictable, with topics and questions known only to the producers of Jeopardy. The clues might be about history, literature, geography, politics, arts, science, or pop culture.
Watson can't just look things up on the Internet. That would be unfair to the humans. Watson is a stand-alone computer. Its memory is a fixed, 15-terabyte store containing the equivalent of 200 million pages or one million books on diverse topics (okay, maybe that's not so fair).
Rather than using a database, Watson relies on a content store based on the Unstructured Information Management Architecture (UIMA), something IBM developed and has since placed in the open source community. Watson "learned" all the information it stores in advance, meaning the content passed through an analysis/training stage in which it was marked up with metatags denoting entities such as people, places, things, dates and concepts.
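To make the markup step concrete, here is a minimal sketch in Python of tagging text with entity annotations. It is an illustration only, not the UIMA API and not IBM's pipeline: the Annotation class, the GAZETTEER lookup table and the annotate function are all hypothetical names, and a tiny dictionary plus a four-digit pattern stand in for real entity recognizers.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """A tag over a span of text: character offsets plus an entity type."""
    begin: int
    end: int
    type: str
    text: str


# Hypothetical lookup table; a real system would use far richer recognizers.
GAZETTEER = {"Athens": "Place", "Olympics": "Concept"}

# Assumption for this sketch: any standalone four-digit number is a date.
DATE_PATTERN = re.compile(r"\b\d{4}\b")


def annotate(text: str) -> List[Annotation]:
    """Return annotations for known entities and dates, ordered by position."""
    found = []
    for name, entity_type in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append(
                Annotation(match.start(), match.end(), entity_type, match.group())
            )
    for match in DATE_PATTERN.finditer(text):
        found.append(Annotation(match.start(), match.end(), "Date", match.group()))
    return sorted(found, key=lambda a: a.begin)


if __name__ == "__main__":
    for annotation in annotate("Athens hosted the Olympics in 1896."):
        print(annotation)
```

Because the tags are stored as offsets alongside the text rather than inside it, the content only has to be analyzed once, ahead of time, and can then be searched by entity type.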
Search technology retrieves contextually appropriate information quickly, but the real secret sauce behind Watson is its battery of analytic scoring algorithms. That's what IBM Research spent years refining, and it's the key to a computer's ability to decipher the language used in the clue and score its confidence in having the right answer.
"The hard part for the computer is finding and justifying the correct answer," said Dr. David Ferrucci, the "Principal Investigator" behind the Watson project. "For each of thousands of plausible answers, Watson gathers evidence and uses thousands of algorithms to understand what's most likely to be the right answer."
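The gather-evidence-then-score idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not Watson's actual algorithms: the two scorers (term_overlap and year_match), the equal weights, the weighted average and the 0.5 answer threshold are all invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical scorer signature: (clue, evidence passage) -> score in [0, 1].
Scorer = Callable[[str, str], float]


def term_overlap(clue: str, evidence: str) -> float:
    """Fraction of the clue's words that also appear in the evidence."""
    clue_terms = set(clue.lower().split())
    if not clue_terms:
        return 0.0
    evidence_terms = set(evidence.lower().split())
    return len(clue_terms & evidence_terms) / len(clue_terms)


def year_match(clue: str, evidence: str) -> float:
    """1.0 if a four-digit number in the clue also appears in the evidence."""
    years = {t for t in clue.split() if t.isdigit() and len(t) == 4}
    evidence_tokens = set(evidence.split())
    return 1.0 if years & evidence_tokens else 0.0


def rank_candidates(
    clue: str,
    candidates: Dict[str, str],
    scorers: List[Tuple[Scorer, float]],
) -> List[Tuple[str, float]]:
    """Combine scorer outputs into one confidence per candidate, best first."""
    total_weight = sum(weight for _, weight in scorers)
    ranked = []
    for answer, evidence in candidates.items():
        weighted = sum(weight * scorer(clue, evidence) for scorer, weight in scorers)
        ranked.append((answer, weighted / total_weight))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def decide(ranked: List[Tuple[str, float]], threshold: float) -> Optional[str]:
    """Answer in the form of a question only if the top confidence is high enough."""
    if ranked and ranked[0][1] >= threshold:
        return f"What is {ranked[0][0]}?"
    return None


if __name__ == "__main__":
    clue = "city hosted the 1896 olympics"
    candidates = {
        "Athens": "athens hosted the 1896 summer olympics",
        "Paris": "paris hosted the 1900 summer olympics",
    }
    ranked = rank_candidates(clue, candidates, [(term_overlap, 0.5), (year_match, 0.5)])
    print(ranked)
    print(decide(ranked, threshold=0.5))
```

The point of the structure is that no single scorer has to be right: each contributes a piece of evidence, and the combined confidence decides whether the system answers or stays silent.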
Developing the knowledge set and analytic functionality was the major hurdle, but when that technology runs on a single processor, it takes as long as two hours to answer a single question with confidence. The next step was putting it all on steroids with a massively parallel processing (MPP) architecture.
Watson is essentially a workload-optimized system purpose-built to play Jeopardy. It runs on 10 racks of IBM Power 750 servers -- standard commercial hardware that's readily available. The total machine operates at 80 teraflops, or about 80 trillion operations per second. With that much processing power under the hood, Watson's response time dropped from two hours on a single processor to less than three seconds -- the threshold required to match wits with Jeopardy champions.
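A quick back-of-the-envelope calculation shows how steep that requirement is. The sketch below uses only the two figures quoted above (two hours and three seconds). The serial-fraction bound comes from Amdahl's law, a standard model of parallel speedup that is brought in here as an assumption and is not something IBM cites; the function names are hypothetical.

```python
# Figures from the article: up to two hours on one processor, under three
# seconds on the full machine.
SINGLE_PROCESSOR_SECONDS = 2 * 60 * 60
TARGET_SECONDS = 3


def required_speedup(serial_seconds: float, target_seconds: float) -> float:
    """How many times faster the parallel system must be than one processor."""
    return serial_seconds / target_seconds


def max_serial_fraction(speedup: float) -> float:
    """Upper bound on the non-parallelizable share of the work.

    Amdahl's law caps speedup at 1 / serial_fraction no matter how many
    processors are added, so serial_fraction can be at most 1 / speedup.
    """
    return 1.0 / speedup


if __name__ == "__main__":
    speedup = required_speedup(SINGLE_PROCESSOR_SECONDS, TARGET_SECONDS)
    print(f"Required speedup: {speedup:.0f}x")
    print(f"Serial share of the work: at most {max_serial_fraction(speedup):.4%}")
```

Under that model, a roughly 2,400-fold speedup is only reachable if nearly all of the work, such as scoring many candidate answers independently, can run in parallel.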