Advances in assisted Q&A could find their way into medical, legal, tech support and compliance applications.
From a computing perspective, this grand challenge is very different from Deep Blue's chess matches. Watson must be able to decipher English-language clues it has never seen before. It must quickly search a fixed knowledge store for possible answers and then apply myriad analytics to determine its certainty in an answer (posed, of course, in the form of a question).
Language is often ambiguous, highly contextual and open to infinite meanings, idioms, slang and regional dialects. The domains of knowledge are completely unpredictable, with topics and questions known only to the producers of Jeopardy. The clues might be about history, literature, geography, politics, arts, science, or pop culture.
Watson can't just look things up on the Internet. That would be unfair to the humans. Watson is a stand-alone computer. Its memory is a fixed, 15-terabyte store containing the equivalent of 200 million pages or one million books on diverse topics (okay, maybe that's not so fair).
Rather than using a database, Watson relies on a content store based on the Unstructured Information Management Architecture (UIMA), which IBM developed and has since contributed to the open source community. Watson "learned" all the information it stores in advance: the content passed through an analysis and training stage in which it was marked up with metatags denoting entities such as people, places, things, dates and concepts.
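To make the idea of "marking up" content concrete, here is a toy sketch of an offline annotation pass that attaches entity labels to raw text as standoff annotations (character offsets plus a type). It is not UIMA's actual API, which is a Java/C++ framework; the `GAZETTEER` dictionary, the year regex and the `annotate` function are hypothetical stand-ins for the trained annotators a real system would use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    text: str    # the covered text
    label: str   # entity type, e.g. PERSON, PLACE, DATE


# Hypothetical gazetteer: a real system would use trained annotators, not a fixed list.
GAZETTEER = {
    "Abraham Lincoln": "PERSON",
    "Illinois": "PLACE",
}
# Toy date recognizer: standalone four-digit years from 1000 to 2099.
DATE_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def annotate(text: str) -> list[Annotation]:
    """Run once, ahead of time, to attach entity annotations to raw text."""
    found = []
    for phrase, label in GAZETTEER.items():
        for m in re.finditer(re.escape(phrase), text):
            found.append(Annotation(m.start(), m.end(), m.group(), label))
    for m in DATE_PATTERN.finditer(text):
        found.append(Annotation(m.start(), m.end(), m.group(), "DATE"))
    # Keep annotations in document order so later stages can walk them left to right.
    return sorted(found, key=lambda a: a.start)


for a in annotate("Abraham Lincoln moved to Illinois in 1830."):
    print(a.label, repr(a.text), a.start, a.end)
```

Keeping the original text untouched and storing labels as offsets is what lets many independent annotators add their own markup to the same document without interfering with one another.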
Search technology retrieves contextually appropriate information quickly, but the real secret sauce behind Watson is its battery of analytic scoring algorithms. That's what IBM Research spent years refining, and it's the key to a computer's ability to decipher the language used in the clue and score its confidence in having the right answer.
"The hard part for the computer is finding and justifying the correct answer," said Dr. David Ferrucci, the "Principal Investigator" behind the Watson project. "For each of thousands of plausible answers, Watson gathers evidence and uses thousands of algorithms to understand what's most likely to be the right answer."
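The evidence-scoring step Ferrucci describes can be sketched in miniature: every candidate answer is run through several independent scorers, and their outputs are merged into a single confidence. This is an illustration under stated assumptions, not Watson's code. The two scorers, the `rank` helper, the weights and the bias are all hypothetical, and the logistic merge is just one common way to turn many scores into a probability-like number; in practice the weights would be learned from training data rather than hand-picked.

```python
import math
from typing import Callable

# A scorer looks at the clue, one candidate answer and a supporting passage
# and returns one piece of evidence as a number in [0, 1].
Scorer = Callable[[str, str, str], float]


def term_overlap(clue: str, candidate: str, passage: str) -> float:
    """Fraction of clue words that also appear in the supporting passage."""
    clue_words = set(clue.lower().split())
    passage_words = set(passage.lower().split())
    return len(clue_words & passage_words) / len(clue_words) if clue_words else 0.0


def candidate_in_passage(clue: str, candidate: str, passage: str) -> float:
    """1.0 if the passage actually mentions the candidate, else 0.0."""
    return 1.0 if candidate.lower() in passage.lower() else 0.0


def confidence(features: list[float], weights: list[float], bias: float) -> float:
    """Merge evidence scores into one confidence in (0, 1) with a logistic function."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank(clue: str, evidence: dict[str, str], scorers: list[Scorer],
         weights: list[float], bias: float) -> list[tuple[str, float]]:
    """Score every candidate and return (candidate, confidence), best first."""
    results = []
    for candidate, passage in evidence.items():
        features = [score(clue, candidate, passage) for score in scorers]
        results.append((candidate, confidence(features, weights, bias)))
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical clue and retrieved evidence, one passage per candidate answer.
CLUE = "this president issued the emancipation proclamation"
EVIDENCE = {
    "Lincoln": "lincoln issued the emancipation proclamation in 1863",
    "Grant": "grant led union armies and later became president",
}
ranked = rank(CLUE, EVIDENCE, [term_overlap, candidate_in_passage], [4.0, 2.0], -3.0)
print(ranked)  # Lincoln ranks first: more clue terms appear in its passage
```

Because each scorer only sees one candidate at a time, adding more scorers or more candidates grows the work without changing the structure, which is what makes the approach amenable to the parallel hardware described below.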
Developing the knowledge set and analytic functionality was the major hurdle, but when that technology runs on a single processor, it takes as long as two hours to answer a single question with confidence. The next step was putting it all on steroids with a massively parallel processing (MPP) architecture.
Watson is essentially a workload-optimized system purpose-built to play Jeopardy. It runs on 10 racks of IBM Power 750 servers, standard commercial hardware that's readily available. The total machine operates at 80 teraflops, or about 80 trillion floating-point operations per second. With that much processing power under the hood, Watson's response time went from two hours on a single node down to less than three seconds, the threshold required to match wits with Jeopardy champions.
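The jump from two hours to under three seconds implies a speedup of roughly 7,200 s / 3 s = 2,400x, which is only reachable because scoring one candidate answer does not depend on scoring another. The sketch below shows that scatter/gather shape with Python's standard `concurrent.futures` module; `score_all` and `required_speedup` are hypothetical helpers, and the worker count is arbitrary. A thread pool is used only to keep the example short: CPython threads share one interpreter lock, so real CPU-bound scoring would be spread across processes or, as in Watson's case, many separate servers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def required_speedup(serial_seconds: float, target_seconds: float) -> float:
    """Ideal parallel speedup needed to bring a serial run under the target time."""
    return serial_seconds / target_seconds


def score_all(candidates: list[str], score: Callable[[str], float],
              workers: int = 8) -> dict[str, float]:
    """Scatter independent candidates to a worker pool, then gather their scores."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with candidates.
        scores = list(pool.map(score, candidates))
    return dict(zip(candidates, scores))


print(required_speedup(2 * 60 * 60, 3))  # two hours vs. a three-second target
print(score_all(["Lincoln", "Grant", "Washington"], lambda c: float(len(c))))
```

The speedup figure is an idealized lower bound: it assumes perfectly even work division and no communication overhead, both of which a real cluster has to engineer around.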