For 20 years, computer scientists have been working to improve ways of searching reams of video clips for a particular shot. New research by IBM, Microsoft, and academic teams studying the problem could bring them a step closer to that goal.
At a conference in Cambridge, England, last week, an IBM researcher gave the first public demonstration of a computer system called Marvel, which uses statistical techniques to learn relationships among colors, shapes, patterns, sounds, and other clues in video footage that can help identify its content. IBM's prototype then labels the footage so users can go back and find individual shots. That could be a boon not only to TV news producers but also to intelligence analysts watching surveillance video and even PC users editing home movies.
Today's state of the art relies on searching for keywords embedded in video files, says IBM Research senior manager John Smith, who heads the project. Few TV stations, for example, extensively label all the shots in their footage. When they do, labels typically contain keywords to describe an entire program--not individual shots.
Smith's intelligent information management group has written algorithms that can identify 140 concepts gleaned from several years' worth of ABC News and CNN broadcasts obtained from the University of Pennsylvania. The concepts include airplanes, animals, and weather news. Marvel analyzed 70 scenes of rocket launches and placed 50 of them among the top 100 results of a search, Smith says. New work the group plans to describe in a November paper aims to increase accuracy by combining concepts--for example, an object that both looks and sounds like an airplane and is set against an outdoor scene is more likely to be a plane.
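To put those figures in concrete terms, the short Python sketch below works through the arithmetic of the rocket-launch result and illustrates one simple way several weak clues can add up to a stronger one. It is an illustration only, not IBM's method: the function names are hypothetical, the detector probabilities (0.6, 0.7, 0.8) are made-up values, and the combination rule assumes the clues are independent and that "plane" and "not plane" are equally likely beforehand.

```python
import math


def precision_recall_at_k(relevant_in_top_k, k, total_relevant):
    """Return (precision, recall) for the top k results of a search."""
    precision = relevant_in_top_k / k            # share of the top k that are correct
    recall = relevant_in_top_k / total_relevant  # share of all correct shots that were found
    return precision, recall


def combine_independent_evidence(probabilities):
    """Fuse per-clue probabilities into one probability.

    ASSUMPTION (not from the article): clues are independent and the prior
    is 50/50, so the fused odds are the product of the individual odds.
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 or p >= 1.0 for p in probabilities):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    in_favor = math.prod(probabilities)
    against = math.prod(1.0 - p for p in probabilities)
    return in_favor / (in_favor + against)


# Figures reported in the article: 70 rocket-launch scenes, 50 in the top 100.
precision, recall = precision_recall_at_k(relevant_in_top_k=50, k=100, total_relevant=70)
print(f"precision at 100: {precision:.2f}")
print(f"recall at 100:    {recall:.2f}")

# Hypothetical detector outputs: looks like a plane, sounds like a plane, outdoor scene.
fused = combine_independent_evidence([0.6, 0.7, 0.8])
print(f"combined probability: {fused:.2f}")
```

Under these assumptions, half of the top 100 results are rocket launches and about 71 percent of all the rocket-launch scenes are found. The three moderate clues combine into a probability higher than any one of them alone, which is the intuition behind combining concepts.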
Smith's team is also working with Columbia University's digital video multimedia lab on a project to search news footage from U.S. and foreign broadcasters for related topics, combining computer vision and image understanding with machine learning approaches that analyze each station's signature approach to a story.
Searching video is hard because it doesn't have the structure of text or graphics documents to define it, says Shih-Fu Chang, a professor of electrical engineering at Columbia University and director of the lab. Video footage lacks structural units such as the sentences and sections of a text file or the clean corners, circles, and boundaries of graphical images. "There's no existing alphabet or vocabulary that can describe what's going on," Chang says. "Our goal is to make all this unstructured data searchable and organizable."
Microsoft Research has a system that lets users see all the relevant shots in a home movie by positioning their mouse pointers over what they want to see, such as someone's face. "You have a view of everything you've seen in the video," Microsoft researcher Nebojsa Jojic says. Microsoft's Beijing lab is also researching video search techniques. And Carnegie Mellon University's Informedia project weighs factors such as shapes, colors, and text to answer different types of queries--for an object or person, for example. As more storage and faster CPUs fuel demand for PC video apps, researchers say work on search techniques is increasing.