Eavesdropping On A New Level

An MIT, Microsoft, and Adobe research team demonstrates how to capture sound using video images of objects. Yes, plants will parrot what you say with more fidelity than parrots, under the right conditions.

If a tree falls in a forest and no one is around to hear it, does it make a sound? This trite question masquerades as a conundrum, when it's really just vexing because of its vague construction and internal contradiction.

Nonetheless, this question can now be answered if there's high-resolution video footage of the toppling tree, even without an audio track.

Researchers from MIT, Microsoft, and Adobe have shown that they can recover sound from video imagery, a technique that promises to pique the interest of intelligence agencies and forensic investigators. While the technique will need to be refined to be practical outside the laboratory, it has the potential to enable retroactive eavesdropping at events that were recorded on video with sufficient fidelity.

Sound, of course, is how we describe vibrations we receive and perceive, typically through the air. When sound waves strike our eardrums, our eardrums move and we hear the sound (assuming the absence of impairment). And when sound waves strike an object, like a bag of potato chips, the object moves too, though imperceptibly to the eye.

But these vibrations can be perceived with a video camera, as Abe Davis (MIT), Michael Rubinstein (MIT/Microsoft), Neal Wadhwa (MIT), Gautham Mysore (Adobe), Frédo Durand (MIT), and William T. Freeman (MIT) have demonstrated and documented.

In a paper to be presented in mid-August at SIGGRAPH 2014, the researchers describe how they filmed a series of objects using both a high-speed video camera and a consumer video camera. Using only video information -- the objects' minute vibrations in response to the impact of sound waves -- they were able to reproduce sounds that had been playing near the objects.

The technique "allows us to turn everyday objects -- a glass of water, a potted plant, a box of tissues, or a bag of chips -- into visual microphones," the paper explains. "Remarkably, it is possible to recover comprehensible speech and music in a room from just a video of a bag of chips."
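To make the idea concrete, here is a minimal sketch in Python of how a sound might be read back from nothing but image frames. It is a toy under stated assumptions, not the researchers' algorithm: the "video" is a synthetic one-dimensional row of pixels, the object is a blurry blob that a 440 Hz tone nudges by a hundredth of a pixel, and the motion is tracked with a simple intensity-weighted centroid. Every name and parameter below (`render_frame`, `FPS`, `AMPLITUDE_PX`, and so on) is hypothetical and chosen for illustration.

```python
import math

def render_frame(shift, width=64, sigma=4.0):
    """Render a 1-D 'frame': a Gaussian blob centred at width/2 + shift pixels."""
    center = width / 2 + shift
    return [math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in range(width)]

def centroid(frame):
    """Intensity-weighted mean pixel position (gives sub-pixel precision)."""
    total = sum(frame)
    return sum(x * v for x, v in enumerate(frame)) / total

def recover_signal(frames):
    """Turn a sequence of frames into a zero-mean displacement signal."""
    positions = [centroid(f) for f in frames]
    mean = sum(positions) / len(positions)
    return [p - mean for p in positions]

def dominant_frequency(signal, fps):
    """Find the strongest frequency with a naive discrete Fourier transform."""
    n = len(signal)
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fps / n

# Assumed, illustrative values: a 2,000 fps camera and a 440 Hz tone that
# moves the object by at most 0.01 pixel.
FPS = 2000
TONE_HZ = 440.0
AMPLITUDE_PX = 0.01
N_FRAMES = 400

frames = [
    render_frame(AMPLITUDE_PX * math.sin(2 * math.pi * TONE_HZ * i / FPS))
    for i in range(N_FRAMES)
]
recovered = recover_signal(frames)
estimated_hz = dominant_frequency(recovered, FPS)
print(f"Estimated tone: {estimated_hz} Hz")
```

The point of the sketch is only that motion far smaller than a pixel still changes pixel brightness in a measurable way, so a per-frame position estimate can serve as an audio sample. Real footage adds noise, lighting changes, and two-dimensional texture, which a centroid tracker this simple would not survive.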

The science is similar to that employed by laser microphones, which use light to measure sound vibrations. But laser microphones are an active form of surveillance: the laser has to be trained on the target while the sound is occurring. Analyzing video of objects for vibrations, by contrast, can be done after the fact, given video of sufficient quality and source audio of sufficient volume.

The experiment focused on high-speed video -- up to 6,000 frames per second -- but the researchers also had success retrieving audio from video captured on consumer-grade video cameras shooting at 60 frames per second.
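Frame rate matters because each frame acts as one audio sample. By standard sampling theory, a camera treated as a plain sampler can only represent frequencies up to half its frame rate; anything higher folds back to a lower, aliased frequency. The short Python sketch below works through that arithmetic for the two frame rates mentioned. The function names are hypothetical, and the sketch illustrates only the limit itself, not how the researchers handled 60 fps footage.

```python
def nyquist_hz(fps):
    """Highest frequency a sampler running at `fps` can represent without aliasing."""
    return fps / 2.0

def aliased_hz(tone_hz, fps):
    """Apparent frequency of a pure tone after sampling at `fps` samples per second."""
    folded = tone_hz % fps
    return min(folded, fps - folded)

for fps in (60, 6000):
    print(f"{fps} fps: Nyquist limit {nyquist_hz(fps)} Hz, "
          f"a 440 Hz tone appears at {aliased_hz(440.0, fps)} Hz")
```

At 6,000 frames per second the limit is 3,000 Hz, enough to cover much of the range that makes speech intelligible. At 60 frames per second the limit is 30 Hz, so a 440 Hz tone sampled naively would show up as a 20 Hz alias.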

US intelligence presumably already has more sophisticated eavesdropping technology. A decade-old patent application arising from work at NASA, "Technique and device for through-the-wall audio surveillance," describes a way to listen in on even soundproofed locations by using "reflected electromagnetic signals to detect audible sound." But MIT's Visual Microphone technique could become a useful addition to an already formidable set of surveillance tools.

