Doing Computer Vision Without Cameras

The way you walk, the way radio waves reflect off your body, and your body's thermal signature all play into the ability for AI to identify you without the use of a camera.

James Kobielus, Tech Analyst, Consultant and Author

June 25, 2018

Image: Jason Dorfman/MIT CSAIL

Computer vision is a privacy advocate’s nightmare. We see this concern most starkly in the escalating war between advocates of computer vision’s killer app -- artificial intelligence-driven face recognition -- and the many developers of innovative countermeasures that use AI to evade this level of intrusive surveillance.

Normal cameras can’t see through walls, which means that mainstream computer vision is powerless in any space that you can keep camera-free. But just as the blind use their other senses to compensate for visual impairment, AI-powered computer vision platforms are increasingly able to make accurate visual inferences even when they lack visual inputs.

In the larger sense, computer vision is becoming the sum of all sensor inputs that can be rendered as visual patterns. Through sophisticated AI, it’s increasingly possible to infer a highly accurate visual portrait from the radio frequency signals that your body reflects, the pressure and vibrations you generate by walking around, and the heat patterns that you radiate by simply being alive. Each of these is a unique signature that can be used to “see” you even when you’ve successfully hidden your face, voice, fingerprints, and genome from prying eyes.

Before long, it may not be necessary to position cameras everywhere in order to cobble together a good-enough visual representation of indoor and outdoor scenes. Here’s a quick overview of recent innovations coming from the research community, most of which depend on sophisticated AI:

Wi-Fi-sensed pose recognition: Even if you’re alone in a room with the door closed and windows covered, your identity may be revealed by how your body reflects Wi-Fi signals back through those barriers. As discussed in this recent article, researchers at MIT have developed a wall-penetrating scanner that combines Wi-Fi, sensors, and AI algorithms to model what people are doing on the opposite side of an otherwise opaque barrier. The technology, called RF-Pose, is equivalent to echolocation in its ability to trace the 2-D “stick figure” outlines of unseen people and other objects, based on the patterns associated with how they reflect Wi-Fi signals. When correlated and cross-trained with AI applications that perform gait, gesture, and movement recognition, this approach can match these stick figures to specific individuals 83% of the time.

Pressure-sensed gait recognition: As discussed in this recent article, researchers at the University of Manchester have built an AI-powered footstep-recognition system that can sense unique patterns in individuals’ gaits with near-perfect accuracy. The technique, called SfootBD, is a passive sensor that analyzes several factors in each individual’s unique walking pattern, including weight distribution, gait speed, and three-dimensional measures of each walking style. The researchers also correlated the footstep pressure signals with visual evidence of walking styles captured with a high-resolution camera. To train the AI, they accumulated a database of footstep signals from more than 120 individuals, measuring each person’s gait using pressure pads on the floor. The data was gathered by monitoring participants in public scenarios (airport security checkpoints and workplaces) and in their homes. They also tested the algorithm on a control group of imposters, giving it the ability to detect when someone was trying to fake someone else’s gait.
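
To make the imposter test concrete, here is a minimal sketch of footstep verification, assuming each footstep has already been reduced to a short feature vector. The feature names, the sample values, the distance threshold, and the helper functions (enroll, verify) are all hypothetical and chosen for illustration; this is not the Manchester team’s method, which the article does not describe in detail.

```python
import math

# Hypothetical features per footstep, scaled to comparable ranges:
# (peak pressure, step duration in seconds, heel-to-toe pressure ratio).
# All values below are invented for illustration.

def centroid(samples):
    """Average a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def enroll(footsteps_by_person):
    """Build one reference template (the centroid) per enrolled person."""
    return {name: centroid(steps) for name, steps in footsteps_by_person.items()}

def verify(templates, claimed_name, footstep, threshold):
    """Accept the claimed identity only if the footstep lies near that person's template."""
    return distance(templates[claimed_name], footstep) <= threshold

training = {
    "alice": [[0.82, 0.61, 1.10], [0.80, 0.63, 1.08], [0.84, 0.60, 1.12]],
    "bob": [[1.20, 0.48, 0.90], [1.18, 0.50, 0.92], [1.22, 0.49, 0.88]],
}
templates = enroll(training)

# A new footstep from Alice, and one from Bob presented under Alice's name.
genuine = verify(templates, "alice", [0.83, 0.62, 1.09], threshold=0.1)
imposter = verify(templates, "alice", [1.19, 0.49, 0.91], threshold=0.1)
print(genuine, imposter)
```

In this toy setup the threshold is what separates a genuine walker from an imposter: loosen it and more imposters get through, tighten it and more genuine users are rejected. A production system would learn far richer features from the raw pressure signals.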

Thermally sensed activity recognition: Thermal sensing technology is nothing new, and in fact is commonly found in building automation, energy management, security, and access control systems. Thermal sensors detect the heat emitted by people or objects in the infrared spectrum. It’s one of many inputs supported by the new generation of super-sensors, which can also detect sound, vibration, light, and electromagnetic activity. Google is one of many companies making deep investments in the AI necessary to process all this data holistically to sense human and other activities in indoor and outdoor scenarios with pinpoint precision.
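
As a rough illustration of how heat readings become a presence signal, the sketch below thresholds a small grid of temperatures. The 4x4 frame, the ambient temperature, the margin, and the function names (warm_cells, is_occupied) are assumptions made up for this example and do not describe any particular vendor’s sensor.

```python
# Hypothetical 4x4 frame from a low-resolution thermal array, in degrees Celsius.
frame = [
    [21.0, 21.2, 21.1, 20.9],
    [21.1, 30.5, 31.2, 21.0],
    [21.0, 31.0, 32.1, 21.2],
    [20.8, 21.1, 21.0, 21.1],
]

def warm_cells(frame, ambient_c, margin_c):
    """Return (row, col) positions that exceed ambient by more than margin_c."""
    return [
        (r, c)
        for r, row in enumerate(frame)
        for c, temp in enumerate(row)
        if temp - ambient_c > margin_c
    ]

def is_occupied(frame, ambient_c=21.0, margin_c=5.0, min_cells=3):
    """Flag presence when enough cells are clearly warmer than the room."""
    return len(warm_cells(frame, ambient_c, margin_c)) >= min_cells

cells = warm_cells(frame, ambient_c=21.0, margin_c=5.0)
print(cells, is_occupied(frame))
```

Recognizing what a person is doing, rather than merely that someone is present, would require feeding sequences of such frames to a trained model alongside the other sensor inputs.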

Generatively sensed perspective recovery: As discussed in this article, researchers are building AI models, known as “generative query networks,” that can look at a visual scene from several angles and then describe what it would look like from another perspective. DeepMind, an AI-focused subsidiary of Alphabet, has built AI that can autonomously build a data-driven visual picture of the world, even reasoning with high accuracy about what may be present where a scene is partly occluded. DeepMind’s researchers tested the approach on three virtual settings: a block-like tabletop, a virtual robot arm, and a simple maze. The model pairs a representation network, which condenses the observed views into a vector representation of the scene -- including object shapes, positions, and colors -- with a generation network that renders the scene from a viewpoint it has not observed.
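
The step of folding several views into a single vector representation might be sketched as follows. This assumes that per-view encodings are combined by element-wise summation, so the result does not depend on the order in which the views arrive. The encode_view function is a hand-written, hypothetical stand-in for a learned encoder, and the pixel values are invented; none of this is DeepMind’s code.

```python
import math

def encode_view(pixels, camera_angle_deg):
    """Toy stand-in for a learned encoder: one view plus its camera angle -> a vector."""
    a = math.radians(camera_angle_deg)
    mean = sum(pixels) / len(pixels)
    return [mean * math.cos(a), mean * math.sin(a), max(pixels) - min(pixels)]

def scene_representation(views):
    """Sum per-view vectors so the result is independent of view order."""
    vectors = [encode_view(pixels, angle) for pixels, angle in views]
    return [sum(col) for col in zip(*vectors)]

# Three hypothetical views of the same scene: (flattened pixels, camera angle in degrees).
views = [
    ([0.1, 0.5, 0.9], 0.0),
    ([0.2, 0.4, 0.6], 90.0),
    ([0.3, 0.3, 0.3], 180.0),
]
r_forward = scene_representation(views)
r_reversed = scene_representation(list(reversed(views)))

# Compare with a tolerance, since floating-point sums can differ slightly by order.
same = all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(r_forward, r_reversed))
print(same)
```

In a real system both the encoder and the network that renders a new viewpoint from the summed vector would be learned from data rather than written by hand.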

Clearly, these tools can be exploited by law enforcement, espionage, and military agencies everywhere. But that’s not necessarily a bad thing. Cameras are intrusive, or simply infeasible to install, in many environments where there is nevertheless a legitimate societal need for visibility. In those settings, camera-free sensing has clear uses:

They can help police officers detect when individuals in adjoining rooms are brandishing weapons, as well as where they are physically in those spaces, thereby eliminating the element of surprise and minimizing the possibility of ambush.
They can make it cost-effective to continually monitor and secure every room in every residence, office, or other building without having to incur the expense of installing cameras throughout those facilities.
They could potentially also help caregivers to continually monitor the ambulatory status of the elderly, disabled, and people with various medical conditions, without the perceived privacy encroachment associated with continual camera surveillance.

Be that as it may, Wikibon expects that computer vision companies will add these approaches to their solution portfolios to enable AI-accelerated visibility in a wider range of operating scenarios. Check out this recent article for a discussion of some sophisticated tools that use AI to augment the visibility of camera-based computer-vision technologies.

About the Author(s)

James Kobielus

Tech Analyst, Consultant and Author

James Kobielus is an independent tech industry analyst, consultant, and author. He lives in Alexandria, Virginia.
