Finding smaller, more relevant morsels of information is the key to unlocking big data's potential, says FirstFuel's CTO.
You've heard of "big data," of course, but what about "deep data"? Fear not, our goal here is not to foist another industry buzzword upon you. But given the ongoing debate over how much data a company should ingest and manage, the concept of deep data is one that every data-focused organization should know.
Badri Raghavan is CTO and chief data scientist of FirstFuel, an analytics company that focuses on energy efficiency in buildings. Its customers, including governments and energy utilities, use FirstFuel's energy analytics services to run greener, more cost-efficient offices, schools, and other structures.
In a phone interview with InformationWeek, Raghavan gave his take on the term "deep data," and how FirstFuel uses it to its advantage.
"What we call 'deep data' is a combination of experts' domain knowledge of the area -- which in our case happens to be energy combined with data science -- to help analyze the energy usage for buildings on a very massive scale," he told us.
The concept of deep data has much to do with information density. "A given data stream can be very informative," said Raghavan. "Conversely, you can collect a lot of data that's not particularly insightful or informative."
As you might have guessed, Raghavan is not a fan of data hoarding, or taking in as much information as you can, even when you're not sure what, if any, value it will provide further down the road.
Data collection, he said, is really about efficiency, or "leveraging the data asset you already have. The way to do that is to [determine] the technical or business problem you're trying to solve. What is the single most important data stream you can leverage?"
In FirstFuel's line of work -- analyzing the energy consumption of large buildings -- that single stream turns out to be meter data.
"We look at meter data as a scan of a building. Using our data science algorithms, we analyze the health of a building and pinpoint where it's sick, and where it can be more efficient."
And that, he noted, is one example of deep data at work. Meter data is "a relatively skinny data stream with so much content," which allows FirstFuel to pinpoint the problem it's interested in: identifying inefficiencies in energy consumption.
The trick for many organizations, of course, is knowing which data streams have the most value, and then figuring out how to combine them with other data to gain new insights.
FirstFuel has several data streams it finds particularly valuable.
"Meter data tells us a lot about a building," said Raghavan. "Then we start using high-resolution aerial imagery -- you know, Google Earth, we use that a lot. In our domain, it's very informative. It tells us what type of equipment is sitting on top of these buildings," which tells FirstFuel a lot about the amount of energy a building should consume.
The analytics firm adds in weather data from the National Weather Service, too.
"We step it up, little by little by little. We pick up every new data stream if it adds to the information insight."
And that, he said, is the notion of deep data. "You can do some deep learning on relatively sparse data, as opposed to analyzing large volumes of data… and looking for the needle in the haystack."
FirstFuel, for instance, could collect loads of additional data -- including information on traffic patterns and parking lots, as well as Twitter streams -- but hasn't found a compelling reason to do so.
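To make the step-by-step idea concrete, here is a minimal Python sketch of one way to add data streams only when they earn their keep. It is not FirstFuel's method, which the interview does not describe in detail. The technique shown is a simple greedy forward stagewise regression, and the stream names ("meter", "weather", "tweets"), the synthetic numbers, and the 20% improvement threshold are all assumptions made up for illustration.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return my, 0.0
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def mse(values):
    """Mean squared value of a list of residuals."""
    return sum(v * v for v in values) / len(values)


def select_streams(streams, target, split, min_gain=0.2):
    """Greedily add streams while each one cuts holdout error enough.

    streams: dict of name -> list of floats (hypothetical data streams)
    target: list of floats (e.g. energy use); first `split` rows train,
    the rest are held out. A stream is kept only if it lowers the
    holdout mean squared error by at least `min_gain` (a fraction).
    """
    mean = sum(target[:split]) / split
    resid = [t - mean for t in target]  # start from a mean-only baseline
    chosen = []
    remaining = dict(streams)
    while remaining:
        base = mse(resid[split:])
        best = None
        for name, x in remaining.items():
            # Fit the current residual on this one stream (training rows only)
            a, b = fit_line(x[:split], resid[:split])
            new_resid = [r - (a + b * xi) for r, xi in zip(resid, x)]
            err = mse(new_resid[split:])
            if best is None or err < best[1]:
                best = (name, err, new_resid)
        name, err, new_resid = best
        if base == 0 or (base - err) / base < min_gain:
            break  # the best remaining stream adds too little; stop
        chosen.append(name)
        resid = new_resid
        del remaining[name]
    return chosen


# Synthetic, made-up data: the target depends on "meter" and "weather" only.
random.seed(0)
n = 400
streams = {
    "meter": [random.gauss(0, 1) for _ in range(n)],
    "weather": [random.gauss(0, 1) for _ in range(n)],
    "tweets": [random.gauss(0, 1) for _ in range(n)],  # pure noise
}
target = [
    3 * m + w + random.gauss(0, 0.5)
    for m, w in zip(streams["meter"], streams["weather"])
]
chosen = select_streams(streams, target, split=300)
print(chosen)
```

In this toy setup the informative streams are picked up one at a time, and the noise stream is left out because it does not improve the holdout error enough to justify collecting it. A real system would need a far richer model and validation scheme than this one-variable-at-a-time fit.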
"Instead of going down the big data path, where there's a lot of data you could potentially analyze, but for relatively little incremental gain, we instead [focus on] the bare minimum that tells us the most about a building," says Raghavan. "And then we build on that, step by step."
Jeff Bertolucci is a technology journalist in Los Angeles who writes mostly for Kiplinger's Personal Finance, The Saturday Evening Post, and InformationWeek.