Analyze 46 Petabytes Of Data Via Laptop? Here's How
AT&T "nanocubes," a poor man's approach to big-data analysis, let users analyze petabytes of data per day in laptop memory.
AT&T Research just put together a kind of poor man's big data approach to complex visualizations. It has developed a way to use a mere laptop to run analytics on the 46 petabytes of data running across AT&T's networks every day. AT&T calls the process nanocubes, a pun on data cubes, which are used to aggregate large quantities of data in relational databases. The AT&T researchers developed nanocubes as a way to see just how much data they could get into RAM.
Keeping large amounts of data in RAM would make it easier for people to analyze it. "For most visualizations it is crucial when the person viewing the visualization is in the loop," said Chris Volinsky, executive director of the statistics research department at AT&T. "We really wanted to have a fluid, quick response, so users could come up with a question and look at the data quickly to see if their initial hypothesis held any water."
Using Hadoop clusters and scaling across machines causes latency problems that affect visualizations, he said.
Volinsky is no stranger to hard problems. He was part of the team that won the million-dollar Netflix Prize.
Stuffing billions of data points into computer memory was a big problem of its own. Jim Klosowski, a principal member of technical staff at AT&T, said data cubes are naturally quick ways to look at data, but they get very large. He said AT&T's research team had to come up with "several clever tricks" to minimize the amount of memory needed to store data. Because the data involved both time and geography, one trick was to store answers to common queries once -- time of day, for instance -- and simply use them again, rather than recalculate them. For more on their tricks, see the paper the company's researchers presented, "Nanocubes for Real-Time Exploration of Spatiotemporal Datasets." There's also a YouTube video.
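To make the "store answers once" idea concrete, here is a minimal Python sketch of a plain data cube, not AT&T's actual nanocube structure. It assumes a made-up list of events tagged with an hour of day and a region; the names build_cube and query are hypothetical. Every count is computed once when the cube is built, and each later question becomes a single lookup.

```python
from collections import Counter

# Hypothetical events: (hour_of_day, region) pairs, invented for illustration.
events = [
    (8, "north"), (8, "north"), (8, "south"),
    (17, "north"), (17, "south"), (17, "south"),
]

def build_cube(events):
    """Count events once for every (hour, region) combination, including
    the 'any hour' and 'any region' roll-ups, which are marked with None."""
    cube = Counter()
    for hour, region in events:
        for h in (hour, None):
            for r in (region, None):
                cube[(h, r)] += 1
    return cube

def query(cube, hour=None, region=None):
    """Answer a count query with one dictionary lookup, no rescan of events."""
    return cube[(hour, region)]

cube = build_cube(events)
print(query(cube, hour=8))                   # all events at 8 a.m.
print(query(cube, region="south"))           # all events in the south
print(query(cube, hour=17, region="south"))  # 5 p.m. events in the south
```

The trade-off is the one Klosowski describes: lookups are fast, but the table grows with every combination of dimensions, which is why the researchers needed tricks to keep it in memory.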
AT&T is using nanocubes to examine things such as why cellphone calls get dropped on its network. Volinsky noted that AT&T has an app that lets customers report where they are when calls get dropped. The company can look at that data and combine it with internal data on signal strength and other network measures, and then visualize problem areas. It can use those maps to decide where to invest in its network technology.
AT&T could do similar visualizations before, but doing things like zooming in and out of the visualization to get different perspectives on the data was frustratingly slow. The nanocube approach means "you can start from the country and zoom to street level without much pain and suffering," Volinsky said. "It's had a huge impact."
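One common way to make zooming cheap, sketched below in Python, is to count points ahead of time at several grid resolutions so that each zoom level reads ready-made totals. This is an illustrative assumption, not a description of AT&T's code: the unit-square coordinates, the MAX_LEVEL depth, and the function names are all hypothetical.

```python
from collections import Counter

MAX_LEVEL = 4  # hypothetical depth; each level doubles the grid resolution

def cell(x, y, level):
    """Map a point in the unit square to its grid cell at a zoom level."""
    n = 2 ** level
    return (min(int(x * n), n - 1), min(int(y * n), n - 1))

def build_pyramid(points, max_level=MAX_LEVEL):
    """Count every point once per zoom level, from coarse (0) to fine."""
    counts = Counter()
    for x, y in points:
        for level in range(max_level + 1):
            counts[(level, cell(x, y, level))] += 1
    return counts

def count_in(pyramid, level, cx, cy):
    """Look up the precomputed count for one cell at one zoom level."""
    return pyramid[(level, (cx, cy))]

# Made-up sample points standing in for geotagged reports.
points = [(0.10, 0.10), (0.12, 0.11), (0.90, 0.80), (0.55, 0.45)]
pyramid = build_pyramid(points)

print(count_in(pyramid, 0, 0, 0))  # whole map
print(count_in(pyramid, 1, 0, 0))  # one quadrant
print(count_in(pyramid, 4, 1, 1))  # a fine-grained cell
```

Zooming in or out then means switching which level is read, rather than recounting the raw points.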
AT&T has made the code for its nanocube visualization tools open-source, so it can be used by other businesses.
What nanocubes might do for other kinds of companies will vary, Volinsky said. "If you are a CIO of a company that has data that falls into a geographic paradigm, or transactions that fall into a geographic paradigm, this is a way of exploring the data in an interactive framework that would be very difficult otherwise."
Visuals of smartphone data using nanocube technology.
Companies that have geotagged their point-of-sale data, for instance, could build a nanocube that lets them study data over time and look for anomalies in their patterns. They could also potentially stream such data in real-time. Perhaps the main advantage will be that CIOs could set up nanocubes without needing to add to their server infrastructure.
"We aren't aware of other systems out there that allow this kind of interaction and exploration without firing up a big IT project," Volinsky said.