11/15/2013
08:00 AM

Analyze 46 Petabytes Of Data Via Laptop? Here's How

AT&T "nanocubes," a poor man's approach to big-data analysis, let users analyze petabytes of data per day in laptop memory.


AT&T Research just put together a kind of poor man's big data approach to complex visualizations. It has developed a way to use a mere laptop to run analytics on the 46 petabytes of data running across AT&T's networks every day. AT&T calls the process nanocubes, a play on data cubes, which are used to aggregate large quantities of data in relational databases. The AT&T researchers developed nanocubes as a way to see just how much data they could get into RAM.

Keeping large amounts of data in RAM would make it easier for people to analyze it. "For most visualizations it is crucial when the person viewing the visualization is in the loop," said Chris Volinsky, executive director of the statistics research department at AT&T. "We really wanted to have a fluid, quick response, so users could come up with a question and look at the data quickly to see if their initial hypothesis held any water."

Using Hadoop clusters and scaling across machines causes latency problems that affect visualizations, he said.


Volinsky is no stranger to hard problems. He was part of the team that won the million-dollar Netflix Prize.

Stuffing billions of data points into computer memory was a big problem of its own. Jim Klosowski, a principal member of technical staff at AT&T, said data cubes are a naturally quick way to look at data, but they get very large. He said AT&T's research team had to come up with "several clever tricks" to minimize the amount of memory needed to store data. Because the data involved both time and geography, one trick was to store answers to common queries once -- time of day, for instance -- and simply use them again, rather than recalculate them. For more on their tricks, see the paper the company's researchers presented, "Nanocubes for Real-Time Exploration of Spatiotemporal Datasets." There's also a YouTube video.
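The idea of storing an answer once and reusing it can be shown with a small Python sketch. This is only an illustration of pre-aggregation by time of day, not AT&T's nanocube code; the sample events and the function names build_hourly_counts and count_between_hours are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample events: (timestamp, latitude, longitude). Not AT&T data.
EVENTS = [
    (datetime(2013, 11, 15, 8, 5), 40.71, -74.00),
    (datetime(2013, 11, 15, 8, 40), 40.73, -73.99),
    (datetime(2013, 11, 15, 17, 15), 34.05, -118.24),
    (datetime(2013, 11, 16, 8, 20), 41.88, -87.63),
]


def build_hourly_counts(events):
    """Scan the raw events once and store the count for each hour of day."""
    return Counter(ts.hour for ts, _lat, _lon in events)


def count_between_hours(hourly_counts, start_hour, end_hour):
    """Answer a time-of-day range query from the stored counts only.

    The range is half-open: start_hour is included, end_hour is not.
    """
    return sum(hourly_counts[h] for h in range(start_hour, end_hour))


# The raw events are read once; every later query reuses the small summary.
HOURLY = build_hourly_counts(EVENTS)
morning = count_between_hours(HOURLY, 6, 12)
evening = count_between_hours(HOURLY, 16, 20)
```

Once the summary is built, each query touches at most 24 stored counts, however many raw events were scanned to build it.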

AT&T is using nanocubes to examine things such as why cellphone calls get dropped on its network. Volinsky noted that AT&T has an app that lets customers report where they are when calls get dropped. The company can look at that data and combine it with internal data on signal strength and other network measures, and then visualize problem areas. It can use those maps to decide where to invest in its network technology.

AT&T could do similar visualizations before, but doing things like zooming in and out of the visualization to get different perspectives on the data was frustratingly slow. The nanocube approach means "you can start from the country and zoom to street level without much pain and suffering," Volinsky said. "It's had a huge impact."
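One common way to make zooming cheap is to keep counts at several map resolutions, so each zoom level reads a precomputed total instead of rescanning raw points. The Python sketch below assumes a simple latitude/longitude grid that doubles in resolution per level. It is not the nanocube data structure described in the researchers' paper, and tile_key, build_pyramid, count_in_tile and MAX_LEVEL are hypothetical names.

```python
from collections import defaultdict

MAX_LEVEL = 4  # hypothetical depth; a real map would use many more levels


def tile_key(lat, lon, level):
    """Map a point to a (col, row) cell in a 2**level by 2**level grid."""
    n = 2 ** level
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return col, row


def build_pyramid(points, max_level=MAX_LEVEL):
    """Count each point once per level, from whole-world down to max_level."""
    counts = defaultdict(int)
    for lat, lon in points:
        for level in range(max_level + 1):
            counts[(level, tile_key(lat, lon, level))] += 1
    return counts


def count_in_tile(pyramid, level, tile):
    """Look up a stored count; tiles with no points return 0."""
    return pyramid.get((level, tile), 0)


# Hypothetical sample points (latitude, longitude).
POINTS = [(40.71, -74.00), (40.73, -73.99), (34.05, -118.24), (51.50, -0.12)]
PYRAMID = build_pyramid(POINTS)

whole_world = count_in_tile(PYRAMID, 0, (0, 0))
new_york_tile = count_in_tile(PYRAMID, MAX_LEVEL, tile_key(40.71, -74.00, MAX_LEVEL))
```

The trade-off is memory: every extra level adds more stored cells, which is why reducing the storage needed matters so much when the goal is to fit everything in a laptop's RAM.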

AT&T has made the code for its nanocube visualization tools open-source, so it can be used by other businesses.

What nanocubes might do for other kinds of companies will vary, Volinsky said. "If you are a CIO of a company that has data that falls into a geographic paradigm, or transactions that fall into a geographic paradigm, this is a way of exploring the data in an interactive framework that would be very difficult otherwise."

Visuals of smartphone data using nanocube technology.

Companies that have geotagged their point-of-sale data, for instance, could build a nanocube that lets them study data over time and look for anomalies in their patterns. They could also potentially stream such data in real-time. Perhaps the main advantage will be that CIOs could set up nanocubes without needing to add to their server infrastructure.

"We aren't aware of other systems out there that allow this kind of interaction and exploration without firing up a big IT project," Volinsky said.


Comments
Michael Fitzgerald
User Rank: Moderator
11/18/2013 | 10:09:44 AM
Re: Data Visualization At Big Data Scale
I have to caution that I have not worked with the technology. For many organizations it might replace server-driven visualizations. But certainly that will depend on the amount of data you're slicing. 
BethSchultz
User Rank: Apprentice
11/18/2013 | 10:02:12 AM
Re: Data Visualization At Big Data Scale
Michael, this is fascinating technology. Do you feel this replaces the need for a visual analytics infrastructure that allows for in-memory processing of visualizations at the server level or would serve as a complement to that?
Li Tan
User Rank: Ninja
11/17/2013 | 10:05:45 PM
Re: Data Visualization At Big Data Scale
I would like to echo your point here. Nanocube is not just a relief to those CIOs who do not want to add extra hardware for big data analytics but it's an evangelization to those individuals who would like to use their laptop to peep into the big data in hand. Without bothering to set up the complete big data infrastructure, you can make your laptop self-contained for big data importing and real-time analysis, which is especially good for personal use and small business.
Laurianne
User Rank: Author
11/17/2013 | 12:42:57 PM
Re: Data Visualization At Big Data Scale
Nanocubes may appeal not only to CIOs who want to do analysis projects without adding server infrastructure, but also to people looking to do a little rogue IT data analysis.
Michael Fitzgerald
User Rank: Moderator
11/16/2013 | 2:53:19 PM
Re: Data Visualization At Big Data Scale
Thanks, Doug, for the recommendations. I, too, will be interested to see how widely used it becomes. I suspect it will get adopted, because it's an inexpensive way to get through a lot of data.
D. Henschen
User Rank: Author
11/15/2013 | 1:24:14 PM
Data Visualization At Big Data Scale
To really understand this in detail you have to watch the YouTube video and possibly download the whitepaper (links to both provided), but the idea is akin to the drill-down analysis you can do in a good dynamic data visualization tool. As you zoom in, you get the level of detail you're after in RAM -- not all 46 petabytes. It's a neat zooming trick and I'll be interested to see if the technology gains adoption beyond AT&T labs.