Analyze 46 Petabytes Of Data Via Laptop? Here's How - InformationWeek




AT&T "nanocubes," a poor man's approach to big-data analysis, let users analyze petabytes of data per day in laptop memory.


AT&T Research just put together a kind of poor man's big data approach to complex visualizations. It has developed a way to use a mere laptop to run analytics on the 46 petabytes of data running across AT&T's networks every day. AT&T calls the process nanocubes, a pun on data cubes, the structures used to aggregate large quantities of data in relational databases. The AT&T researchers developed nanocubes as a way to see just how much data they could get into RAM.

Keeping large amounts of data in RAM would make it easier for people to analyze it. "For most visualizations it is crucial when the person viewing the visualization is in the loop," said Chris Volinsky, executive director of the statistics research department at AT&T. "We really wanted to have a fluid, quick response, so users could come up with a question and look at the data quickly to see if their initial hypothesis held any water."

Using Hadoop clusters and scaling across machines causes latency problems that affect visualizations, he said.


Volinsky is no stranger to hard problems. He was part of the team that won the million-dollar Netflix Prize.

Stuffing billions of data points into computer memory was a big problem of its own. Jim Klosowski, a principal member of technical staff at AT&T, said data cubes are naturally quick ways to look at data, but they get very large. He said AT&T's research team had to come up with "several clever tricks" to minimize the amount of memory needed to store data. Because the data involved both time and geography, one trick was to store answers to common queries once -- time of day, for instance -- and simply use them again, rather than recalculate them. For more on their tricks, see the paper the company's researchers presented, "Nanocubes for Real-Time Exploration of Spatiotemporal Datasets." There's also a YouTube video.
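To make the store-once, reuse-later idea concrete, here is a minimal Python sketch. It is not AT&T's implementation: the sample events, the region labels, and the HourOfDaySummary class are all hypothetical, and the real nanocube structure described in the paper is considerably more elaborate. The sketch only shows the basic trade: one pass over the raw data builds a small table of hour-of-day counts, and every later question is answered from that table instead of rescanning the events.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample events: (ISO timestamp, region label).
EVENTS = [
    ("2013-11-15T08:05:00", "NJ"),
    ("2013-11-15T08:40:00", "NY"),
    ("2013-11-15T17:10:00", "NJ"),
    ("2013-11-16T08:15:00", "NJ"),
]


class HourOfDaySummary:
    """Counts events per (region, hour of day), computed once and reused."""

    def __init__(self, events):
        self._counts = Counter()
        for stamp, region in events:
            hour = datetime.fromisoformat(stamp).hour
            self._counts[(region, hour)] += 1  # count for this region
            self._counts[(None, hour)] += 1    # None stands for "all regions"

    def count(self, hour, region=None):
        # A dictionary lookup; the raw events are never scanned again.
        return self._counts[(region, hour)]


summary = HourOfDaySummary(EVENTS)
print(summary.count(8))        # events in the 8 a.m. hour, all regions
print(summary.count(8, "NJ"))  # same hour, one region
```

The summary table grows with the number of distinct regions and hours, not with the number of raw events, which is why this kind of precomputation can help a large dataset fit in memory.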

AT&T is using nanocubes to examine things such as why cellphone calls get dropped on its network. Volinsky noted that AT&T has an app that lets customers report where they are when calls get dropped. The company can look at that data and combine it with internal data on signal strength and other network measures, and then visualize problem areas. It can use those maps to decide where to invest in its network technology.

AT&T could do similar visualizations before, but doing things like zooming in and out of the visualization to get different perspectives on the data was frustratingly slow. The nanocube approach means "you can start from the country and zoom to street level without much pain and suffering," Volinsky said. "It's had a huge impact."
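One common way to get fast zooming, sketched below in Python, is to pre-aggregate counts for every zoom level of a quadtree so that each map tile is a single lookup. This is an illustrative assumption rather than a description of AT&T's code: the quadkey and build_pyramid functions, the four-level depth, and the sample points (with coordinates normalized to the range 0 to 1) are all made up for the example.

```python
from collections import Counter

MAX_ZOOM = 4  # assumption: a tiny depth, just for illustration


def quadkey(x, y, zoom):
    """Return the quadtree path for a point with x, y in [0, 1)."""
    digits = []
    for _ in range(zoom):
        x, y = x * 2, y * 2
        qx, qy = int(x), int(y)          # which half of the current tile
        digits.append(str(qy * 2 + qx))  # quadrant id 0..3
        x, y = x - qx, y - qy            # position inside that quadrant
    return "".join(digits)


def build_pyramid(points, max_zoom=MAX_ZOOM):
    """Count points in every tile at every zoom level, from 0 to max_zoom."""
    counts = Counter()
    for x, y in points:
        key = quadkey(x, y, max_zoom)
        for zoom in range(max_zoom + 1):
            counts[key[:zoom]] += 1  # a key prefix names the coarser tile
    return counts


# Hypothetical sample points in normalized map coordinates.
POINTS = [(0.10, 0.10), (0.12, 0.11), (0.80, 0.20), (0.60, 0.90)]
pyramid = build_pyramid(POINTS)

print(pyramid[""])      # whole map (zoom 0)
print(pyramid["0"])     # top-left quadrant (zoom 1)
print(pyramid["0003"])  # one tile at the deepest zoom
```

Zooming in then amounts to asking for longer keys. This naive version stores a separate count for every tile; according to the article, the researchers needed further tricks to keep memory use down.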

AT&T has made the code for its nanocube visualization tools open-source, so it can be used by other businesses.

What nanocubes might do for other kinds of companies will vary, Volinsky said. "If you are a CIO of a company that has data that falls into a geographic paradigm, or transactions that fall into a geographic paradigm, this is a way of exploring the data in an interactive framework that would be very difficult otherwise."

Visuals of smartphone data using nanocube technology.

Companies that have geotagged their point-of-sale data, for instance, could build a nanocube that lets them study data over time and look for anomalies in their patterns. They could also potentially stream such data in real time. Perhaps the main advantage is that CIOs could set up nanocubes without needing to add to their server infrastructure.

"We aren't aware of other systems out there that allow this kind of interaction and exploration without firing up a big IT project," Volinsky said.


User Rank: Author
11/17/2013 | 12:42:57 PM
Re: Data Visualization At Big Data Scale
Nanocubes may appeal not only to CIOs who want to do analysis projects without adding server infrastructure, but also to people looking to do a little rogue IT data analysis.

D. Henschen, User Rank: Author
11/15/2013 | 1:24:14 PM
Data Visualization At Big Data Scale
To really understand this in detail you have to watch the YouTube video and possibly download the whitepaper (links to both provided), but the idea is akin to the drill-down analysis you can do in a good dynamic data visualization tool. As you zoom in, you get the level of detail you're after in RAM -- not all 46 petabytes. It's a neat zooming trick and I'll be interested to see if the technology gains adoption beyond AT&T Labs.