8/23/2011
12:03 PM
Doug Henschen

10 Lessons Learned By Big Data Pioneers

How can you prepare for the big data era? Consider this expert advice from IT pros who have wrestled with the thorny problems, including data growth and unconventional data.


Just as consistent columnar data aids compression, you can improve compression further by sorting data before loading. comScore uses Syncsort DMExpress software to sort data alphanumerically before it's loaded into Sybase IQ. Where 10 bytes of unsorted data can be compressed to three or four bytes, says Michael Brown, comScore's chief technology officer, 10 bytes of sorted data can typically be crunched down to one byte. "That makes a huge difference in the volume of data we have to store," Brown says.
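The effect of sorting on compression can be seen in a small experiment. The sketch below is an illustration only: it uses Python's general-purpose zlib compressor rather than Sybase IQ's columnar compression, and the site names and row counts are made-up sample data, so the ratios it prints will not match the figures Brown cites.

```python
import random
import zlib


def compressed_size(rows):
    """Return the zlib-compressed size, in bytes, of newline-joined rows."""
    return len(zlib.compress("\n".join(rows).encode("utf-8")))


# Hypothetical sample data: 10,000 values drawn from 50 made-up site names.
rng = random.Random(0)  # fixed seed so the run is repeatable
sites = [f"site{i:02d}.example.com" for i in range(50)]
unsorted_rows = [rng.choice(sites) for _ in range(10_000)]

# Sorting places identical values next to each other, producing long runs
# that a compressor can encode far more compactly.
sorted_rows = sorted(unsorted_rows)

unsorted_size = compressed_size(unsorted_rows)
sorted_size = compressed_size(sorted_rows)
print(f"unsorted: {unsorted_size} bytes, sorted: {sorted_size} bytes")
```

Both inputs hold exactly the same values; only the order differs, which is why the sorted version compresses to a much smaller size.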

Sorting also can streamline processing. comScore sorts URL data to minimize Web site taxonomy lookups. Instead of loading the 40 URLs for Web site pages in the order they were visited during a session, sorting might reveal that 20 of those pages were on Facebook, 12 were on Gmail, and the balance were at NYTimes.com. The sorted data would trigger just three site lookups, whereas unsorted data might trigger many redundant lookups if the visitor bounced back and forth among just a few sites. "That saves a lot of CPU time and a lot of effort," Brown says. It's possible to sort data with SQL statements and custom scripts, but sorting is also a common feature in data-integration software from IBM, Informatica, Oracle, SAP, SAS, Syncsort, and others. At truly large scale, Hadoop is an option for sorting and other processing steps.
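The lookup savings can be sketched in a few lines of Python. This is not comScore's pipeline: it assumes a simplified model in which a taxonomy lookup is triggered whenever the host differs from the previous row's host, and the session URLs and the `count_lookups` helper are hypothetical.

```python
from itertools import groupby
from urllib.parse import urlsplit


def host_of(url):
    """Extract the host name from a URL."""
    return urlsplit(url).hostname


def count_lookups(urls):
    """Count lookups, assuming one lookup each time the host changes
    from the previous row (a simplified, hypothetical model)."""
    return sum(1 for _ in groupby(urls, key=host_of))


# Hypothetical 40-page session: 20 Facebook, 12 Gmail, and 8 NYTimes.com
# pages, interleaved as if the visitor bounced among the three sites.
session = []
for i in range(20):
    session.append(f"https://www.facebook.com/page{i}")
    if i < 12:
        session.append(f"https://mail.google.com/mail/{i}")
    if i < 8:
        session.append(f"https://www.nytimes.com/story{i}")

# Sorting by host groups each site's pages into one contiguous run.
by_host = sorted(session, key=host_of)

print("lookups in visit order:", count_lookups(session))
print("lookups after sorting: ", count_lookups(by_host))
```

After sorting, each of the three sites forms a single run, so the model performs three lookups for the 40 rows. In visit order, the same rows trigger a lookup almost every time the visitor switches sites.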
