How can you prepare for the big data era? Consider this expert advice from IT pros who have wrestled with thorny problems, including data growth and unconventional data types.
Just as consistent columnar data aids compression, sorting data before loading it can improve compression further. comScore uses Syncsort DMExpress software to sort data alphanumerically before it's loaded into Sybase IQ. According to Michael Brown, comScore's chief technology officer, 10 bytes of unsorted data can be compressed to three or four bytes, whereas 10 bytes of sorted data can typically be crunched down to one byte. "That makes a huge difference in the volume of data we have to store," Brown says.
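The effect is easy to observe with a general-purpose compressor. The Python sketch below uses the standard library's zlib as a stand-in. Sybase IQ's columnar encodings work differently, so the ratios won't match Brown's figures. The column of 10,000 values drawn from 50 site names is hypothetical. The sketch compresses the same values in arrival order and again after an alphanumeric sort.

```python
import random
import zlib


def compressed_size(values):
    """Return the zlib-compressed size, in bytes, of the newline-joined values."""
    data = "\n".join(values).encode("utf-8")
    return len(zlib.compress(data, 9))


# Hypothetical column: 10,000 values drawn at random from 50 distinct site names.
rng = random.Random(42)
sites = [f"site{n:02d}.example.com" for n in range(50)]
column = [rng.choice(sites) for _ in range(10_000)]

raw_size = len("\n".join(column).encode("utf-8"))
unsorted_size = compressed_size(column)        # values in arrival order
sorted_size = compressed_size(sorted(column))  # alphanumeric sort groups repeats into runs

print(f"raw: {raw_size} bytes")
print(f"unsorted, compressed: {unsorted_size} bytes")
print(f"sorted, compressed: {sorted_size} bytes")
```

Sorting turns scattered repeats into long runs of identical values. Run-friendly encodings, and LZ-style compressors like zlib, can then represent those runs very cheaply.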
Sorting can also streamline processing. comScore sorts URL data to minimize Web site taxonomy lookups. Suppose a visitor's session includes 40 page URLs. Instead of loading them in the order they were visited, comScore can sort them, which might reveal that 20 of those pages were on Facebook, 12 were on Gmail, and the remaining eight were on NYTimes.com. The sorted data would trigger just three site lookups, whereas unsorted data might trigger many redundant lookups if the visitor bounced back and forth among a few sites. "That saves a lot of CPU time and a lot of effort," Brown says. It's possible to sort data with SQL statements and custom scripts, but sorting is also a common feature in data-integration software from IBM, Informatica, Oracle, SAP, SAS, Syncsort, and others. At truly large scale, Hadoop is an option for sorting and other processing steps.
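To make the lookup savings concrete, here is a minimal Python sketch. It assumes a lookup is issued only when a URL's site differs from the previous URL's site, which amounts to a one-entry cache. The article doesn't describe comScore's actual taxonomy lookup logic. The URLs, the helper names, and the round-robin browsing pattern are all hypothetical.

```python
from urllib.parse import urlparse


def site_of(url):
    """Return the host name, e.g. 'www.facebook.com'."""
    return urlparse(url).netloc


def count_lookups(urls):
    """Count taxonomy lookups, assuming a new lookup is needed only when
    the site changes from the previous URL (a one-entry cache)."""
    lookups = 0
    previous_site = None
    for url in urls:
        site = site_of(url)
        if site != previous_site:
            lookups += 1
            previous_site = site
    return lookups


def round_robin(*groups):
    """Hypothetical browsing pattern: bounce among sites, one page at a time."""
    session = []
    for i in range(max(len(g) for g in groups)):
        for group in groups:
            if i < len(group):
                session.append(group[i])
    return session


# Hypothetical 40-URL session matching the example:
# 20 Facebook pages, 12 Gmail pages, and 8 NYTimes.com pages.
facebook = [f"https://www.facebook.com/page{i}" for i in range(20)]
gmail = [f"https://mail.google.com/page{i}" for i in range(12)]
nytimes = [f"https://www.nytimes.com/page{i}" for i in range(8)]

session = round_robin(facebook, gmail, nytimes)
sorted_session = sorted(session, key=site_of)  # groups each site's pages together

print("unsorted lookups:", count_lookups(session))       # 33
print("sorted lookups:", count_lookups(sorted_session))  # 3
```

Sorting places each site's pages in one contiguous run, so the number of lookups falls to the number of distinct sites. The same grouping could come from an SQL `ORDER BY` on a host column or from the sort step in a data-integration tool.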