What does it take to make the most of big data, as in tens, if not hundreds of terabytes of information? That depends on your needs and priorities. Ad-delivery firm Interclick found a fast platform that helps it be more productive while also delivering near-real-time insight. Harvard Medical School learned that data can grow even when obvious measures such as patient counts and years of data studied remain constant. comScore, the digital-media measurement giant, has twelve years of experience taking advantage of data compression by way of a column-store database. In fact, it uses sorting techniques to optimize compression and reduce processing demands.
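The benefit comScore gets from sorting is easy to demonstrate. The sketch below is a minimal Python illustration (not comScore's actual system, and the column data is invented) of why column stores compress better on sorted data: once a column's values are sorted, identical values form long runs that run-length encoding collapses to a handful of (value, count) pairs.

```python
import random
from itertools import groupby

def rle(values):
    """Run-length encode a column as (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# Hypothetical column: 1,000 rows drawn from 4 distinct country codes.
random.seed(0)
column = [random.choice(["US", "UK", "DE", "FR"]) for _ in range(1000)]

unsorted_runs = len(rle(column))          # roughly one run per row
sorted_runs = len(rle(sorted(column)))    # at most one run per distinct value

print(f"unsorted: {unsorted_runs} runs, sorted: {sorted_runs} runs")
```

With the rows shuffled, the encoder emits hundreds of runs; after sorting, the same column collapses to at most four, one per distinct value. Less stored data also means fewer bytes to scan at query time, which is the processing saving the article describes.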
Yahoo, eHarmony, Facebook, Netflix, and Twitter have discovered that Hadoop is an ideal, low-cost platform for processing unstructured data. This open-source project is not just for Internet giants, however: JPMorgan Chase and other mainstream businesses are also taking advantage of Hadoop. And as data supplier InfoChimps has discovered, Hadoop is fast maturing, with a growing selection of add-on and helper applications available to support deployments.
Keep in mind that not all big-data deployments are measured by total scale. Linkshare, for instance, retains only a few months' worth of data, but each day it loads, and must quickly analyze, tens of gigabytes, so it's a big deployment when measured by daily throughput. Perhaps the most important lesson detailed in this image gallery is to heed Richard Winter's advice to pay attention to all six dimensions of data warehouse scalability. Only then can you formulate an accurate request for proposal, test against your most demanding needs, and make technology investments that will hold up over the long term.