How can you prepare for the big data era? Consider this expert advice from IT pros who have wrestled with the thorny problems, including data growth and unconventional data.
Apache Hadoop, one of the fastest-growing open-source projects, is a collection of components for distributed data processing, particularly of large volumes of unstructured data such as Facebook comments and Twitter posts, email and instant messages, and security and application logs. MapReduce is a programming model, supported by Hadoop, for processing masses of information in parallel. Conventional relational databases and data warehouse systems, such as IBM Netezza, Oracle, Teradata, and MySQL, can't handle this data well because it doesn't fit neatly into rows and columns. And even if they could do the job, license costs would be prohibitive at the scale involved: hundreds of terabytes or even petabytes. Hadoop software is free, and it runs on low-cost commodity hardware. (Keep in mind that puppies are free, too -- in other words, Hadoop deployments require care and feeding that is not free.)
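To make the MapReduce idea concrete, here is a minimal single-process sketch in Python (not Hadoop's actual Java API) of the classic word-count pattern: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The function names and sample log lines are illustrative, not taken from any Hadoop code.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every record
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts collected for each word
    return {word: sum(counts) for word, counts in groups.items()}

# Hypothetical log lines standing in for unstructured input data
messages = ["error disk full", "warning disk slow", "error network down"]
counts = reduce_phase(shuffle(map_phase(messages)))
print(counts["error"])  # prints 2
```

In a real Hadoop cluster, the map and reduce functions run on many machines at once and the framework handles the shuffle, fault tolerance, and data locality; this sketch only shows the shape of the computation.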
Hadoop pioneers include Yahoo!, eHarmony, Facebook, Netflix, and Twitter, but even strait-laced financial giants like JPMorgan Chase are putting Hadoop to work. A growing list of commercial support options will only help Hadoop grow.