The open source data processing platform has won over Web giants with its low cost, scalability, and flexibility. Now Hadoop will make its way into more enterprises.
Analyzing The Internet
Another company rolling out a large-scale Hadoop deployment is digital media measurement company ComScore. It plans to use Hadoop as its main platform for raw data analysis, replacing a homegrown, grid-based system built on commodity hardware that it has used since 2004. The grid preprocesses raw data, boiling down hundreds of terabytes of Web clickstream data into orderly data sets that can be loaded into ComScore's 150-TB Sybase IQ data warehouse, a column-oriented relational database best suited to analytics.
Sybase IQ lets ComScore measure the traffic of the world's leading websites and do marketing segmentation based on the surfing habits of its panel of more than 2 million Web users. (ComScore's panel is a Web version of the Nielsen households used to track TV viewing.)
ComScore's Hadoop platform is expected to scale better than its grid system, while providing higher utilization rates and reducing operations costs, says CTO Michael Brown. It will also free the company's developers to work on business problems rather than having to maintain and scale a proprietary stack, Brown says.
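The preprocessing step described above, boiling raw clickstream records down into orderly data sets, is the kind of job Hadoop's MapReduce model is built for. The sketch below is a minimal, single-process Python illustration of that data flow, not ComScore's pipeline: the record layout (panelist ID, date, URL), the sample data, and the helper names `map_click`, `shuffle`, `reduce_visitors`, and `aggregate` are all hypothetical. It counts unique panelists per site per day.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical raw clickstream records: (panelist_id, date, url).
# The article does not describe ComScore's actual record format.
RAW_CLICKS = [
    ("p1", "2011-10-01", "http://www.example.com/home"),
    ("p1", "2011-10-01", "http://www.example.com/cart"),
    ("p2", "2011-10-01", "https://example.com/home"),
    ("p2", "2011-10-01", "http://news.example.org/story/1"),
    ("p3", "2011-10-02", "http://news.example.org/story/2"),
]

def map_click(record):
    """Map phase: emit ((site, date), panelist_id) for one raw click."""
    panelist_id, date, url = record
    host = urlparse(url).netloc.lower()
    # Treat "www.example.com" and "example.com" as the same site.
    site = host[4:] if host.startswith("www.") else host
    yield (site, date), panelist_id

def shuffle(pairs):
    """Shuffle phase: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_visitors(key, panelist_ids):
    """Reduce phase: count unique panelists for one (site, date) key."""
    site, date = key
    return site, date, len(set(panelist_ids))

def aggregate(records):
    """Run map, shuffle, and reduce in sequence and return sorted rows."""
    pairs = (pair for record in records for pair in map_click(record))
    groups = shuffle(pairs)
    return sorted(reduce_visitors(k, v) for k, v in groups.items())

for row in aggregate(RAW_CLICKS):
    print(row)
```

On an actual Hadoop cluster, the framework runs many map and reduce tasks in parallel across nodes and performs the shuffle (grouping by key) itself; here all three phases run in one process only to make the data flow visible. The compact (site, date, unique visitors) rows are the kind of orderly output that could then be bulk-loaded into a warehouse.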
[Chart: NoSQL's Driving Factors. Survey question: "What factors are driving, or would drive, your company's interest in using alternative data platforms such as Hadoop?" Responses charted: ability to manage and process nonrelational and unstructured data; ability to manage and process massive volumes of data; lower software and deployment costs than commercial products; lower hardware and storage scaling costs than commercial products; interest in new insights, such as social media analysis; such platforms aren't a priority for my company. Data: InformationWeek 2012 Business Intelligence, Analytics, and Information Management Survey of 431 business technology pros involved with information management tech, October 2011]
ComScore first put Hadoop to work for Social Essentials, a service it introduced in June. The service processes the 5 TB of panelist data the company collects each day to determine how much top social networks, social network brand pages, and influential people on social networks boost visits to, and purchases from, specific websites.
ComScore's panelists visit more than 140 million social network pages a day. "The Facebook API gives you basic statistics, but marketers have a huge need to know the impact of influencers, the Facebook news feed, the Facebook wall, and branded pages," Brown says.
Using algorithms running on top of Hadoop, ComScore determines which friends, influencers, and pages panelists visited on a given social network. ComScore also has profile information on its panelists and their Web activities, and it uses that information to develop broader insights about social network usage.
Social Essentials is designed to help marketers understand the effectiveness of their social networking activities. If you're Southwest Airlines, for example, the service can tell you that 3% of all Web users are likely to visit your site, whereas 12% of fans of the airline's Facebook page and 8% of friends of those fans are likely to visit, Brown says.
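The Southwest Airlines figures amount to comparing each audience segment's visit rate with the baseline rate for all Web users, a ratio often called lift. The sketch below assumes hypothetical panel counts chosen only to reproduce the article's 3%, 12%, and 8% rates; the `Segment` class and `lift` function are illustrative names, not part of any ComScore product. It uses `Fraction` so the ratios come out exact rather than carrying floating-point rounding error.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass
class Segment:
    name: str
    panelists: int  # panelists in the segment (hypothetical count)
    visitors: int   # of those, panelists who visited the brand's site

    @property
    def visit_rate(self) -> Fraction:
        """Share of the segment that visited the site, as an exact fraction."""
        return Fraction(self.visitors, self.panelists)

def lift(segment: Segment, baseline: Segment) -> Fraction:
    """How many times more likely the segment is to visit than the baseline."""
    return segment.visit_rate / baseline.visit_rate

# Hypothetical panel counts chosen to reproduce the article's 3% / 12% / 8% rates.
baseline = Segment("All Web users", panelists=100_000, visitors=3_000)
segments = [
    Segment("Fans of the brand's Facebook page", panelists=5_000, visitors=600),
    Segment("Friends of fans", panelists=20_000, visitors=1_600),
]

for seg in segments:
    print(f"{seg.name}: rate {float(seg.visit_rate):.0%}, "
          f"lift {float(lift(seg, baseline)):.2f}x")
```

By these rates, fans of the page are four times as likely as the average Web user to visit the site, and friends of fans about 2.7 times as likely.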