TRA gathers its TV viewing data every day. TRA (formerly known as The Right Audience) was acquired by TiVo last summer, so it now has data from more than 1 million TiVo boxes as well as from more than 3 million conventional cable set-top boxes. The TiVo data captures not only every click on the remote but also time-shifting, ad-skipping, rewind and playback behavior.
TRA keeps 15 months' worth of daily viewing data so it can track campaign results over time, and that viewing data is by far the largest chunk of the 15 terabytes the company manages. The supermarket data is refreshed once per week and the auto ownership data is revised quarterly, so these sources account for less than 5% of TRA's data.
Media TRAnalytics is far from a petabyte-league big data deployment, but it quickly grew beyond the capabilities of the MySQL database the company started with four and a half years ago. The TRAnalytics service is exposed through an online portal, and the idea is to let media buyers and planners explore as many variables as possible to analyze programming and purchasing habits. But in those early days, complex, multi-dimensional reports were taking as long as 20 minutes. That was unacceptable, and with more data, more variables and more analyses on the way, the company knew it needed a more robust platform.
After reviewing alternatives, TRA switched from MySQL to the Kognitio database, and "all the problems went away," said Canning. Like Netezza, Greenplum and many of the other database options available four years ago, Kognitio offered the power of distributed, massively parallel processing on commodity x86 hardware, but it stood out (as it still does today) for its ability to exploit large amounts of memory.
"Use of memory is huge because it's the only way to get reasonable, Internet-query speed response times," said Canning. "We can pin up to 5 terabytes of data into memory, and we need that to generate ratings for all of the shows that people might be watching at any given time."
Holding as much as a third of all data in memory is unusual for a data warehousing deployment (although some vendors, like SAP, are now touting all-in-memory warehouses). In-memory access speeds have become increasingly important because TRA now has roughly 10 times more data than it did four years ago. The complexity of reports has also grown, but even the most complex, multi-dimensional reports (with as many as 20,000 lines of data when extracted to Excel) take about one minute. The system has about 400 users from ad agencies, advertisers and TV networks. Last year some 12,000 reports were run on more than 50,000 ad campaigns.
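The "third" figure follows from the numbers quoted earlier: up to 5 terabytes pinned in memory out of the 15 terabytes TRA manages. A minimal Python sketch of that arithmetic is shown here; the function and constant names are illustrative only and are not part of TRA's or Kognitio's software.

```python
from fractions import Fraction

# Figures quoted in the article; the names below are illustrative assumptions.
TOTAL_TB = 15           # total data TRA manages, in terabytes
PINNED_TB = 5           # maximum data pinned in memory, in terabytes
NON_VIEWING_CAP = Fraction(5, 100)  # supermarket + auto data: "less than 5%"


def share_of_total(part_tb, total_tb):
    """Return part/total as an exact fraction."""
    if total_tb <= 0:
        raise ValueError("total_tb must be positive")
    return Fraction(part_tb, total_tb)


pinned_share = share_of_total(PINNED_TB, TOTAL_TB)
non_viewing_tb_cap = NON_VIEWING_CAP * TOTAL_TB

print(f"Share of data that can sit in memory: {pinned_share}")
print(f"Upper bound on non-viewing data: {float(non_viewing_tb_cap)} TB")
```

By the same figures, the weekly supermarket data and quarterly auto ownership data together come to less than 0.75 terabytes, so nearly everything pinned in memory is daily viewing data.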
TRA is getting into cross-media measurement through cookie-based information that Experian holds on the Web browsing habits of 70 million households. That will expand the scale of analysis and the number of correlations available yet again.
"The Internet has obviously become an increasingly important element to advertisers, but they want to know where they'll get the best bang for the buck," Lieberman said. "Should they spend 70% TV and 30% Internet or 40% TV and 60% Internet? We can go deep on TV and Internet ad impacts on purchasing habits to help them determine the right mix." Internet data, too, is privacy protected through double-blind correlation approaches, Lieberman said.
The data that TRA analyzes is all highly structured, so it doesn't fit the classic notion of big data variety -- or the need for Hadoop or a NoSQL database as a platform for variable data. But it's no less a big data deal for TRA customers.
With all the information that's available -- on what's being watched on TV, which ads are actually seen, what cars are owned, what drugs are being prescribed, what websites are visited and what's being purchased the day after seeing particular ads on TV or on the Internet -- it seems like there's very little that can't be known through the power of query, correlation and analysis.