Yahoo is the quintessential information age company. So it made sense when Yahoo officially recognized its most valuable asset, data, by appointing its first chief data officer. Usama Fayyad is applying his analytic alchemy to transmute Yahoo's vast data sources into something better than gold: insight.
Why have a chief data officer?
Not everything should be lumped under the CIO function. Data is a strategic asset, often a revenue source. It's both extremely important and extremely underutilized. Companies not appointing senior management to think about data assets and to have a voice in the organization are doomed to be a lot less competitive.
What will make more businesses aware of data's strategic importance?
All sorts of businesses are being driven to that awareness by consumers. Today, most of the consumer's decision-making cycle happens online. Online data actually gives you an amazing window to look forward and analyze what sales are going to look like. Some of my former clients had an error rate of about 30% when they were trying to forecast sales 30 days out based on historical sales data. We took the error rate to less than 5%, forecasting three months out, by bringing in online data that captures the leading indicators of what consumers are thinking and what they're interested in.
This actually ties into one of my roles as CDO, which is to set strategies for which opportunities we go after.
What analysis problems still challenge you?
One of the big challenges has to do with the rate at which data arrives and the very short time interval you have to process it, react to it and get it into an actionable form. We try to work with academia very closely on this. But in the meantime we have a day-to-day engineering problem to live with. So we do what I call data "triage," meaning the stream comes in and we split it into different streams based on how it needs to be treated.
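The "triage" idea described above, splitting one incoming stream into several streams according to how each item needs to be treated, can be illustrated with a minimal Python sketch. Nothing here reflects Yahoo's actual systems: the event fields, the routing rule `route_by_urgency` and the stream names `realtime`, `batch` and `archive` are all hypothetical, chosen only to show the shape of the technique.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Event = dict  # a single incoming record; the field names below are hypothetical


def triage(events: Iterable[Event],
           route: Callable[[Event], str]) -> Dict[str, List[Event]]:
    """Split one stream into several, keyed by the label that `route` returns."""
    streams: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        streams[route(event)].append(event)
    return dict(streams)


def route_by_urgency(event: Event) -> str:
    """Hypothetical routing rule: decide how urgently an event must be handled."""
    if event.get("type") == "fraud_signal":
        return "realtime"   # must be acted on within a very short interval
    if event.get("type") == "click":
        return "batch"      # can wait for periodic aggregation
    return "archive"        # stored, but not processed right away


events = [
    {"type": "click", "page": "/home"},
    {"type": "fraud_signal", "account": 17},
    {"type": "heartbeat"},
    {"type": "click", "page": "/news"},
]

streams = triage(events, route_by_urgency)
for name, items in streams.items():
    print(name, len(items))
```

In a production setting the routing step would more likely feed separate queues or pipelines than in-memory lists; the lists simply keep the sketch self-contained.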
Another challenge is addressing structured and unstructured data. People now are working on how to map these different data types into a common space to make sense of them all together.
Another very important area is what I call privacy-preserving methods for data analysis. That comes back to the CDO role: managing the data assets and enforcing privacy policies. Researchers are working on ways of transforming data into something safe, undecodable, that still preserves the data's statistical value. They're also working on algorithms that give results from which you can never infer anything about a particular individual, but that give you a prediction of what may be happening in the future at the group or global level.
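One well-known family of techniques in this spirit is differential privacy, which adds calibrated random noise to aggregate results so that a group-level answer reveals very little about any single person. The interview does not name a specific method, so the sketch below is only an illustration of that general idea, not the approach Fayyad or Yahoo used. The function `noisy_count`, the sample `ages` and the parameter `epsilon` are assumptions introduced for the example.

```python
import random
from typing import Callable, Iterable


def noisy_count(values: Iterable[int],
                predicate: Callable[[int], bool],
                epsilon: float,
                rng: random.Random) -> float:
    """Return a count perturbed with Laplace noise (the Laplace mechanism).

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. Smaller epsilon means more noise and
    stronger privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon
    # The difference of two independent exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and that same scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


ages = [23, 35, 41, 52, 29, 64]   # hypothetical individual-level data
rng = random.Random(0)            # seeded only so the sketch is repeatable
released = noisy_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
print(f"noisy count of people aged 40+: {released:.2f}")
```

Any single released value is deliberately inexact, but averaged over many queries or large groups the noise washes out, which is how such methods preserve statistical value at the group level.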
Will it ever be possible to analyze mixed data types together in real time?
Absolutely. And the proof of concept is, we as human beings do it all the time. Someday, we're going to look back at some of the analyses that we're doing now and think they're so ridiculous. We'll say, "How could these people back then live with these kinds of analyses?"
What's wrong with Web analysis products?
They definitely don't do enough in many areas. First, these tools are obsessed with the clickstream itself. They forget the true business metrics. Almost all of these tools are really built for techies instead of business users. They give too many numbers. It's great to have so much data that you can measure 700 variables all at once. But at the end of the day, if you don't distill them into the five, plus or minus two, variables that make the most sense to the business, these numbers will never get used.
Outtakes
Favorite saying: "All models are wrong, some are useful" - George E. P. Box
Favorite place to visit: Petra, in Jordan's desert, a rose-colored city carved completely out of stone some 2,000 years ago.