6 Ways To Ask Smarter Questions Of Big Data
To drive more value out of your big data, you have to start with the right questions. Here are six strategies for improving the quality of the questions you ask big data.
Some organizations are driving more value out of big data than others. They're the ones redefining how businesses interact with their customers. They're the ones using data to transform their business models and to innovate.
Getting there is a matter of maturity, however. It takes a critical mass of data, tools, expertise, and dogged curiosity, as well as a willingness to act on the data. Even then, businesses get varied results.
Asking better questions of data is both an art and a science, and it's an iterative process. The most sophisticated and competitive companies are constantly striving to improve their understanding of what data can tell them, and what they can ask of the data.
"Most things don't start and end with a single question," said Fabio Luzzi, VP of advanced analytics and data science at mass media company Viacom. "The quality of your questions gets better along the way."
Many organizations are successfully improving operational efficiencies, but fewer are realizing other types of strategic impact that can make them more competitive.
"It's easy to show ROI by improving operational efficiency, but it's much harder to drive top-line revenue impact on business value," said Olly Downs, chief scientist at big data analytics company Globys. "You want to find something that has a huge impact, and there are capabilities now that allow you to do things that were not possible before."
Here are six ways to improve the quality of your questions, and ultimately the quality of the insights you draw and the actions you take.
1. Start With A Clear Business Objective
The company with the most data doesn't necessarily win. It's the one that understands how to use data, and for what purpose. Having a clearly stated objective helps narrow the universe of possibilities into a set of relevant choices that can be explored and refined. Keeping the business goal in mind also helps to keep analyses on the right track.
"Sophisticated analytical tools generate more questions," said Luzzi. "At some point you need to get some measurable results, and that's when you realize you're heading in the right direction."
2. Know When Granularity Stops Paying Off
Organizations are able to do analyses that were not possible before. There is more data available to analyze, and the tools and methods are more sophisticated than they once were. The ability to get down to extremely granular levels of detail can be necessary on one hand, but distracting on the other.
"There's a push to get to the nano level of targeting, because the more specific you can get targeting audiences and the way your interactions address those audiences, the more effective they are," said Downs. "It's a little bit like building microprocessors, though. You reach a physical limit where you can't be sure the signal is distinct. You can chase that rabbit hole a long way before you realize you're doing things so granularly that you can't measure whether it's having an effect or not."
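The limit Downs describes is statistical: the finer the segment, the fewer customers it contains, and below a certain size no real effect can be distinguished from noise. A rough sketch of that limit, using a standard two-proportion sample-size approximation with hypothetical rates (a 2% baseline response and a 10% relative lift are assumptions, not figures from the article):

```python
import math

def min_sample_per_segment(base_rate, lift, alpha_z=1.96, power_z=0.84):
    """Approximate per-group sample size needed to detect a relative
    `lift` over `base_rate` with a two-proportion z-test
    (5% significance, 80% power, normal approximation)."""
    p1 = base_rate
    p2 = base_rate * (1 + lift)
    p_bar = (p1 + p2) / 2
    numerator = (alpha_z * math.sqrt(2 * p_bar * (1 - p_bar))
                 + power_z * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Detecting a 10% relative lift on a 2% response rate takes tens of
# thousands of customers per segment -- far more than a "nano" segment holds.
n = min_sample_per_segment(0.02, 0.10)
```

Any segment smaller than `n` is past the point where you "can't measure whether it's having an effect or not."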
3. Pair Data Scientists With Business Leaders
Unicorns are few and far between, so data teams and business units need to work together to meet business goals. Data scientists represent phenomena using data and models, and they are expected to understand what the business is trying to achieve. Conversely, business leaders should have at least a business-level understanding of what can be achieved with data.
"I can run a logistic regression or a neural network model, but I guarantee you when I show the results to business leaders, they will see things I don't see because they understand the business better than I do. They don't know what statistical techniques I have to apply to analyze a specific business phenomenon," said Luzzi. "I make sure my team [communicates] constantly with the business units because that's how we learn about the business and how we come up with new ideas to implement machine learning techniques."
Effective collaboration requires effective communication, however. Downs recalled a business intelligence team that built a customer churn model and considered it "fantastic" because it achieved 100% lift over random, meaning it identified churners at twice the rate random targeting would. The marketing department considered the same model a disaster because it wasn't 100% accurate.
"If you have a data science team that says they built a great model and a marketing team that says the model doesn't work, then there are people gaps or communication gaps," said Downs.
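The two teams in Downs's anecdote can both be describing the same model honestly, because lift and accuracy answer different questions. The numbers below are hypothetical, chosen only to show how a model with 100% lift can still be "wrong" nine times out of ten:

```python
# Hypothetical illustration: suppose 5% of all customers churn, and the
# model's top-scored decile turns out to contain 10% churners.
base_churn_rate = 0.05   # churners found by targeting customers at random
model_hit_rate = 0.10    # churners found in the model's top decile

lift = model_hit_rate / base_churn_rate - 1
print(f"Lift over random: {lift:.0%}")     # 100% lift: data science is happy
print(f"Precision: {model_hit_rate:.0%}")  # 90% of flagged customers never
                                           # churn: marketing calls it a disaster
```

Same model, two legitimate readings. Agreeing up front on which metric matters for the campaign is the communication step Downs is pointing at.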
4. Let Machine Learning Expand The Scale Of Your Questions
Machine learning allows companies to discover patterns, develop new and better models, and improve their predictive capabilities, among other things. The massive scale and speed allow organizations to explore problems in ways that would not otherwise be feasible, which sometimes leads to intriguing new questions.
One company wanted to understand the behavioral patterns of specific customers so it could decide what interaction to have with each one on a daily basis. The company estimated it would take an analyst 10 minutes to look at the data, score a couple of predictive models, and decide what action to take for each customer. With 2.6 million customers, it would take 416,000 analysts working 10 hours per day to evaluate each customer eight times per day, a workload the Globys platform handles daily.
"It's an absolutely ridiculous number," said Downs. "You want to exercise the three Vs [volume, variety, and velocity] of the capability you're building, and it's common to exercise one. If you have distributed aggregation capability, you can take your data warehouse cycle down to hours instead of bumping up against a 24-hour cycle. That's an ROI improvement. The licensing cost is better. And you can do the same processes faster. But, what you really want to do is [enable a] three orders of magnitude change in capability and scale."
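A quick back-of-envelope check, using the figures quoted (2.6 million customers, eight evaluations a day, ten minutes each, ten-hour analyst days), confirms the head count lands in the hundreds of thousands. The exact total depends on assumptions about per-review overhead, so treat this as a scale check rather than a reconstruction of the quoted 416,000:

```python
customers = 2_600_000
evaluations_per_day = 8
minutes_per_evaluation = 10
analyst_minutes_per_day = 10 * 60   # a ten-hour workday

total_minutes = customers * evaluations_per_day * minutes_per_evaluation
analysts_needed = total_minutes / analyst_minutes_per_day
print(f"{analysts_needed:,.0f} analysts")  # prints 346,667 analysts
```

Either way it is, as Downs says, an absolutely ridiculous number for humans and a routine daily batch for a platform.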
5. Make Sure You Have The Right Data
Sometimes it's not possible to answer certain questions because the data is not available. Even when the data is available, companies aren't always sure they're asking the right questions of it.
Viacom's Luzzi recommends asking the data team whether the right data is available to address a specific business objective.
"At some point, you need to get some measurable results that have some sort of impact. That's the way you know you have the right data and are leveraging the data," he said. "You can run very sophisticated regression, modeling, and clustering, but [most of us are not working at] a university. Things are exciting, but we have to go back to the business with measurable results."
Dunnhumby audits its customers' data, so it knows what data is available and the level of its quality. Assuming the data has been collected correctly and is reliable, it can be used to derive insights that in turn help formulate questions. From there, CEO Andy Hill said, it's a matter of deciding whether there are other questions that should be answered, and whether there are questions the data does not support. If the latter is the case, the missing data has to be acquired.
Sometimes it may not be obvious that important data is missing. For example, when Globys's Olly Downs was modeling traffic for his former employer, transportation analytics firm Inrix, the prevailing belief was that weather had the greatest predictive impact on future traffic conditions. However, a local data set revealed that the factor with the greatest impact on urban traffic was the K-12 school schedule.
"We never would have come upon that from a hypothetical perspective," said Downs. "Sometimes you have to be careful about what the data can tell you, test it, and see if the data can answer the question. You might be surprised."
6. Guard Against Bias
Human frailties and non-representative data sets tend to skew results and lead to faulty conclusions.
"It's very important to make sure your data is not skewed towards a subset, because even if you have a lot of data, it may not represent the entire universe," said Luzzi. "If you're not representing your entire universe, your conclusions are not necessarily accurate."
Confirmation bias, a form of cognitive bias, influences the approach to problem solving, as well as the way individuals view data and results. When the purpose of an analysis is to prove a hypothesis, the bias influences the data sets, tests, and outcomes.
"There's an art to the science and there's an intuition to which you have to add objectivity," said Dunnhumby CEO Andy Hill. "To take bias out of data, you have to use statistical expertise to understand if you're looking at a population of consumers, making sure they're statistically representative of the questions you're trying to ask. There are programs and experts who can help you do that."
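The representativeness check Hill and Luzzi describe can be made concrete with a standard chi-square goodness-of-fit test against known population shares. The age bands and counts below are hypothetical, invented purely to show the mechanics; the critical value is the standard chi-square threshold for two degrees of freedom at 5% significance:

```python
def chi_square_stat(sample_counts, population_shares):
    """Goodness-of-fit statistic comparing a sample's category counts
    against known population shares."""
    total = sum(sample_counts)
    stat = 0.0
    for observed, share in zip(sample_counts, population_shares):
        expected = total * share
        stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical example: the customer population splits 50/30/20 across
# three age bands, but this 1,000-person sample over-represents the first.
sample = [640, 230, 130]
population = [0.50, 0.30, 0.20]

stat = chi_square_stat(sample, population)
CRITICAL_5PCT_DF2 = 5.991   # chi-square critical value, df = 2, alpha = 0.05
skewed = stat > CRITICAL_5PCT_DF2   # True: this sample does not
                                    # represent the "entire universe"
```

A sample that fails a check like this is exactly the skewed subset Luzzi warns about: conclusions drawn from it won't generalize, no matter how much data it contains.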