Sample size always plays a role in data science, but in some cases risk, time or expense will limit how much data you can collect: You can only launch a rocket once; you only have so much time to test a much-needed vaccine; your early-stage startup or B2B company only has a handful of customer data points to work with. In these small data situations, I’ve found that companies either avoid data science altogether or use it incorrectly. One of the more common mistakes in applying AI is blindly relying on historical data to predict future situations. I call this “assuming the past is the future.”
A common example is assuming that a model that has worked well for us in previous markets will work the same “magic” when we use it to launch products in a new market. The problem is that the new market (the future) can be completely different from the past market, which leaves us with poor judgment, incorrect predictions, and lackluster business results.
Instead of assuming the past is the future, here are three ways to better apply AI to small data sets:
1. Put external data to work. If you have been relying only on your own historical data, I recommend tapping into external data and applying look-alike modeling. We depend on this approach more than ever, thanks to the rise of the recommendation systems used by Netflix, Amazon, Spotify and others. Even if you have made only one or two purchases on Amazon, the company has so much information about the products in the world and the people who buy them (i.e., external data) that it can make fairly accurate predictions about your next purchase.
Similarly, if you are a B2B company trying to predict your next client, you can build a “deep profile” of prospective clients based on external data to apply look-alike modeling techniques. Even with only a handful of positive examples to work with, this process can do a lot to guide your go-to-market strategy.
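To make look-alike modeling a little more concrete, here is a minimal Python sketch. It is only one simple way to do it: it averages the external-data profiles of a few known customers and ranks prospects by cosine similarity to that average. The company names, the three features and their values are all hypothetical, and a real project would likely use many more features and a proper model.

```python
import math

def centroid(rows):
    """Average each feature across the known customers."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_lookalikes(customers, prospects):
    """Rank prospects by similarity to the average existing customer."""
    seed = centroid(list(customers.values()))
    scored = [(name, cosine(vec, seed)) for name, vec in prospects.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical features from external data, each scaled to 0..1:
# [company size, uses cloud software, operates in a regulated industry]
customers = {
    "acme": [0.8, 1.0, 1.0],
    "globex": [0.6, 1.0, 1.0],
    "initech": [0.7, 1.0, 0.0],
}
prospects = {
    "hooli": [0.7, 1.0, 1.0],
    "umbrella": [0.1, 0.0, 0.0],
    "stark": [0.9, 0.0, 1.0],
}

ranked = rank_lookalikes(customers, prospects)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

With only three positive examples, the ranking is a starting point for a conversation with sales, not a verdict. Its value is that the profile comes from data you did not have to collect yourself.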
2. Use short iterations. One of the drawbacks of assuming the past is the future is that it limits our creativity and innovation. If possible, create your own lab environment where you can introduce variables and outcomes that haven’t been tried in the past and quickly run multiple trials (e.g., A/B tests) to learn from. This approach works well in marketing campaigns, where you don’t need to wait until the end of a long sales cycle to get feedback on lead conversion. By running short iterations of trial and error in environments where feedback arrives quickly, you can gain more insight from smaller data sets and improve both your modeling and your creativity.
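As an illustration of how a short iteration can be read when the sample is small, the sketch below compares two campaign variants with a simple Bayesian estimate: the probability that variant B converts better than variant A. This is one possible approach rather than a prescribed one. The conversion counts are made up, and the uniform Beta(1, 1) prior and the number of random draws are assumptions chosen for simplicity.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling each variant's Beta posterior.

    Assumes a uniform Beta(1, 1) prior on each conversion rate.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws

# Hypothetical one-week test: 40 leads saw each version of the campaign.
p = prob_b_beats_a(conv_a=4, n_a=40, conv_b=12, n_b=40)
print(f"Probability that B beats A: {p:.2f}")
```

A result close to 1 suggests shifting the next iteration toward B, while a result near 0.5 means the trial has not yet told you much and the next round should either run longer or test a bolder change.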
3. Bring in semantics through human expertise. When you have little data but many variables, you can run into the issue of slicing your data too thinly. Imagine analyzing an online shopper who bought diapers, bottles, and nursery decor. If you zoom in too closely, you won’t see the pattern that this person might have a baby. External knowledge and human expertise can help businesses achieve better results with fewer data points by adding semantic modeling, or context, around these variables, which also accelerates machine learning. The trick to getting this right is building out a strong taxonomy (also known as an ontology). We work with one of the largest medical device companies out there, and with millions of SKUs in its catalog, it’s imperative that human experts develop the taxonomy to characterize families of products, which in turn makes it possible to understand customer patterns and improve predictive modeling.
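The shopper example can be sketched in a few lines of Python. Here a small, hand-built taxonomy rolls individual items up into product families, so three unrelated-looking purchases become one visible pattern. The taxonomy entries and family names are hypothetical stand-ins for what human experts would define, and a real catalog would need a far richer structure than a flat dictionary.

```python
from collections import Counter

# Hypothetical expert-built taxonomy: item -> product family
TAXONOMY = {
    "diapers": "baby care",
    "bottles": "baby care",
    "nursery decor": "baby care",
    "printer ink": "office supplies",
    "stapler": "office supplies",
}

def roll_up(purchases, taxonomy):
    """Count purchases per product family instead of per individual item."""
    return Counter(taxonomy.get(item, "uncategorized") for item in purchases)

basket = ["diapers", "bottles", "nursery decor", "printer ink"]
families = roll_up(basket, TAXONOMY)
top_family, count = families.most_common(1)[0]
print(f"Strongest signal: {top_family} ({count} of {len(basket)} purchases)")
```

At the item level, each product appears once and no pattern stands out. At the family level, three of four purchases point the same way, which gives a model far more signal from the same handful of data points.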
Before venturing into the world of corporate tech, I spent years working in counterterrorism, where we applied AI and machine learning to profile and identify potential terrorists, among other things. It’s especially difficult to model predictions to fight terrorism because there is always a new way to attack, so assuming what worked in the past would work in the future was never an option for us data scientists. We constantly had to think about new ways to apply machine learning to big and small data sets in order to identify terrorists before they committed crimes -- we couldn’t afford not to.
Maybe that’s why I’m so passionate about helping companies break the cycle of using historical data in scenarios where it doesn’t fit. Relying on it won’t drive new thinking, creativity, or innovation for your business. Much as in counterterrorism, standing still is not an option: for a B2B company, failing to constantly innovate its data strategy could mean the death of a new product and, ultimately, of the business.