When you think of big data and its impact on ecommerce, words such as Hadoop, NoSQL, and predictive modeling might spring to mind. DNA research? Not so much. But Wayfair, an online retailer of home furnishings, is applying research from the scientific discipline of protein analysis to a more pragmatic problem: how to recommend relevant products to shoppers on its site.
In late 2011, the company was searching for a better customer recommendation system. "We found that the family of techniques most well known in that area just didn't work with our data," said Ben Clark, Wayfair's director of search and recommendations, in a phone interview with InformationWeek.
Clark and his team of data scientists scoured academic and industry research papers for innovative approaches, or new ways to look for patterns in data. "One of my guys had a particularly inspired flight of intuition in connecting things that don't--on their face--look well connected at all," said Clark.
That insight came in the form of a 1997 research paper from Dutch bioinformatician Stijn van Dongen. Bioinformatics is a branch of biological science that explores ways to store, analyze, and retrieve biological data. Clark's team began using the clustering techniques that van Dongen had used to analyze proteins, along with a software toolkit that the Dutch researcher had written and made available under a free license.
"Sure enough, when we ran our data through it, we could tell immediately that the results looked intuitively good. Then we put it up on our site, and people seem to like it," Clark said. A February 2012 blog post by Clark summarizes how Wayfair used van Dongen's techniques to build its recommendation engine. The post includes four images, each showing lines and dots that represent clusters of proteins and the connections among them.
"I don't know what that represents in the protein world, but in my world, it represents a connection between two items," said Clark.
On an ecommerce site, a connection can be defined in several ways: for instance, the same shoppers using both items, or one person buying both items in the same shopping cart. The thicker the line, the stronger the connection between the two items.
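To make the "thicker line" idea concrete, here is a minimal sketch of one possible definition of connection strength, assuming the signal is "bought in the same shopping cart" and that strength is simply the number of carts containing both items. The article does not describe Wayfair's actual signals or weighting, and `cooccurrence_weights` is a hypothetical name used only for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(carts):
    """Count how often each pair of items appears in the same cart.

    Hypothetical helper: returns a Counter mapping a sorted (item_a, item_b)
    tuple to the number of carts containing both items. A higher count
    corresponds to a "thicker line" between the two items.
    """
    weights = Counter()
    for cart in carts:
        # Deduplicate so buying two of the same item doesn't count twice,
        # and sort so (a, b) and (b, a) share the same key.
        items = sorted(set(cart))
        for a, b in combinations(items, 2):
            weights[(a, b)] += 1
    return weights


if __name__ == "__main__":
    carts = [
        ["sofa", "rug", "lamp"],
        ["sofa", "rug"],
        ["lamp", "desk"],
    ]
    for pair, weight in cooccurrence_weights(carts).most_common():
        print(pair, weight)
```

Counting unordered pairs keeps the graph undirected, which matches the article's picture of lines between items rather than arrows.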
Wayfair needed a way to weed out the less relevant connections. Customers "are surfing around our site, and we're trying to make useful lists of things they might want to buy," Clark said. "If we just say that everything is connected, that gives us too much data."
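The most obvious way to thin out such a graph is a single global cutoff on connection strength. The sketch below shows that naive baseline purely for contrast; the article does not say Wayfair tried it, and `prune_weak_edges` and `related_items` are hypothetical names. One weakness is that a single threshold treats every item the same way, whether it is a best seller with many strong ties or a niche product with only a few.

```python
def prune_weak_edges(weights, min_weight):
    """Keep only item pairs whose connection strength is at least min_weight.

    `weights` maps (item_a, item_b) tuples to a numeric strength.
    """
    return {pair: w for pair, w in weights.items() if w >= min_weight}


def related_items(weights, item):
    """List the items connected to `item`, strongest connection first."""
    neighbors = []
    for (a, b), w in weights.items():
        if a == item:
            neighbors.append((b, w))
        elif b == item:
            neighbors.append((a, w))
    # Sort by descending strength, then by name for a stable order.
    neighbors.sort(key=lambda nw: (-nw[1], nw[0]))
    return [name for name, _ in neighbors]


if __name__ == "__main__":
    weights = {("rug", "sofa"): 5, ("lamp", "sofa"): 1,
               ("desk", "lamp"): 3, ("pillow", "sofa"): 4}
    print(related_items(weights, "sofa"))
    print(related_items(prune_weak_edges(weights, 3), "sofa"))
```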
The Dutch researcher's mathematical process allowed Wayfair to remove the "wispy, tenuous connections that aren't as strong," and uncover clusters of things with strong enough connections to be useful to its customers, said Clark.
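Van Dongen's best-known graph clustering method is the Markov Cluster (MCL) algorithm. The article does not name the algorithm or give Wayfair's settings, so the following is only a minimal pure-Python sketch of MCL under those assumptions, not Wayfair's pipeline or van Dongen's toolkit. MCL simulates random walks on the graph: "expansion" squares the transition matrix so flow spreads along paths, and "inflation" raises each entry to a power (2.0 here, a common default) and renormalizes, which strengthens strong flows and starves weak ones. Repeating the two steps drains flow out of tenuous links until the graph splits into clusters. The names `mcl` and `cluster_items` are hypothetical.

```python
def _normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic), in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def _matmul(a, b):
    """Plain dense product of two n x n lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, prune=1e-6, tol=1e-9):
    """Sketch of Markov Cluster (MCL) on a symmetric, non-negative matrix.

    Returns clusters as sorted lists of node indices.
    """
    n = len(adjacency)
    if n == 0:
        return []
    # Add self-loops (a standard MCL preprocessing step), then turn each
    # column into a probability distribution for a random walk.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    _normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: squaring lets flow spread along paths of length 2.
        expanded = _matmul(m, m)
        # Inflation: elementwise power boosts strong flows; renormalize.
        inflated = _normalize_columns(
            [[v ** inflation for v in row] for row in expanded])
        # Prune near-zero entries (the "wispy" connections); renormalize.
        inflated = _normalize_columns(
            [[v if v > prune else 0.0 for v in row] for row in inflated])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still holds flow is an "attractor"; the columns it
    # attracts form one cluster. Skip rows that repeat a cluster.
    seen, clusters = set(), []
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > prune)
        if members and members not in seen:
            seen.add(members)
            clusters.append(list(members))
    return clusters


def cluster_items(pair_weights, inflation=2.0):
    """Run the MCL sketch on a {(item_a, item_b): weight} dictionary."""
    labels = sorted({item for pair in pair_weights for item in pair})
    index = {label: k for k, label in enumerate(labels)}
    n = len(labels)
    adjacency = [[0.0] * n for _ in range(n)]
    for (a, b), w in pair_weights.items():
        # Connections are undirected, so fill both directions.
        adjacency[index[a]][index[b]] = float(w)
        adjacency[index[b]][index[a]] = float(w)
    return [[labels[k] for k in c] for c in mcl(adjacency, inflation)]


if __name__ == "__main__":
    pairs = {("rug", "sofa"): 1, ("lamp", "rug"): 1, ("lamp", "sofa"): 1,
             ("chair", "desk"): 1, ("chair", "shelf"): 1, ("desk", "shelf"): 1}
    print(cluster_items(pairs))
    # prints [['chair', 'desk', 'shelf'], ['lamp', 'rug', 'sofa']]
```

The dense matrix product here costs O(n^3) per iteration, which is fine for a toy graph but far too slow for a full product catalog; practical MCL implementations work on sparse matrices and prune aggressively. Raising `inflation` generally yields more, smaller clusters.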
It's difficult to estimate the economic impact of the new technique, Clark said. However, a similar approach that Wayfair used for another recommendation system has increased customer click-through rates by 18%. "From where I sit in this business, that's a huge increase," said Clark.
It's unclear whether van Dongen's clustering techniques and software toolkit would work as well for other ecommerce sites. In his blog post, Clark quotes Data Analysis with Open Source Tools, a book by software project consultant Philipp Janert, who states that only spam filtering, credit card fraud detection, and credit scoring applications have proven effective across a wide range of usage scenarios.
As for customer recommendation engines: "The approaches that work tend to be quite ad hoc. I think it's still a very difficult problem to solve these things in a general way," said Clark.