Online Retailer Uses DNA Research To Connect With Customers
Home furnishings startup Wayfair applies principles of protein analysis to recommend products.
When you think of big data and its impact on ecommerce, words such as Hadoop, NoSQL, and predictive modeling might spring to mind. DNA research? Not so much. But Wayfair, an online retailer of home furnishings, is applying research from the scientific discipline of protein analysis to a more pragmatic problem: How to recommend relevant products to shoppers on its site.
In late 2011, the company was searching for a better customer recommendation system. "We found that the family of techniques most well known in that area just didn't work with our data," said Ben Clark, Wayfair's director of search and recommendations, in a phone interview with InformationWeek.
Clark and his team of data scientists scoured academic and industry research papers for innovative approaches, or new ways to look for patterns in data.
"One of my guys had a particularly inspired flight of intuition in connecting things that don't--on their face--look well connected at all," said Clark.
That insight came in the form of a 1997 research paper from Dutch bioinformatician Stijn van Dongen. Bioinformatics is a branch of biological science that explores ways to store, analyze, and retrieve biological data.
Clark's team began using the clustering techniques that van Dongen had used to analyze proteins, along with a software toolkit that the Dutch researcher had written and released under a free license.
"Sure enough, when we ran our data through it, we could tell immediately that the results looked intuitively good. Then we put it up on our site, and people seem to like it," Clark said. A February 2012 blog post by Clark summarizes how Wayfair used van Dongen's techniques to build its recommendation engine. The post includes a series of four photos, each showing a series of lines and dots that represent clusters of proteins and their connections with one another.
"I don't know what that represents in the protein world, but in my world, it represents a connection between two items," said Clark.
Applied to an ecommerce site, a connection could be defined in several ways: two people who use the same item, for instance, or one person who bought two items in the same shopping cart. The thicker the line, the stronger the connection between two items.
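To make the idea of weighted connections concrete, here is a minimal Python sketch that counts how often two items land in the same shopping cart. The cart data and the function name `co_purchase_weights` are invented for illustration; the article does not describe how Wayfair actually computes its connections.

```python
from collections import Counter
from itertools import combinations


def co_purchase_weights(carts):
    """Count how often each unordered pair of items shares a cart."""
    weights = Counter()
    for cart in carts:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(cart)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical carts, invented for this example
carts = [
    ["sofa", "rug", "lamp"],
    ["sofa", "rug"],
    ["lamp", "desk"],
    ["sofa", "rug", "desk"],
]

weights = co_purchase_weights(carts)
for pair, count in weights.most_common():
    print(pair, count)
```

A higher count plays the role of a thicker line: here the rug and the sofa share three carts, while every other pair shares only one.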
Wayfair needed a way to weed out the less relevant connections. Customers "are surfing around our site, and we're trying to make useful lists of things they might want to buy," Clark said. "If we just say that everything is connected, that gives us too much data."
The Dutch researcher's mathematical process allowed Wayfair to remove the "wispy, tenuous connections that aren't as strong," and uncover clusters of things with strong enough connections to be useful to its customers, said Clark.
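The effect Clark describes can be illustrated with a much simpler stand-in: drop every connection below a cutoff, then group whatever items remain linked. This is not van Dongen's technique, which the article does not detail. It is a plain threshold-plus-connected-components sketch, and the weights, the cutoff, and the name `strong_clusters` are assumptions made up for the example.

```python
from collections import defaultdict


def strong_clusters(weights, min_weight):
    """Drop edges below min_weight, then group items that remain linked."""
    neighbors = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, clusters = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk collects everything reachable over strong edges
        stack, cluster = [start], set()
        while stack:
            item = stack.pop()
            if item in cluster:
                continue
            cluster.add(item)
            stack.extend(neighbors[item] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical connection strengths, invented for this example
weights = {
    ("rug", "sofa"): 9,
    ("ottoman", "sofa"): 7,
    ("desk", "lamp"): 8,
    ("lamp", "sofa"): 1,  # a "wispy" link that would otherwise merge both groups
}

clusters = strong_clusters(weights, min_weight=5)
print(clusters)
```

With the cutoff at 5, the single weak lamp-to-sofa link is discarded and two separate clusters emerge. With the cutoff at 1, everything collapses into one group, which is the "everything is connected" problem Clark mentions.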
It's difficult to estimate the economic impact of the new technique, Clark said. However, a similar approach that Wayfair used for another recommendation system has increased customer click-through rate by 18%. "From where I sit in this business, that's a huge increase," said Clark.
It's unclear whether van Dongen's clustering techniques and software toolkit would work as well for other ecommerce sites. Clark points to a quote in his blog post from Data Analysis with Open Source Tools, a book by software project consultant Philipp Janert, who states that only spam filtering, credit card fraud detection, and credit scoring applications have been effective across a wide range of usage scenarios.
As for customer recommendation engines: "The approaches that work tend to be quite ad hoc. I think it's still a very difficult problem to solve these things in a general way," said Clark.