Overstock Accelerates with Big Data Platform

Overstock is one of the granddaddies of Internet commerce, with 20 years' worth of customer data to support its marketing analytics. Here's how it increased the speed of models and analysis by 5x.

Jessica Davis, Senior Editor

June 6, 2018

(Image: TheDigitalArtist/Pixabay)

Getting the attention of consumers is no small feat these days. They are bombarded with advertisements on television, through their mobile phones, and on the Internet. You may see ads from your favorite online retailers follow you from Google to news sites to Facebook.

Engaging with the right consumer at the right time can make a big difference for online retailers, and these ads are an important means of doing it. But there's plenty of competition out there.

If you want to use these kinds of ads to gain consumer attention today, you probably have to act fast. It's not like the Internet commerce of 20 years ago when online retailer Overstock came onto the scene, buying and then selling the inventory of failed online retailers.

Overstock.com is one of the original ecommerce businesses. The retailer was founded the same year as Netflix (the company that started by sending out DVDs via US mail) and three years after Amazon.com made its debut selling print books on the Internet and shipping them to your doorstep. It was a time when the World Wide Web -- we called it that back then -- was just getting started as a mainstream network used by consumers for many things, including purchases. After 20 years in business, Overstock has amassed huge volumes of data.

Overstock's business model has evolved over the years beyond discount and liquidation to include sales of new merchandise and hand-crafted merchandise from developing countries. The site sells everything from furniture to apparel to electronics.

Overstock has always been a bit of a trailblazer. For instance, back in 2014 it was among the first big retailers to accept bitcoin for payment. So it shouldn't be a surprise that Overstock would work with newer technologies to get an advantage when it comes to advertising and marketing itself to consumers.

Like many other businesses, Overstock uses SEM, also known as search engine marketing, or paid search, to place advertisements on the familiar sites that consumers use -- from Google Ads to Facebook. If you search for "sectional couch," for instance, an ad for that type of furniture at Overstock.com may very well appear at the top of your search results on Google. And then later, Facebook will show you an ad for sectional couches at Overstock.

Chris Robison, Overstock's lead data scientist for marketing, has overseen the pieces of technology that contribute to the company's effort to bid for ads across various advertising platforms. Among the technologies in place to perform the work were Teradata, Python, Jupyter Notebooks, Apache Spark, Scala, and a large Hadoop cluster. There were many of today's leading-edge technologies in place, Robison told InformationWeek in an interview. But those technologies were siloed.

The problem was gigantic. How do you know when a customer is most likely to purchase? Robison's team wanted to assign scores to customers based on their likelihood of purchasing, and was working to better understand customer browsing and purchasing behavior.
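To make the idea of a purchase-likelihood score concrete, here is a minimal sketch in Python. It is not Overstock's model: the features, weights, and names (`CustomerActivity`, `purchase_score`) are hypothetical and chosen only for illustration. It shows one common approach, a logistic function that turns a weighted sum of recent activity into a score between 0 and 1.

```python
import math
from dataclasses import dataclass


@dataclass
class CustomerActivity:
    """Hypothetical features; the article does not say which signals Overstock uses."""
    sessions_last_7d: int
    cart_adds_last_7d: int
    days_since_last_purchase: int


# Illustrative weights only. A real model would learn these from historical data.
BIAS = -3.0
W_SESSIONS = 0.30
W_CART_ADDS = 0.80
W_RECENCY = -0.02


def purchase_score(customer: CustomerActivity) -> float:
    """Return a score in (0, 1); higher means more likely to purchase."""
    z = (
        BIAS
        + W_SESSIONS * customer.sessions_last_7d
        + W_CART_ADDS * customer.cart_adds_last_7d
        + W_RECENCY * customer.days_since_last_purchase
    )
    # Logistic (sigmoid) function maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))
```

Under these assumed weights, a customer with several recent sessions and cart additions scores higher than one who visited once months after a last purchase, which is the kind of ranking a bidding system could use to decide where to spend.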

Robison's small team of data scientists -- himself and three others -- had to oversee not just the data science but also the data engineering, making sure all the technology pieces worked together. That was a time-consuming process.

"We want all the data, all the time, and all in near real time in order to make smart business decisions. Instead of focusing on our critical data problems and models, our data scientists found themselves dealing with the complexities of managing infrastructure," Robison said in a statement.

Plus, communications in this scenario took time, too. For instance, a team member who wanted to get a data set out of one of the data warehouses would need to go to the team, fill out a ticket, and move the data to the environment where it would be used.

"We wanted to speed up the iteration cycle," Robison said in an interview with InformationWeek. "Pushing out new features weeks before we would have been able to -- that can add a significant impact to the bottom line."

The team knew it needed to bring the technology pieces together for a unified view and to speed up its processes. With the need for speed in mind, Overstock opted for the Databricks Unified Analytics Platform. Databricks is a company founded by the creators of Apache Spark, an open source analytics engine for large-scale data processing. Databricks' first service as a company was a hosted version of Spark. The Databricks Unified Analytics Platform is a hosted platform that includes the technology Overstock needed for its SEM bidding work.

Robison will recount more of the story of Overstock's move during a keynote address on June 6 at Spark + AI Summit in San Francisco.

Robison's team used Spark to filter out bot traffic on the site, and then determine patterns of purchase behavior. For instance, customers were more likely to browse during the day while at work, but then make their actual purchases in the evening when they were back at home.
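The two steps described here, dropping bot traffic and then looking for time-of-day patterns, can be sketched in a few lines. The sketch below uses plain Python rather than Spark so that it runs anywhere, and everything in it is an assumption for illustration: the `Event` record, the user-agent markers, and the per-session threshold are hypothetical and are not taken from Overstock's pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical clickstream record."""
    session_id: str
    user_agent: str
    hour: int  # local hour of day, 0-23
    kind: str  # "view" or "purchase"


# Assumed heuristics; production bot detection is usually far more involved.
BOT_MARKERS = ("bot", "crawler", "spider")
MAX_EVENTS_PER_SESSION = 500


def drop_bot_traffic(events: List[Event]) -> List[Event]:
    """Keep events whose user agent and session volume look human."""
    per_session = Counter(e.session_id for e in events)
    return [
        e
        for e in events
        if not any(marker in e.user_agent.lower() for marker in BOT_MARKERS)
        and per_session[e.session_id] <= MAX_EVENTS_PER_SESSION
    ]


def hourly_counts(events: List[Event], kind: str) -> Counter:
    """Count events of one kind per hour of day."""
    return Counter(e.hour for e in events if e.kind == kind)


def peak_hour(events: List[Event], kind: str) -> Optional[int]:
    """Return the hour with the most events of this kind, or None if there are none."""
    counts = hourly_counts(events, kind)
    return max(counts, key=counts.get) if counts else None
```

Comparing `peak_hour(events, "view")` with `peak_hour(events, "purchase")` on filtered data is one simple way to surface a daytime-browsing, evening-buying pattern like the one Robison describes. On a cluster, the same filter and group-by logic would typically be expressed as Spark DataFrame operations instead.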

The deployment of a unified platform has decreased the cost of moving models to production by nearly 50% and has made standing up new models five times faster, according to Robison. The team is able to spin clusters up and down through self-service cluster management, which has also accelerated the process.

Overstock is now applying the platform to stem fraud in its ecommerce operation, Robison said, targeting individuals who make purchases using stolen credit cards and identities.

"The unified platform allows each project to learn from the projects that came before," he said.

About the Author(s)

Jessica Davis is a Senior Editor at InformationWeek. She covers enterprise IT leadership, careers, artificial intelligence, data and analytics, and enterprise software. She has spent a career covering the intersection of business and technology. Follow her on Twitter: @jessicadavis.
