
Why Sears Is Going All-In On Hadoop

Doug Henschen, InformationWeek | October 31, 2012 08:00 AM


Like many retailers, Sears Holdings, the parent of Sears and Kmart, is trying to get closer to its customers. At Sears' scale, that requires big-time data analysis capabilities, but three years ago, Sears' IT wasn't really up to the task.

"We wanted to personalize marketing campaigns, coupons, and offers down to the individual customer, but our legacy systems were incapable of supporting that," says Phil Shelley, Sears' executive VP and CTO, in a meeting with InformationWeek editors and his team at company headquarters in suburban Chicago.

Improving customer loyalty, and with it sales and profitability, is desperately important to Sears as it faces fierce competition from Wal-Mart and Target, as well as online retailers such as Amazon.com. While revenue at Sears has declined from $50 billion in 2008 to $42 billion in 2011, big-box rivals Wal-Mart and Target have grown steadily, and they're far more profitable. Meanwhile, Amazon has gone from $19 billion in revenue in 2008 to $48 billion last year, passing Sears for the first time.

A Shop Your Way Rewards membership program started by Sears in 2011 is part of a five-part strategy to get the company back on track. Behind the scenes is a cutting-edge implementation of Apache Hadoop, the high-scale, open source data processing platform driving the big data trend. Despite Sears' less-than-cutting-edge reputation as a retailer, the company has been an innovator in using big data. In fact, Shelley is leading a Sears subsidiary, MetaScale, that's pitching services to help companies outside retail use Hadoop.

But will companies be interested in buying big data cloud and consulting services from Sears? And can Sears' own big data efforts help the company regain its footing in the retail industry?

Fast And Agile

Sears' process for analyzing marketing campaigns for loyalty club members used to take six weeks on mainframe, Teradata, and SAS servers. The new process running on Hadoop can be completed weekly, Shelley says. For certain online and mobile commerce scenarios, Sears can now perform daily analyses. What's more, targeting is more granular, in some cases down to the individual customer. Whereas the old models made use of 10% of available data, the new models run on 100%.

"The Holy Grail in data warehousing has always been to have all your data in one place so you can do big models on large data sets, but that hasn't been feasible either economically or in terms of technical capabilities," Shelley says, noting that Sears previously kept data anywhere from 90 days to two years. "With Hadoop we can keep everything, which is crucial because we don't want to archive or delete meaningful data."

Sears is still the largest appliance retailer and appliance service provider in the U.S., for example, so it's in a strong position to understand customer needs, service trends, warranty problems, and more. But Sears has only been scratching the surface of using available data.

Enter Hadoop, an open source data processing platform gaining adoption on the strength of two promises: ultra-high scalability and low cost compared with conventional relational databases. According to Sears, a 200-terabyte Hadoop system costs about one-third as much as a 200-TB relational platform, and the gap widens as scale increases into the petabytes. With Hadoop's massively parallel processing power, Sears sees little more than one minute's difference between processing 100 million records and 2 billion records.

CTO Shelley: big data zealot

The downside of Hadoop is that it's an immature platform, perplexing to many IT shops, and Hadoop talent is scarce. Sears learned Hadoop the hard way, by trial and error. It had few outside experts available to guide its work when it embraced the platform in early 2010.

The company is now in the enviable position of having big data experience among its employees in the U.S. and India. MetaScale will leverage Sears' data center capacity in Chicago and Detroit, just as Amazon Web Services takes advantage of Amazon's massive e-commerce compute capacity.

Open Source Moves In

Sears' embrace of an open source stack began at the operating system level, with Linux. Sears routinely replaces legacy Unix systems with Linux rather than upgrade them, Shelley says, and it has retired most of its Sun and HP-UX servers. Microsoft server and development technologies are also on the way out.

Moving up the stack, Sears is consolidating its databases on MySQL, InfoBright, and Teradata, while EMC Greenplum, Microsoft SQL Server, and Oracle (including four Exadata boxes) are on their way out, Shelley says.

Hadoop's power comes from dividing workloads across many commodity Intel x86 servers, each with multiple CPUs and each CPU with multiple processor cores. Since early 2010, Sears has been moving batch data processing off its mainframes and into Hadoop. Cost is the big motivator, as mainframe MIPS cost anywhere from $3,000 to $7,000 per year, Shelley says, while Hadoop costs are a small fraction of that.

Sears says it has surpassed its initial target to reduce mainframe costs by $500,000 per year, while also delivering "at least 20, sometimes 50, up to 100 times better performance on batch times," Shelley says. Eliminating all of the mainframes in use would enable it to save "tens of millions" of dollars, he says.
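To make "dividing workloads across many commodity servers" concrete, here is a minimal single-machine sketch of the MapReduce pattern Hadoop is built around: split the input into chunks, map each chunk independently, shuffle the intermediate key/value pairs by key, then reduce each group. It is an illustration only, not Hadoop's API. Threads stand in for cluster nodes, and the record format (customer_id, amount) and all function names are hypothetical, chosen to echo the loyalty-program analysis described earlier.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

# Hypothetical record format: (customer_id, purchase_amount)
Record = tuple[str, float]


def split(records: list[Record], n_chunks: int) -> list[list[Record]]:
    """Divide the input into roughly equal chunks, one per worker (akin to HDFS blocks)."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk: list[Record]) -> list[tuple[str, float]]:
    """Map phase: emit one (key, value) pair per record; touches only its own chunk."""
    return [(customer_id, amount) for customer_id, amount in chunk]


def shuffle(mapped: Iterable[list[tuple[str, float]]]) -> dict[str, list[float]]:
    """Shuffle phase: group every emitted value by its key."""
    groups: dict[str, list[float]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_group(values: list[float]) -> float:
    """Reduce phase: collapse one key's values into a single result (total spend)."""
    return sum(values)


def total_spend_per_customer(records: list[Record], workers: int = 4) -> dict[str, float]:
    """Run split -> parallel map -> shuffle -> reduce; threads stand in for cluster nodes."""
    chunks = split(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    groups = shuffle(mapped)
    return {key: reduce_group(vals) for key, vals in sorted(groups.items())}


if __name__ == "__main__":
    sales = [("c1", 20.0), ("c2", 5.5), ("c1", 10.0), ("c3", 7.25), ("c2", 4.5)]
    print(total_spend_per_customer(sales, workers=2))
    # prints {'c1': 30.0, 'c2': 10.0, 'c3': 7.25}
```

Because each map call works only on its own chunk, adding workers (or, in a real Hadoop cluster, servers) lets the map phase scale out without coordination; the framework itself takes care of distributing chunks and performing the shuffle across machines.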
