10/24/2012
04:40 PM

Why Sears Is Going All-In On Hadoop

Sears pushes the cutting edge with some big data techniques, while trying to sell its big data services. Can emerging tech drive change in old-school companies?

'ETL Must Die'

Sears' move to Hadoop began as an experiment using a single node running on a netbook computer--the netbook that still sits on Shelley's office desk. Sears deployed its first production cluster of 20 to 30 nodes in early 2010. A major big data processing bottleneck then was extract, transform, and load processing, and Shelley has become a zealot about eliminating ETL.

"ETL is an antiquated technique, and for large companies it's inefficient and wasteful because you create multiple copies of data," he says. "Everybody used ETL because they couldn't put everything in one place, but that has changed with Hadoop, and now we copy data, as a matter of principle, only when we absolutely have to copy."

Sears can't eliminate ETL overnight, so it has been moving the slowest and most processing-intensive steps within ETL jobs into Hadoop. Shelley cites an ETL process that took 20 hours to run using IBM DataStage software on a cluster of distributed servers. One step that took 10 hours to run in DataStage now can run in 17 minutes on Hadoop, he says.

One downside: It takes 90 minutes to FTP the job to Hadoop and then bring results back to the ETL servers. That FTP time is a trade-off in Sears' approach of picking off one ETL step at a time. Shelley intends to keep moving steps in that process until the entire data transformation workload is on Hadoop.
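The trade-off can be put in rough numbers. The Python sketch below uses only the figures Shelley cites (a 10-hour DataStage step, 17 minutes on Hadoop, 90 minutes of FTP transfer). The function name, and the assumption that transfer time simply adds to the Hadoop run time, are illustrative and not Sears' actual accounting.

```python
def offloaded_step_minutes(hadoop_minutes: float, transfer_minutes: float) -> float:
    """Wall-clock minutes for a step moved to Hadoop.

    Assumes the FTP round trip and the Hadoop run happen one after
    the other (an assumption; the article does not say).
    """
    return hadoop_minutes + transfer_minutes


original_minutes = 10 * 60                       # the step took 10 hours in DataStage
new_minutes = offloaded_step_minutes(17, 90)     # 17 min on Hadoop + 90 min FTP
saved_minutes = original_minutes - new_minutes
speedup = original_minutes / new_minutes

print(f"Offloaded step: {new_minutes:.0f} min (was {original_minutes} min)")
print(f"Time saved: {saved_minutes:.0f} min, roughly {speedup:.1f}x faster")
```

Under these assumptions the step finishes in under two hours instead of 10, even though the transfer takes several times longer than the Hadoop job itself.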

"The reason we do it this way is you get a very big hit quickly," he says, noting it takes less than two weeks to get each step into production. Shelley vows to get rid of ETL eventually, "but you do it in a very nondisruptive, non-scary way for the business."

5 'Pillars' From Sears Chairman Lampert

1. Lasting customer relationships: Sears launched a loyalty program in 2011, expanding personalized promotions
2. Productivity and efficiency: They're key to better profits, but Lampert says Sears has "fared very poorly"
3. Building brands: Kenmore and Craftsman are strong, but Lampert wants them to be the "Nike and Apple of appliances, tools, and lawn and garden"
4. Reinvent Sears with tech and innovation: Everyone, young and old, will use stores, online, and mobile, so Sears needs to make it easier
5. Values: More information sharing, more digital tools for store employees

Shelley's "ETL must die" view has its doubters. Coming to the defense of ETL, Mike Olson, CEO of Cloudera, the leading Hadoop software distributor, recently told InformationWeek, "Almost without exception, when we see Hadoop in real customer deployments, it is stood up next to existing infrastructure that's aimed at existing business problems."

Shelley sees Hadoop as part of a larger IT ecosystem, too, and says systems such as Teradata will continue to have an important, focused role at Sears. But he's on the far end of the spectrum in terms of how much of the legacy environment Hadoop might replace. Countering Shelley's sometimes sweeping predictions of legacy system replacement, Olson says: "It's unlikely that a brand-new entrant to the market [like Hadoop] is going to displace tools for established workloads."

Scaling Out

Sears' main Hadoop cluster has nearly 300 nodes, and it's populated with 2 PB of data--mostly structured data such as customer transaction, point-of-sale, and supply chain data. (Hadoop systems keep two additional copies of the data, so the total environment is 6 PB.) To give a sense of how early Sears was to Hadoop development, Wal-Mart divulged early this year that it was scaling out an experimental 10-node Hadoop cluster for e-commerce analysis. Sears passed that size in 2010.
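The 6 PB figure follows from simple arithmetic. This Python sketch assumes a replication factor of 3, the HDFS default, under which each block is stored three times in total. Sears' actual setting isn't stated in the article, and the function name is illustrative.

```python
def raw_capacity_pb(logical_pb: float, replication_factor: int = 3) -> float:
    """Raw storage needed when every block is stored `replication_factor` times.

    A replication factor of 3 is the HDFS default; whether Sears uses
    the default is an assumption here.
    """
    return logical_pb * replication_factor


total_pb = raw_capacity_pb(2)   # 2 PB of data, as cited in the article
print(f"{total_pb} PB of raw storage for 2 PB of data")
```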

Sears now keeps all of its data down to individual transactions (rather than aggregates) and years of history (rather than imposing quarterly windows on certain data, as it did previously). That's raw data, which Shelley says Sears can refactor and combine as needed quickly and efficiently within Hadoop.

Hadoop isn't a science project at Sears--critical reports run on the platform, including financial analyses; SEC reporting; logistics planning; and analyses of supply chains, products, and customer data. For ad hoc query and analysis, Sears uses Datameer, a spreadsheet-style tool that supports data exploration and visualization directly on Hadoop, without copying or moving data. Using Datameer, Sears can develop in three days interactive reports that used to take IT six to 12 weeks, Shelley says. The old approach required intensive IT support for ETL, data cubing, and associated testing. Now line-of-business power users are developing most of the new reports.

Comments
Soozy G. Miller,
User Rank: Apprentice
12/5/2012 | 3:34:31 PM
re: Why Sears Is Going All-In On Hadoop
"But will companies be interested in buying big data cloud and consulting services from Sears?"

So, what, is Sears planning on entering a whole new market? Then again, The Gap started by selling music and jeans.
Soozy G. Miller,
User Rank: Apprentice
12/5/2012 | 3:31:54 PM
re: Why Sears Is Going All-In On Hadoop
@srentner good point. It's great that Sears has a great big data solution; it's quite another to get the analytics to put that big data to use.
srentner,
User Rank: Apprentice
10/31/2012 | 7:59:28 PM
re: Why Sears Is Going All-In On Hadoop
The final questions are extremely apropos and often cause the most confusion: "Could quick analytical access to an entire decade of medical record data change how doctors diagnose and treat patients? Could faster processing spot financial services fraud more effectively?" This is not what Hadoop does. It is not an analytics technology, as pointed out in page 1. Extracting this type of valuable insight from the data requires a new class of analytics technologies, and the more powerful the mathematical algorithms, the faster and more accurate the insight.
Ellis Booker,
User Rank: Strategist
10/31/2012 | 3:07:19 PM
re: Why Sears Is Going All-In On Hadoop
This is a big story on a number of fronts. First, it clearly expresses the value of big data analysis for retailers. As one of the Sears executives puts it, "With Hadoop we can keep everything, which is crucial because we don't want to archive or delete meaningful data." Second, it addresses the oft-heard complaint that big data solutions are prohibitively expensive--in fact, Sears says it reduced mainframe costs by more than $500,000 per year. Finally, the installation moves the retailer closer to real-time analysis: "Sears can develop in three days interactive reports that used to take IT six to 12 weeks." --Ellis Booker, InformationWeek Community Editor