Commentary
12/4/2013
09:06 AM
Doug Henschen

Making The Case For Hadoop: Variety, Not Volume

Paytronix manages only tens of terabytes, but it offers the perfect example of why we need more than relational databases.

Big data is only one reason data-driven companies are considering new platforms. As Paytronix can attest, data variety is the more compelling reason to consider NoSQL databases and Hadoop.

It's easy to understand Paytronix's needs, because it specializes in managing marketing and loyalty programs for the restaurant sector -- a business with which we're all familiar. Paytronix collects data from more than 8,000 restaurants, mostly locations of chains such as Panera, Papa Gino's, and Outback Steakhouse. The data is used to optimize marketing campaigns and boost sales across chains and in specific locations.

Until last year, Paytronix's center of analysis was a Microsoft SQL Server data warehouse containing only tens of terabytes. But Paytronix couldn't handle the variety of point-of-sale and loyalty card data available, because each chain has its own data model.


"We've held daylong meetings going through these different data structures, saying, 'Can we put it all in a relational database?'" Andrew Robbins, Paytronix's president and founder, told us. "But for every field of data, there seem to be exceptions and problems." Ideas for solutions always seemed to get back to expensive changes in the data model and ETL routines.

Because of the variations from chain to chain, Paytronix aggregated data by category -- appetizer, pasta, dessert, and so on. As a result, you couldn't drill down to see details, such as the popularity of specific menu items by store or across chains. You also couldn't see text modifiers, such as "soup instead of salad" or "substitute potato with rice."

Lured by the promise of being able to load any data and create the schema on read, Robbins said, Paytronix started experimenting with MongoDB (a NoSQL database) and Hadoop in June 2012. Microsoft SQL Server is still used to run Paytronix's transactional systems and the data warehouse, but MongoDB now manages digital creative assets -- such as advertisements, brand logos, signage, and other images -- while Hadoop is used for exploratory analytics.
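To make "schema on read" concrete, here is a minimal sketch in plain Python rather than Hadoop or Hive. The record layouts, field names, and the `read_items` and `item_counts` helpers are all hypothetical, invented for illustration; they are not Paytronix's data model. The idea is that raw check records from different chains are stored as-is, and a common shape is imposed only when the data is read for a particular question.

```python
import json
from collections import Counter

# Hypothetical raw check records from two chains with different layouts.
# They are stored untouched; no shared schema is enforced at load time.
RAW_CHECKS = [
    '{"chain": "A", "store": 12, "items": [{"name": "Cheesecake", "qty": 2}]}',
    '{"chain": "B", "location_id": "NE-7", "lines": [{"item": "Cherry Pie", '
    '"count": 1, "modifier": "soup instead of salad"}]}',
    '{"chain": "A", "store": 12, "items": [{"name": "Cheesecake", "qty": 1}]}',
]


def read_items(raw_line):
    """Apply a schema at read time: yield (store, item, quantity) tuples."""
    record = json.loads(raw_line)
    # Each chain names its fields differently, so try each known variant.
    store = str(record.get("store", record.get("location_id", "unknown")))
    for line in record.get("items", record.get("lines", [])):
        name = line.get("name", line.get("item"))
        qty = line.get("qty", line.get("count", 1))
        yield store, name, qty


def item_counts(raw_lines):
    """Count units sold per (store, menu item) across all raw records."""
    counts = Counter()
    for raw in raw_lines:
        for store, name, qty in read_items(raw):
            counts[(store, name)] += qty
    return counts


counts = item_counts(RAW_CHECKS)
```

Because the raw records are never flattened into categories, the same stored data can later answer a different question, such as which text modifiers appear most often, by writing a different reader instead of changing a data model.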

With Hadoop, Paytronix is storing check-level detail from every restaurant, yet it doesn't have to worry about variations from chain to chain or changing the data model when menus change. Using a combination of R-based data modeling, MapReduce processing, and Hive queries, the company is spotting previously unseen patterns in customer behavior. For example, children often figure in the decision to dine out. But parents don't always tell you that they are parents, even if asked on a loyalty program enrollment form. And then there are the grandparents, aunts, and uncles who frequently take children out to dinner but don't have any kids at home.

Using Hadoop, Paytronix is spotting loyalty club members who are dining early and ordering items such as kids' entrees and milk as a beverage -- telltale signs that kids are among the guests. These customers can be targeted for child-related promotions and discounts that can give restaurants a big boost in business.
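The pattern described above can be sketched as a simple rule. This is an illustration only, not Paytronix's actual model: the `Check` structure, the 6 p.m. cutoff for "dining early," and the keyword list are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Check:
    """A hypothetical loyalty-member check; field names are illustrative."""
    member_id: str
    seated_at: time
    items: list  # menu item names as strings


# Assumption: "dining early" means seated before 6 p.m.
EARLY_CUTOFF = time(18, 0)
# Assumption: crude keyword hints for child-oriented orders.
KID_ITEM_HINTS = ("kids", "milk")


def likely_has_kids(check):
    """Flag a check that was seated early and includes a child-oriented item."""
    dined_early = check.seated_at < EARLY_CUTOFF
    has_kid_item = any(
        hint in item.lower() for item in check.items for hint in KID_ITEM_HINTS
    )
    return dined_early and has_kid_item


family_dinner = Check("m1", time(17, 15), ["Kids Mac and Cheese", "Milk"])
late_dinner = Check("m2", time(20, 30), ["Kids Mac and Cheese"])
print(likely_has_kids(family_dinner), likely_has_kids(late_dinner))
```

Substring matching is deliberately crude (it would also match "milkshake"), so a real rule would need cleaner item data and thresholds tuned against actual checks.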

Panera is one of the restaurant chains that Paytronix supports.

Paytronix also used Hadoop to spot coupon fraud that was tied to specific waiters and waitresses. It is working on spotting millennial customers whom restaurants need to attract now that many baby boomers aren't dining out as often. It looks for patterns such as large groups coming in on weekdays after work hours and ordering lots of drinks and appetizers. Lots of restaurants are coming up with social promotions that encourage you to gift friends or give to charities by logging in through Facebook.

"If we have a Facebook account, we can find out what they like, and it turns out [that] the things people like tell you how old they are," Robbins said. For example, tastes in music and movies are reliable indicators of age.

Hadoop is the right platform for analyzing social data, and if Paytronix finds something of value, it can move boiled-down datasets from Hadoop into the data warehouse, where Pentaho BI is used for reporting, ad hoc queries, and analysis. This midsized marketing firm started with a Cloudera deployment of Hadoop running in Amazon's cloud, but now that the platform is proven, it's deploying a Hadoop cluster on its premises.

The Paytronix example shows why information management is moving beyond databases. It's not that the databases are going away, but where social data, clickstreams, and sensor data are in use or where plain data inconsistency is a reality, new platforms like Hadoop and NoSQL are gaining adoption.

More details on the Paytronix deployment are featured in our 2014 Analytics, BI, and Information Management Survey Report (registration required). This free report is based on interviews with 248 information management professionals and includes 22 informative charts and graphs.


Comments
D. Henschen,
User Rank: Author
12/4/2013 | 9:24:14 AM
Details, Details
The whole data lake/enterprise data hub discussion around Hadoop is about capturing full-fidelity (raw) data on an affordable, high-scale platform and then creating the "schema on read" as particular dimensions of data are deemed relevant. Before Hadoop, Paytronix had to throw away the right detail to get everything into a predefined schema. You're probably thinking that each restaurant chain has access to this data, but many are midsized businesses that don't have BI and analytics chops. They're too busy putting food on tables and planning new menus. They hired Paytronix to help them with marketing and loyalty program optimization.

What I love about the Paytronix story is that it's easily understandable. Restaurants don't just want to know that desserts are doing well; they want to know that it's the new cheesecake that's popular in the Northeast while the Southeast is going for cherry pie.
Laurianne,
User Rank: Author
12/4/2013 | 9:51:27 AM
Great examples
These examples of pattern spotting, such as people dining with kids, will be music to marketers' ears. The restaurant owners must see clear results from the well-targeted promotions.
Lorna Garey,
User Rank: Author
12/4/2013 | 10:05:56 AM
RDBMS vendors fighting back?
Doug, are there projects in the works to make conventional DBs more flexible for queries? Seems like if these vendors want to hang on to market share, they need to take on that challenge.
D. Henschen,
User Rank: Author
12/4/2013 | 11:36:14 AM
Re: RDBMS vendors fighting back?
It's not the queries that are inflexible; it's the storage of data predefined into columns and rows. With Hadoop you load anything and come up with the schema (the dimensions of interest) on read, using algorithms, MapReduce, Hive, SQL-on-Hadoop tools, etc. to boil down to the data of interest within that great big lake (a.k.a., Enterprise Data Hub) of information. Some RDBMS vendors are trying to make data modeling more flexible (Teradata being one example). Others are finding ways to bring unstructured data into the picture -- by, for example, extending SQL queries into Hadoop.

But you can't get around the fact that the best use for RDBMS is structured, consistent data that doesn't change a lot. NoSQL databases are taking off in the transactional and content realm because they also get around this predefined-data-model obstacle.
cbabcock,
User Rank: Strategist
12/4/2013 | 4:57:42 PM
NoSQL takes its place in managing data
Doug is right: the beauty of the relational database is its column and row structure, which allows the ACID rules to take effect. They impose consistency on the data throughout the database and allow you to do transactions that never have the numbers screwed up. By relaxing the rules, however, the NoSQL systems gather in many different data types under the same roof, sometimes housing two different types that use the same name. They may also serve up an answer that is slightly out of date, such as telling you there are four competitors playing against you in an online game when in fact a fifth just joined. The value gained from the NoSQL systems in pattern detection far outstrips their limitations. Just don't use them for big, multi-currency transactions: you'll get your total in croners when you meant kroners.
Li Tan,
User Rank: Ninja
12/5/2013 | 12:23:27 AM
Re: NoSQL takes its place in managing data
The major advantage of an RDBMS is its ACID compliance. By loosening this rule, NoSQL databases are more powerful in handling big data with large variety. I do agree with the title of this post: the outstanding characteristic of big data is its variety, not just its large volume. Velocity is another factor to consider; in addition to big data at rest, we need to take care of data in motion. Hadoop and NoSQL databases handle variety pretty well due to their distributed and rather loosely organized structure. For velocity, something more like Stream, S4, etc. is needed.
anon3593880708,
User Rank: Apprentice
12/5/2013 | 12:23:45 PM
analysis is still required
The fields in a _good_ relational database have been defined by experts to cover the problem. It is restrictive, sort of a waterfall approach to data management. You can easily get around that by using non-relational data tools but I don't think anyone gets to bypass the actual analysis.

The reason I think that is important is because many average users (think managers and VPs) assume anything with numbers in it is correct when it may be way off.

A standard regression line on good data is almost always more accurate than any set of experts. But a regression line on bad data is rather useless, no matter how many zillion different data points you have.

What is making the industry look good now is that to use Hadoop and other big data tools, you must be well-versed in math and logic. Once someone comes up with a turnkey approach to big data, those results will probably be as useless as most of the Microsoft Access "databases" I've tried to understand and fix.