Making The Case For Hadoop: Variety, Not Volume
Paytronix manages only tens of terabytes, but it offers the perfect example of why we need more than relational databases.
Sheer data volume is only one reason data-driven companies are considering new platforms. As Paytronix can attest, data variety is the more compelling reason to consider NoSQL databases and Hadoop.
It's easy to understand Paytronix's needs, because it specializes in managing marketing and loyalty programs for the restaurant sector -- a business with which we're all familiar. Paytronix collects data from more than 8,000 restaurants, mostly locations of chains such as Panera, Papa Gino's, and Outback Steakhouse. The data is used to optimize marketing campaigns and boost sales across chains and in specific locations.
Until last year, Paytronix's center of analysis was a Microsoft SQL Server data warehouse containing only tens of terabytes. But Paytronix couldn't handle the variety of point-of-sale and loyalty card data available, because each chain has its own data model.
"We've held daylong meetings going through these different data structures, saying, 'Can we put it all in a relational database?'" Andrew Robbins, Paytronix's president and founder, told us. "But for every field of data, there seem to be exceptions and problems." Ideas for solutions always seemed to get back to expensive changes in the data model and ETL routines.
Because of the variations from chain to chain, Paytronix aggregated data by category -- appetizer, pasta, dessert, and so on. As a result, you couldn't drill down to see details, such as the popularity of specific menu items by store or across chains. You also couldn't see text modifiers, such as "soup instead of salad" or "substitute potato with rice."
Lured by the promise of being able to load any data and create the schema on read, Robbins said, Paytronix started experimenting with MongoDB (a NoSQL database) and Hadoop in June 2012. Microsoft SQL Server is still used to run Paytronix's transactional systems and the data warehouse, but MongoDB now manages digital creative assets -- such as advertisements, brand logos, signage, and other images -- while Hadoop is used for exploratory analytics.
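To make "schema on read" concrete, here is a minimal Python sketch of the idea, not Paytronix's actual code. Raw check lines from two chains are stored exactly as received, and a per-chain reader maps them to a common shape only when they are queried. The chain labels, field names (`item_name`, `desc`, `amt_cents`, `mods`, `notes`), and sample records are all hypothetical.

```python
import json

# Hypothetical raw check lines from two chains, stored as received.
# Each chain uses its own field names; nothing is normalized at load time.
RAW_LINES = [
    '{"chain": "A", "item_name": "Kids Mac & Cheese", '
    '"mods": ["soup instead of salad"], "price": 5.99}',
    '{"chain": "B", "desc": "Tiramisu", "amt_cents": 699}',
]


def read_chain_a(rec):
    """Interpret a chain A record at read time (assumed field names)."""
    return {
        "item": rec["item_name"],
        "price": rec["price"],
        "modifiers": rec.get("mods", []),
    }


def read_chain_b(rec):
    """Interpret a chain B record at read time (assumed field names)."""
    return {
        "item": rec["desc"],
        "price": rec["amt_cents"] / 100,
        "modifiers": rec.get("notes", []),
    }


# The "schema" lives here, in the readers, not in the stored data.
READERS = {"A": read_chain_a, "B": read_chain_b}


def read_items(lines):
    """Yield menu items in a common shape, applying the schema on read."""
    for line in lines:
        rec = json.loads(line)
        yield READERS[rec["chain"]](rec)


items = list(read_items(RAW_LINES))
for item in items:
    print(item)
```

When a chain changes its menu or its export format, only its reader function changes; the stored records and the other chains' readers are left alone, and item names and text modifiers remain available for drill-down.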
With Hadoop, Paytronix is storing check-level detail from every restaurant, yet it doesn't have to worry about variations from chain to chain or changing the data model when menus change. Using a combination of R-based data modeling, MapReduce processing, and Hive queries, the company is spotting previously unseen patterns in customer behavior. For example, children often figure in the decision to dine out. But parents don't always tell you that they are parents, even if asked on a loyalty program enrollment form. And then there are the grandparents, aunts, and uncles who frequently take children out to dinner but don't have any kids at home.
Using Hadoop, Paytronix is spotting loyalty club members who are dining early and ordering items such as kids' entrees and milk as a beverage -- telltale signs that kids are among the guests. These customers can be targeted for child-related promotions and discounts that can give restaurants a big boost in business.
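The signals described above can be sketched as a simple rule over check-level records. This is an illustrative Python heuristic under stated assumptions, not the R, MapReduce, or Hive logic Paytronix actually runs: the 6 p.m. cutoff for "dining early," the item labels, the two-visit threshold, and the record fields (`member_id`, `seated_at`, `items`) are all invented for the example.

```python
from datetime import datetime

# Assumptions for illustration only; the article gives no thresholds.
EARLY_CUTOFF_HOUR = 18                      # "dining early" = seated before 6 p.m.
KID_SIGNAL_ITEMS = {"kids entree", "milk"}  # hypothetical normalized item labels


def has_kid_signals(check):
    """True if a check was seated early and includes a kid-signal item."""
    seated = datetime.fromisoformat(check["seated_at"])
    dined_early = seated.hour < EARLY_CUTOFF_HOUR
    ordered = {item.lower() for item in check["items"]}
    return dined_early and bool(ordered & KID_SIGNAL_ITEMS)


def members_to_target(checks, min_visits=2):
    """Return loyalty members with at least min_visits kid-signal checks."""
    counts = {}
    for check in checks:
        if has_kid_signals(check):
            member = check["member_id"]
            counts[member] = counts.get(member, 0) + 1
    return sorted(m for m, n in counts.items() if n >= min_visits)


# Hypothetical sample checks.
checks = [
    {"member_id": "m1", "seated_at": "2014-03-01T17:15:00",
     "items": ["Kids Entree", "Milk", "Pasta"]},
    {"member_id": "m1", "seated_at": "2014-03-08T17:30:00",
     "items": ["Milk", "Dessert"]},
    {"member_id": "m2", "seated_at": "2014-03-01T20:45:00",
     "items": ["Kids Entree"]},
    {"member_id": "m3", "seated_at": "2014-03-02T17:00:00",
     "items": ["Appetizer", "Pasta"]},
]

targets = members_to_target(checks)
print(targets)
```

Requiring repeat visits is one way to avoid flagging a member after a single ambiguous check. Note that the rule depends on item-level detail; it could not be written against data aggregated by category.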
Paytronix also used Hadoop to spot coupon fraud that was tied to specific waiters and waitresses. It is now working on spotting millennial customers, whom restaurants need to attract because many baby boomers aren't dining out as often. It looks for patterns such as large groups coming in on weekdays after work hours and ordering lots of drinks and appetizers. Social data is another source: Lots of restaurants are coming up with social promotions that encourage you to gift friends or give to charities by logging in through Facebook.
"If we have a Facebook account, we can find out what they like, and it turns out [that] the things people like tell you how old they are," Robbins said. For example, tastes in music and movies are reliable indicators of age.
Hadoop is the right platform for analyzing social data, and if Paytronix finds something of value, it can move boiled-down datasets from Hadoop into the data warehouse, where Pentaho BI is used for reporting, ad hoc queries, and analysis. This midsized marketing firm got started with a Cloudera deployment of Hadoop running in Amazon's cloud, but now that the platform is proven, it's deploying a Hadoop cluster on its premises.
The Paytronix example shows why information management is moving beyond relational databases. It's not that those databases are going away, but where social data, clickstreams, and sensor data are in use or where plain data inconsistency is a reality, new platforms like Hadoop and NoSQL databases are gaining adoption.
More details on the Paytronix deployment are featured in our 2014 Analytics, BI, and Information Management Survey Report (registration required). This free report is based on interviews with 248 information management professionals and includes 22 informative charts and graphs.