Running a Cloudera Hadoop cluster on Amazon Web Services, Paytronix gains insight into customer behavior it couldn't tease out of a database.
With fewer than 100 employees, Paytronix Systems technically fits in the small-business category, but it has an outsized reputation in the restaurant industry, fueled in part by its ability to deliver big-data-driven customer insights.
Paytronix helps restaurant chains (and, more recently, convenience store chains) run customer loyalty programs and marketing campaigns. It has been a data-intensive business since its founding in 2001. In 2012, the company started experimenting with a Cloudera-based Hadoop cluster running on Amazon Web Services (AWS). It hasn't looked back.
People say two things about the cloud: it's for SMBs, and it's for tire kicking and sandbox development. As Paytronix has learned, though, the cloud works well beyond SMB scale, and it serves as a production platform, not just a sandbox. Even the largest companies in the world are choosing to run Hadoop in the cloud, as is the case at the 70,000-plus-employee pharmaceutical firm Merck & Co. Merck Research Laboratories is running a Hortonworks cluster on AWS as the basis of the Merck Data Science Platform.
[See execs from Merck, Paytronix, and the Weather Company on our big data panel at the March 31-April 1 InformationWeek Conference.]
Paytronix still uses Microsoft SQL Server to run its transactional systems and data warehouse, but it uses Hadoop to analyze point-of-sale and loyalty program data collected from more than 8,000 restaurants, including locations of chains such as Panera, Papa Gino's, and Outback Steakhouse. All these chains collect the same types of data, but each one structures that data differently, making it impractical to analyze customer behavior for every chain through a single, fixed data model.
The company uses its cloud-based Hadoop cluster to store check-level detail from every restaurant in a chain. If a chain changes its menu or adds data points to its loyalty membership database, Paytronix doesn't need to worry about changing a data model or ETL routines. With the combination of Scoobi, Hive queries, and R-based data modeling, it's spotting customer behavior patterns it couldn't see before.
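The article doesn't publish Paytronix's actual pipeline, but the schema-on-read idea it describes can be sketched in a few lines of Python. The field names and sample records below are hypothetical; the point is that when a chain adds a data point, the reader tolerates it without a schema migration:

```python
import json

# Hypothetical check-level records from two chains. Each chain ships a
# different shape; with schema-on-read there is no fixed table to alter
# when a chain adds a field (e.g. "loyalty_tier").
raw_checks = [
    '{"chain": "A", "check_id": 1, "items": ["burger", "milk"], "total": 14.50}',
    '{"chain": "B", "check_id": 2, "items": ["pasta"], "total": 11.00, "loyalty_tier": "gold"}',
]

def parse_check(line):
    """Read whatever fields are present; missing ones default instead of breaking."""
    rec = json.loads(line)
    return {
        "chain": rec.get("chain"),
        "items": rec.get("items", []),
        "total": rec.get("total", 0.0),
        "loyalty_tier": rec.get("loyalty_tier"),  # newer field, only some chains send it
    }

checks = [parse_check(line) for line in raw_checks]
```

A relational ETL job would fail (or silently drop data) on the second record's extra field; here it simply flows through, which is the flexibility the article attributes to the Hadoop approach.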
When we last spoke to Paytronix, it had learned to spot customers dining with children -- a huge driver of restaurant visits for many chains. When customers join loyalty programs, the restaurant doesn't always collect much information, and even when it does, customers won't always disclose that they are parents. Then there are grandparents, aunts, uncles, and nannies who have no children in their household but nonetheless dine out frequently with children.
Using Hadoop, Paytronix can see when customers are dining in groups early and ordering children's entrees, Shirley Temples, or milk -- sure signs that kids are among the guests. These customers can be targeted for child-related marketing campaigns -- offering discounts or free desserts -- that can give restaurants a big boost in business. Another Hadoop-based analysis Paytronix has put into production spots coupon redemption fraud, behavior that tends to show up in patterns tied to specific waiters and waitresses.
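The child-detection idea is essentially a pattern match over check-level items. A minimal sketch, with hypothetical item names and check data (Paytronix's real signals and models are not published):

```python
# Hypothetical heuristic: flag checks likely to include children based on
# kid-indicator menu items of the sort the article describes.
KID_SIGNALS = {"children's entree", "shirley temple", "milk"}

def likely_has_kids(check_items):
    """True if any ordered item is a kid-dining signal."""
    return any(item.lower() in KID_SIGNALS for item in check_items)

# Sample checks keyed by check ID.
checks = {
    101: ["steak", "Shirley Temple", "fries"],
    102: ["salad", "iced tea"],
}
kid_checks = [cid for cid, items in checks.items() if likely_has_kids(items)]
```

In production this kind of rule would run across millions of checks (via Hive or MapReduce) and feed a per-customer score rather than a per-check flag, but the core logic is this simple.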
Paytronix now has at least seven types of customer segmentation analyses in production on Hadoop, with one of the latest being a limited-time offer (LTO) analysis. CEO Andrew Robbins described LTOs, like the McDonald's McRib sandwich, as the "lifeblood" of restaurant chains, because they keep menus fresh, test new flavor profiles, and sometimes lead to permanent menu changes.
"At the top level, you can see if your sales went up or down, but it's harder to determine who an LTO attracted," Robbins explained. "You may want to target a very specific group with LTOs, like Millennials, so you want to see whether that group bought the special item and what they stopped buying."
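The "what they stopped buying" question is a before-and-after comparison over each LTO buyer's purchase history. A toy sketch with hypothetical member IDs and items, assuming histories for the periods before and during the promotion:

```python
from collections import Counter

# Hypothetical purchase histories for loyalty members, before and during
# an LTO window, used to see what LTO buyers substituted away from.
before = {"m1": ["burger", "fries"], "m2": ["pasta"], "m3": ["burger"]}
during = {"m1": ["lto_sandwich", "fries"], "m2": ["pasta"], "m3": ["lto_sandwich"]}

# Members who actually bought the limited-time item.
lto_buyers = {m for m, items in during.items() if "lto_sandwich" in items}

# Items those buyers purchased before but dropped during the promotion.
dropped = Counter()
for m in lto_buyers:
    for item in set(before[m]) - set(during[m]):
        dropped[item] += 1
```

Here both LTO buyers stopped ordering the burger, suggesting the special cannibalized that item -- exactly the kind of substitution effect Robbins says top-line sales numbers hide.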
In the absence of hard demographic data from loyalty programs, Paytronix can also draw inferences about customer age from social media profiles. Here, too, Hadoop's ability to deal with variable and loosely structured data is an advantage.
Paytronix has backed off a plan to set up a Hadoop cluster on the premises, because it has too many internal projects going on, according to Robbins. The company has also found, through a recent incident, that it is easy to detect and recover from a node failure using AWS tools.
Paytronix's data group has 10 employees, but only four interact with Hadoop in one way or another, and only one was a new hire. One "exceptionally bright" leader in the group championed the project, got the deployment rolling, and now administers the cluster on AWS, Robbins said. A data warehousing veteran has become adept with Hive and with extracting data sets from Hadoop. A Java developer uses Scoobi, a Scala library, as an easier way to write MapReduce jobs.
If the Paytronix experience can serve as a guide, SQL-savvy data professionals can adapt to new technology. And with the help of the cloud, rolling out a platform like Hadoop won't require an army of new employees.
Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data, and analytics. He previously served as editor in chief of Intelligent Enterprise.