Survival Data Mining for Customer Insight - InformationWeek

Survival Data Mining for Customer Insight

Data mining techniques that have proven their worth in smaller applications are now crossing over into mainstream business computing. A practical approach will help you better understand customer behavior and reduce churn.

When I'm trying to understand a company's customers by using data collected in its databases, my first inclination is to apply survival data mining. Over the years, I've found that this approach provides rapid feedback about the customers and their behaviors, while at the same time providing a solid basis for quantifying customer value and measuring customer loyalty. This is customer insight in practice.

What is survival data mining? It's the application of survival analysis — a traditional statistical technique — to data mining problems concerning customers. The application to the business world changes the flavor of such statistical techniques, which were honed on the analysis of small numbers of patients in medical studies. Extracting the last iota of information from a handful of customers is no longer the primary concern. The key issue is how to make sense of millions or tens of millions of database records describing current and past customers and their business interactions.

This article presents survival data mining in practice. It starts with a methodology for subscription-based businesses and introduces hazards and survival curves for understanding churn. It then explains how you can quantify results and how you can apply the same techniques to general time-to-event problems in business. A technical sidebar ("Calculating Hazards in a Database," at the end of the article) shows how to do some of the calculations in a relational database.

Hazard Probability

In the medical world, doctors often want to understand which treatments help patients survive longer — and which have no effect at all (or worse). In the business world, the equivalent concern is when customers stop being customers. This is particularly true of businesses that have a well-defined beginning and end to the customer relationship. A good example is a subscription-based relationship, which may be found in a wide range of industries including insurance, communication, cable television, newspaper and magazine publishing, banking, and newly competitive utility markets.

The basis of survival data mining is hazard probability: that is, the chance that someone who has survived for a certain length of time (called customer "tenure") is going to stop, cancel, or expire before the next unit of time. This definition assumes that time is discrete, and such discrete time intervals — days, weeks, or months — fit business needs. By contrast, traditional survival analysis in statistics usually assumes that time is continuous.

Given the right data, calculating the hazard probability for a given tenure t is simple. The probability is the number of customers who succumbed to the risk divided by the population at risk at that tenure. That is, the numerator is the number of customers who stopped with exactly tenure t, and the denominator is everyone whose tenure is greater than or equal to t, whether they stopped or are still active. Customers with shorter tenures aren't part of the risk group: they either stopped before reaching tenure t or haven't yet been customers for that long. The sidebar explains how to calculate hazards directly using a relational database.
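To make the arithmetic concrete, here is a minimal sketch of the calculation in Python. It is an illustration rather than the sidebar's database method: the function name `hazard_probabilities`, the `(tenure, stopped)` input format, and the six sample customers are all assumptions made up for this example.

```python
from collections import Counter

def hazard_probabilities(customers):
    """Return {tenure: hazard} for discrete tenures 0..max tenure.

    `customers` is a list of (tenure, stopped) pairs (a hypothetical
    layout): tenure is in whole time units, and stopped is True if the
    customer stopped at that tenure, False if still active.
    """
    stops = Counter(t for t, stopped in customers if stopped)
    tenure_counts = Counter(t for t, _ in customers)

    hazards = {}
    at_risk = len(customers)  # everyone has tenure >= 0
    for t in range(max(tenure_counts) + 1):
        # Numerator: customers who stopped at exactly tenure t.
        # Denominator: customers whose tenure is >= t.
        hazards[t] = stops[t] / at_risk
        # Customers with tenure exactly t (stopped or still active)
        # leave the risk group before tenure t + 1.
        at_risk -= tenure_counts[t]
    return hazards

# Made-up sample: six customers, two of them still active.
sample = [(0, True), (1, True), (1, False), (2, True), (2, True), (3, False)]
hazards = hazard_probabilities(sample)
for tenure, h in hazards.items():
    print(f"tenure {tenure}: hazard {h:.3f}")
```

In the sample, one of six customers stops at tenure 0, so the hazard there is 1/6. At tenure 1 only five customers remain at risk and one stops, giving 1/5. Still-active customers count in the denominator up to their current tenure and then drop out without ever counting as a stop.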

A picture paints a thousand words. Figure 1 charts hazard probabilities for customers in a typical subscription business. The horizontal axis is the tenure of customers measured in days; the vertical axis is the probability that customers stop at a particular tenure point.

FIGURE 1 Hazard probabilities for customers in a typical subscription business.

The hazard chart in Figure 1 is an X-ray into the customer life cycle because it highlights several important events. The first hazard probability, at time zero, is about 4 percent; this bump is due to customers who never start and is often caused by poor customer information gathered at the point of sale or perhaps by buyer's remorse. At around 60 days, there's a very strong peak in the hazard probability. This peak corresponds to customers who start but never pay. The company moves customers through various dunning levels to prompt payment. However, at some point, the company must force churn because of nonpayment. Changes in this policy, such as a reduction in the period of time before nonpaying customers are cut off, would be apparent in the hazard probabilities.

At around 90 days, we see another significant spike in the hazards. This spike actually has nothing to do with nonpayment. It's due to the end of the initial promotion. Customers who signed up for this service because the initial offer was cheap often stop when they have to start paying full price. Happily, the customers who stop at this point have at least been paying their bills.

After these two initial peaks, the hazard probability gradually declines, but the curve is jagged. The jaggedness is due to the one-month billing cycle that most customers are on. Customers are more likely to stop at the end of a billing cycle. One reason is that when customers call in to stop, the stop date is set to the end of the billing cycle unless the customer requests a specific date.
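The effect of that stop-date rule can be sketched in a few lines of Python. This is a simplified illustration, not the company's actual billing logic: the fixed 30-day cycle, the function name `effective_stop_tenure`, and the call tenures below are all assumptions invented for the example (real billing cycles follow calendar months).

```python
import math
from collections import Counter

CYCLE_DAYS = 30  # assumed fixed billing cycle length, for illustration only

def effective_stop_tenure(call_tenure, requested_tenure=None,
                          cycle_days=CYCLE_DAYS):
    """Tenure (in days) at which a stop is recorded.

    If the customer requests a specific date, that tenure is used.
    Otherwise the stop is pushed to the end of the current billing cycle.
    """
    if requested_tenure is not None:
        return requested_tenure
    return math.ceil(call_tenure / cycle_days) * cycle_days

# Made-up tenures (in days) at which five customers call in to stop.
call_tenures = [95, 101, 110, 119, 125]
stop_counts = Counter(effective_stop_tenure(t) for t in call_tenures)
print(stop_counts)
```

Calls spread across days 95 through 119 are all recorded as stops on day 120, which is how evenly spread cancellation calls turn into periodic spikes in the hazard chart.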
