Reducing Customer Churn in SaaS Companies Using Data Science
The need to reduce customer churn is now common wisdom: retaining a customer is often estimated to be 5 to 20 times more profitable than acquiring a new one.
Data science can be used to predict customer churn, but the approach differs by business model. For example, a B2B SaaS company like Salesforce needs a different approach from a B2C subscription business like Netflix or a B2C non-subscription business like eBay.
For this discussion, let us focus on churn at B2B SaaS companies that have annual contracts with their customers. Our goal is to predict the likelihood of churn at the end of the contract. Assume we need that likelihood three months before the contract ends so that we can take preventive action.
One key concept of data science is that the data structure used for modeling should be identical to the one used for prediction. To model churn at the end of the contract, we therefore need to compile customer data as of three months before the end of each contract, because that is the information that will be available when the prediction is made. We also need to know the outcome at the end of the contract (i.e. churn versus no churn).
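As a rough illustration of this alignment, the sketch below pairs each past contract with the date as of which its features should be compiled and with the observed outcome. The `Contract` class, its field names, and the approximation of "three months" as 90 days are assumptions made for this example, not part of any particular system.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "three months before contract end" is approximated as 90 days.
LEAD_TIME = timedelta(days=90)


@dataclass
class Contract:
    """Hypothetical record of a contract that has already ended."""
    customer_id: str
    end_date: date
    renewed: bool  # outcome observed at the end of the contract


def snapshot_date(contract: Contract) -> date:
    """Date as of which features must be compiled for this contract."""
    return contract.end_date - LEAD_TIME


def training_row(contract: Contract) -> dict:
    """Pair the feature snapshot date with the observed churn outcome."""
    return {
        "customer_id": contract.customer_id,
        "snapshot_date": snapshot_date(contract),
        "churned": not contract.renewed,
    }


past = Contract("acme", date(2023, 12, 31), renewed=False)
row = training_row(past)
print(row)
```

Every feature for this row would then be computed using only data dated on or before `snapshot_date`, which mirrors what is known when a live prediction is made.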
In the B2B world especially, the main obstacle to effective data science modeling is incomplete and inconsistent data. An essential step before the modeling exercise is to make sure the data is in good shape. Third-party tools like ZoomInfo or data.com can help with this.
The essential data needed for the churn modeling comes from the following categories, provided with some examples:
Firmographic: Employee size, Annual revenue, Industry, Region
Demographic: Title of the main contact person
Behavioral: Time since last contact from customer, Time since last communication, Number of communications in the last two months
Usage: Number of seats, Number of active users
Sentiment: Mood of the customer (as indicated by customer rep)
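To make the behavioral category concrete, here is a minimal sketch that derives two of the example features from a list of contact dates. The function name, the output keys, and the use of 60 days for "two months" are assumptions for illustration. Contacts after the snapshot date are ignored so that the features reflect only what was known at that time.

```python
from datetime import date, timedelta


def behavioral_features(contact_dates, as_of, window_days=60):
    """Compute behavioral features using only contacts on or before `as_of`."""
    seen = sorted(d for d in contact_dates if d <= as_of)
    window_start = as_of - timedelta(days=window_days)
    return {
        # None when the customer has no recorded contact yet.
        "days_since_last_contact": (as_of - seen[-1]).days if seen else None,
        # Contacts strictly after window_start, up to and including as_of.
        "contacts_in_window": sum(1 for d in seen if d > window_start),
    }


# Made-up contact log; the last entry falls after the snapshot date.
contacts = [date(2023, 7, 1), date(2023, 9, 15), date(2023, 9, 28), date(2023, 11, 5)]
features = behavioral_features(contacts, as_of=date(2023, 10, 2))
print(features)
```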
Once the data for each customer, compiled as of three months before the contract end date, is ready, it is time to do some modeling. Because most modeling today uses cross-validation, carving out a separate holdout set for evaluation is not strictly essential. However, a holdout group is still advantageous: it lets you show senior management that your churn model works on customers it has never seen.
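One way to combine the two ideas is to set the holdout aside first and run cross-validation only on the remainder. The sketch below shows this with plain Python; the function names, the 20% holdout fraction, and the five folds are illustrative choices rather than recommendations.

```python
import random


def split_holdout(customer_ids, holdout_fraction=0.2, seed=0):
    """Shuffle once, then reserve a fraction of customers as the holdout."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)
    n_holdout = int(len(ids) * holdout_fraction)
    return ids[n_holdout:], ids[:n_holdout]


def k_folds(ids, k=5):
    """Yield (train, validation) pairs; each id is validated exactly once."""
    folds = [ids[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation


customers = [f"cust_{i}" for i in range(50)]
modeling_ids, holdout_ids = split_holdout(customers)
folds = list(k_folds(modeling_ids, k=5))
print(len(modeling_ids), len(holdout_ids), len(folds))
```

The holdout customers are never touched during model selection, so performance on them is an honest check of the final model.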
Several modeling approaches should be tried to find the one that performs best. The most relevant for churn modeling are:
- Classification tree
- Logistic regression
- Nearest neighbor
- Naive Bayes
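The comparison itself can be sketched as a loop that scores each candidate with the same cross-validation folds. In practice a machine-learning library would supply the four model types listed above; to stay self-contained, this example uses a tiny hand-written nearest-neighbor classifier and a majority-class baseline as stand-ins. The toy data and all function names are made up for illustration.

```python
from collections import Counter


def majority_baseline(train_X, train_y, x):
    """Always predict the most common label in the training data."""
    return Counter(train_y).most_common(1)[0][0]


def nearest_neighbor(train_X, train_y, x):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[distances.index(min(distances))]


def cv_accuracy(predict, X, y, k=4):
    """Average accuracy over k folds, holding out every k-th row in turn."""
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Toy data: (active-user ratio, scaled time since last contact); 1 = churned.
X = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.9), (0.85, 0.15),
     (0.1, 0.8), (0.95, 0.05), (0.3, 0.7), (0.7, 0.3),
     (0.15, 0.95), (0.75, 0.1), (0.25, 0.85), (0.9, 0.2)]
y = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

candidates = {"majority baseline": majority_baseline, "nearest neighbor": nearest_neighbor}
results = {name: cv_accuracy(fn, X, y) for name, fn in candidates.items()}
best = max(results, key=results.get)
print(results, best)
```

Including a trivial baseline is a deliberate choice: a candidate that cannot beat it is not learning anything useful from the features.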
While evaluating the models, keep the following in mind:
- Reduce false negatives (i.e. predictions that a customer won’t churn when they actually do)
- Avoid over-fitting
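The false-negative concern can be made measurable by counting outcomes at a given score threshold. The sketch below is one simple way to do that; the validation labels, the scores, and the two thresholds are made up for illustration.

```python
def confusion_counts(y_true, churn_probs, threshold):
    """Count outcomes when every score >= threshold is flagged as churn."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, prob in zip(y_true, churn_probs):
        predicted = prob >= threshold
        if predicted and actual:
            counts["tp"] += 1
        elif predicted and not actual:
            counts["fp"] += 1
        elif actual:
            counts["fn"] += 1  # churner the model failed to flag
        else:
            counts["tn"] += 1
    return counts


def recall(counts):
    """Share of actual churners that were flagged."""
    churners = counts["tp"] + counts["fn"]
    return counts["tp"] / churners if churners else 0.0


# Made-up validation outcomes (1 = churned) and model scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.45, 0.6, 0.35, 0.1, 0.3, 0.55]

for threshold in (0.5, 0.3):
    counts = confusion_counts(y_true, scores, threshold)
    print(threshold, counts, round(recall(counts), 2))
```

Lowering the threshold reduces false negatives at the cost of more false positives, so the right setting depends on how expensive a missed churner is compared with an unnecessary retention effort. For over-fitting, a useful check is whether these numbers hold up on data the model was not trained on.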
Once the best model is identified, it is time to operationalize it. In the example we discussed, this means predicting each month the likelihood of churn for customers whose contracts end in three months.
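The monthly run starts by selecting which customers to score. A minimal sketch of that selection, assuming contract end dates are available per customer and that "in three months" is judged by calendar month, might look like this. The customer names and dates are invented.

```python
from datetime import date


def months_until(end_date, today):
    """Whole calendar months from today's month to the contract's end month."""
    return (end_date.year - today.year) * 12 + (end_date.month - today.month)


def due_for_scoring(contracts, today, lead_months=3):
    """Return customer ids whose contract ends `lead_months` months from now."""
    return [cid for cid, end in contracts.items()
            if months_until(end, today) == lead_months]


# Hypothetical contract end dates keyed by customer id.
contracts = {
    "acme": date(2024, 4, 30),
    "globex": date(2024, 6, 30),
    "initech": date(2024, 4, 15),
    "umbrella": date(2025, 1, 31),
}
batch = due_for_scoring(contracts, today=date(2024, 1, 10))
print(batch)
```

The selected customers would then have their features compiled as of the run date and passed to the chosen model.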
Once these churn likelihoods are estimated, a marketing strategy is needed to keep those customers from cancelling. One idea is to segment the customers along two axes: “Likelihood to churn” and “Customer value”.
- The segment with high likelihood to churn and high customer value needs very personalized, high-touch treatment, probably involving the customer support team and potential discounts.
- At the other end, the segment with low likelihood to churn and low customer value can be addressed with low-touch marketing tactics.
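The two-axis segmentation can be sketched as a small function that places each customer in a quadrant. The cutoff values, the quadrant labels, and the sample customers below are illustrative assumptions; real cutoffs would come from the company's own data.

```python
def segment(churn_probability, annual_value, prob_cutoff=0.5, value_cutoff=50_000):
    """Place a customer in one of four quadrants; cutoffs are illustrative."""
    risk = "high" if churn_probability >= prob_cutoff else "low"
    value = "high" if annual_value >= value_cutoff else "low"
    return f"{risk}-risk/{value}-value"


# Only the two treatments named in the text are filled in; the other two
# quadrants are left open because the text does not prescribe them.
TREATMENT = {
    "high-risk/high-value": "personalized, high-touch outreach",
    "low-risk/low-value": "low-touch marketing",
}

# Made-up customers: (name, churn probability, annual contract value).
customers = [("acme", 0.8, 120_000), ("globex", 0.1, 8_000), ("initech", 0.7, 5_000)]
for name, prob, value in customers:
    quadrant = segment(prob, value)
    print(name, quadrant, TREATMENT.get(quadrant, "to be decided"))
```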
Specific marketing strategies and tactics to reduce churn will depend on individual companies, and will need to be fine-tuned over time.