
Decision Threshold Analysis in Bank Customer Retention using Tidymodels - Part 2 (Bank Customer Churn)

Continuing our investigation of bank customer churn with decision threshold analysis


In the world of banking, understanding the consequences of a model's predictions is just as important as the predictions themselves. This article aims to explain the implications of model output to a non-technical audience, focusing on decision threshold analysis for the Bank Customer Churn problem.

For this exploration, we will use a publicly available dataset from Kaggle and the probably package in R. The dataset, which contains information about customer behaviour and churn, can be found at [https://www.kaggle.com/shivan118/churn-modeling-dataset](https://www.kaggle.com/shivan118/churn-modeling-dataset). The probably package, part of the tidymodels family, provides tools to explore classification thresholds.

In Part 1, we achieved strong results across various classification metrics. Now, in Part 2, we delve into decision threshold analysis.

To perform a cost function analysis with decision threshold analysis, follow these key steps:

1. **Model Training and Probabilistic Predictions**
   - Fit a classification model (e.g., logistic regression, random forest) to predict the probability that a customer will churn.
   - Use the model to generate predicted probabilities for each customer.

2. **Define Cost Parameters**
   - Define your hypothetical Customer Lifetime Value (CLV), e.g., the average revenue gained from a retained customer. For our example, let's assume it to be $149, based on account fees and credit card fees.
   - Define the cost of intervention, e.g., marketing expenses to retain a customer (the cost of attempting to prevent churn). In this case, we assume a cost of $99 for a customer service intervention.

3. **Use the "probably" Package for Decision Threshold Analysis**
   - The probably package in R provides tools to explore classification thresholds.
   - By varying the decision threshold (the cutoff on predicted probability), you can classify customers as churners or non-churners.

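4. **Calculate Confusion Matrix Components at Each Threshold**
   - Use `probably::threshold_perf()`, or a similar function that evaluates a range of thresholds, to obtain sensitivity and specificity at each cutoff point. TP, FP, TN, and FN follow from these rates and the class totals.
   - Record the FP and FN counts at each threshold.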

5. **Compute Total Cost and Select Optimal Threshold**
   - For each threshold, compute total cost using the formula:
     Total Cost = (FP count × cost of intervention) + (FN count × CLV)
   - Plot total cost vs. threshold to visualize the cost-effectiveness tradeoff.
   - Choose the threshold that minimizes total cost.
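To make the five steps concrete, here is a minimal end-to-end sketch in base R. It is not the model from Part 1: the data are simulated, the column names (`age`, `active`, `churned`, `prob_churn`) are hypothetical, and a plain `glm()` logistic regression stands in for the tidymodels workflow so that the example runs without extra packages. Only the $149 CLV and $99 intervention cost come from the article.

```r
# Simulated stand-in for the churn data; all column names are hypothetical.
set.seed(42)
n <- 2000
customers <- data.frame(
  age    = round(rnorm(n, mean = 40, sd = 10)),
  active = rbinom(n, size = 1, prob = 0.5)
)
# Churn is generated so that older, inactive customers leave more often.
log_odds <- -4 + 0.07 * customers$age - 1.0 * customers$active
customers$churned <- rbinom(n, size = 1, prob = plogis(log_odds))

# Step 1: fit on 75% of the rows, predict churn probability on the rest.
test_idx <- sample(n, size = n / 4)
train <- customers[-test_idx, ]
test  <- customers[test_idx, ]
fit <- glm(churned ~ age + active, data = train, family = binomial())
test$prob_churn <- predict(fit, newdata = test, type = "response")

# Step 2: cost parameters assumed in the article.
clv <- 149
intervention_cost <- 99

# Steps 3 to 5: classify at each threshold, count errors, compute total cost.
thresholds <- seq(0.05, 0.95, by = 0.05)
costs <- do.call(rbind, lapply(thresholds, function(t) {
  flagged <- test$prob_churn >= t               # customers we would contact
  fp <- sum(flagged & test$churned == 0)        # contacted, would have stayed
  fn <- sum(!flagged & test$churned == 1)       # not contacted, churned
  data.frame(threshold = t, fp = fp, fn = fn,
             total_cost = fp * intervention_cost + fn * clv)
}))

print(costs)
```

The resulting `costs` data frame has one row per threshold, with a `threshold` and a `total_cost` column. As the threshold rises, the false positive count can only fall and the false negative count can only rise, which is the trade-off the total cost curve summarizes.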

Here's a sample code outline in R:

```r
library(dplyr)
library(ggplot2)

# ... (code omitted for brevity)

# Find threshold with minimum total cost
optimal <- costs %>%
  filter(total_cost == min(total_cost))

print(optimal)

# Plot total cost vs. threshold, highlighting the optimum in red
ggplot(costs, aes(x = threshold, y = total_cost)) +
  geom_line() +
  geom_point(data = optimal, aes(x = threshold, y = total_cost),
             color = "red", size = 3) +
  labs(title = "Total Cost vs. Decision Threshold",
       x = "Decision Threshold",
       y = "Total Cost")
```

By minimizing total cost, we identify the economically optimal threshold for deciding when to intervene. This helps the business balance two kinds of error: a higher threshold triggers fewer interventions but misses more churners (false negatives, each costing the CLV), while a lower threshold catches more churners at the price of more false positives (each costing an intervention).
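As a rough cross-check on where that minimum might fall, suppose the predicted probabilities are well calibrated (an assumption, not something established here) and that costs follow the simplified formula from step 5, in which only false positives and false negatives cost anything. Writing p for a customer's churn probability and C for the intervention cost (both symbols introduced here for illustration), intervening is the cheaper choice in expectation when:

$$
\underbrace{(1 - p)\, C}_{\text{expected cost of intervening}}
\;<\;
\underbrace{p \cdot \mathrm{CLV}}_{\text{expected cost of not intervening}}
\quad\Longleftrightarrow\quad
p \;>\; \frac{C}{C + \mathrm{CLV}} = \frac{99}{99 + 149} = \frac{99}{248} \approx 0.40
$$

Under these assumptions the cost-minimizing threshold should land near 0.40. An empirical optimum far from that value would suggest miscalibrated probabilities or sampling noise rather than a different cost structure.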

In the sample code, we use the probably package to evaluate the model across a range of thresholds, compute false positives, false negatives, and total cost at each one, and finally pick the threshold that minimizes total cost.
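The threshold sweep itself is elided in the sample code. One way it might look is sketched here on simulated predictions, not the article's actual data or code. The column names `churn` and `.pred_yes` are hypothetical. `probably::threshold_perf()` reports rates rather than counts, so the counts are recovered from sensitivity, specificity, and the class totals. The metric labels may be abbreviated or spelled out depending on the installed version of probably, so the sketch matches on their first four characters.

```r
library(dplyr)
library(probably)

# Simulated predictions; `churn` and `.pred_yes` are hypothetical names.
set.seed(123)
n <- 1000
truth <- factor(
  sample(c("yes", "no"), n, replace = TRUE, prob = c(0.2, 0.8)),
  levels = c("yes", "no")  # the first level is treated as the event
)
preds <- data.frame(
  churn     = truth,
  .pred_yes = ifelse(truth == "yes", rbeta(n, 4, 2), rbeta(n, 2, 4))
)

# Sensitivity and specificity at each candidate threshold (long format).
perf <- threshold_perf(
  preds,
  truth = churn,
  estimate = .pred_yes,
  thresholds = seq(0.1, 0.9, by = 0.1)
)

n_pos <- sum(preds$churn == "yes")  # actual churners
n_neg <- sum(preds$churn == "no")   # actual non-churners

# Labels may read "sens"/"spec" or "sensitivity"/"specificity",
# so keep only the first four characters before matching.
counts <- perf %>%
  mutate(metric = substr(.metric, 1, 4)) %>%
  filter(metric %in% c("sens", "spec")) %>%
  group_by(.threshold) %>%
  summarise(
    sens = .estimate[metric == "sens"],
    spec = .estimate[metric == "spec"],
    .groups = "drop"
  ) %>%
  mutate(
    fn = round((1 - sens) * n_pos),   # missed churners
    fp = round((1 - spec) * n_neg),   # unnecessary interventions
    total_cost = fp * 99 + fn * 149   # cost parameters from the article
  )

print(counts)
```

Counting errors directly from the thresholded predictions gives the same numbers. Going through `threshold_perf()` is mainly convenient when the sensitivity and specificity curves are wanted as well.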

Stay tuned for the results of our scenario analysis.

To summarize Part 2: we use the probably package in R to run a decision threshold analysis on the Bank Customer Churn problem. Choosing the threshold that minimizes total cost lets the business weigh the cost of missed churners against the cost of unnecessary interventions.
