Being proactive, detecting in advance that a customer is planning to leave, and reacting in time to convince them to stay can result in a more satisfied customer base. It can also help you understand your customers and why they like or dislike your business. This dataset can help a banking institution reduce churn and offer more tailored products to its customers.
This dataset contains 10,000 records, each corresponding to a different bank customer. The target is Exited, a binary variable that indicates whether the customer decided to leave the bank. There are row and customer identifiers, four columns describing personal information about the customer (surname, location, gender and age), and several other columns describing the customer's relationship with the bank (such as credit score, current account balance and whether they are an active member, among others).
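The schema above separates naturally into identifiers, features and the target. The following is a minimal sketch using only the standard library, assuming the column names of the public Kaggle file (RowNumber, CustomerId, Surname, ..., Exited); check them against your own copy. The helper name load_features_and_target is hypothetical, and dropping Surname alongside the identifiers is an assumption rather than something the dataset prescribes.

```python
import csv
import io

# Assumed column names from the public Kaggle file; verify against your copy.
# Surname is dropped too (an assumption): it behaves more like an identifier than a feature.
DROP_COLUMNS = ["RowNumber", "CustomerId", "Surname"]
TARGET = "Exited"


def load_features_and_target(csv_text):
    """Parse the churn CSV, drop identifier columns and split off the binary target.

    Feature values are left as strings; type conversion and encoding of
    categorical columns (Geography, Gender) are out of scope here.
    """
    features, target = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        target.append(int(row.pop(TARGET)))  # 1 = customer left, 0 = stayed
        for col in DROP_COLUMNS:
            row.pop(col, None)
        features.append(row)
    return features, target


# Two made-up rows that only mimic the assumed schema.
SAMPLE = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,100,Smith,600,France,Female,40,3,0.0,1,1,1,50000.0,1
2,101,Lee,700,Spain,Male,35,5,1200.5,2,0,0,60000.0,0
"""

features, target = load_features_and_target(SAMPLE)
```

With the real file, you would pass the file's contents instead of SAMPLE; each remaining row keeps the ten feature columns.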
The objective is to train an ML model that returns the probability that a customer will churn. This is a binary classification task, so the F1-score is a suitable metric for evaluating models on this dataset: it is the harmonic mean of precision and recall, weighting the two equally, so a good classifier has to keep both high at the same time.
Although this dataset can make a huge difference to the banking institution's performance, it has some problems that complicate its usage. Luckily, Synthesized data generation tools can solve these problems quickly and intuitively.
This dataset is publicly available on Kaggle as "Predicting Churn for Bank Customers".