Accurately assessing the risk of a credit card product can drastically affect the financial outcome of any banking business. The decision-making process for each user and each credit line relies on historical user behaviour data. The objective of the credit card issuer is to maximise the number of open credit lines while keeping the number of defaulters as low as possible and placing higher-risk users on lower credit lines.
This dataset consists of historical data points for 30,000 users, from April to September 2005. Columns contain user and historical payment information, among others, and the target variable "default payment next month" flags those users who did not pay the next month's statement. Explanatory variables contain personal information about the user and their current credit (gender, education, family status, and current credit limit) and information about the credit status for the previous 6 months (repayment status, bill amount, and payment amount).
The objective is to train an ML model that assigns each user a probability of default in the subsequent month. This is a binary classification task, and F1-score is a good metric for evaluating model performance on this dataset: it is the harmonic mean of precision and recall, so it weights both equally, and a good classifier will maximise precision and recall simultaneously.
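To make the metric concrete, here is a minimal sketch of how F1 could be computed from predicted default probabilities. The helper names `to_labels` and `f1_score`, the 0.5 decision threshold, and the toy values are all assumptions made for illustration; none of them come from the dataset or from any particular library.

```python
def to_labels(probabilities, threshold=0.5):
    """Turn default probabilities into 0/1 predictions (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def f1_score(y_true, y_pred):
    """F1 for the positive class, where 1 means the user defaulted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaulters caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaulters missed
    if tp == 0:
        # Precision and recall are both zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy values for illustration only, not taken from the dataset.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.8, 0.3, 0.05]

y_pred = to_labels(y_prob)
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")  # F1 = 0.667
```

In this toy case two defaulters are caught, one is missed, and one non-defaulter is flagged, so precision and recall are both 2/3. Because the model outputs probabilities, the threshold is a free choice, and moving it trades precision against recall.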
Although this dataset can make a huge difference to a credit business's performance, it has some problems that complicate its usage. Luckily, Synthesized can solve these problems in a fast and intuitive way.
This dataset is publicly available in the UCI dataset repository as "Credit Card Payments".