Default Detector
• By Arman Drismir • 1 minute read

I built a Svelte/Flask app around the ML model I made for a CPSC 330 class project.
Dataset & Metrics
This dataset is imbalanced: 22% of clients default while 78% do not. For this reason, F1 score and recall are the metrics to pay the most attention to. F1 gives a good indication of overall performance, and recall tells us how well the model performs on the clients who actually do default.
The Kaggle page has descriptions of what the values in each column represent. Link to dataset.
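With a 78/22 split, accuracy alone is misleading: a model that mostly predicts PAID can still score above the dummy baseline. A minimal sketch (hypothetical labels, assuming scikit-learn is available) showing how accuracy can look fine while recall on the DEFAULT class is poor:

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical labels mimicking the 78/22 imbalance: a model that
# predicts PAID (0) for almost everyone still looks accurate.
y_true = [0] * 78 + [1] * 22
y_pred = [0] * 78 + [0] * 18 + [1] * 4  # catches only 4 of 22 defaults

print(f"accuracy: {accuracy_score(y_true, y_pred):.2f}")  # → accuracy: 0.82
print(classification_report(y_true, y_pred, target_names=["PAID", "DEFAULT"]))
# recall for DEFAULT is only 4/22 ≈ 0.18 despite the 0.82 accuracy
```

This is exactly why the per-class recall row in the table below matters more than the headline accuracy.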
Best Model
I was able to get a test accuracy of 82% using CatBoost with hyperparameters found via RandomizedSearch. The result was less impressive than I expected, since a dummy classifier already scores 78%. Running RandomizedSearch on a more powerful computer would probably have squeezed out a few more points of accuracy. The overall F1, recall, and precision scores look good, but for the DEFAULT class specifically they are poor. It would be interesting to train a new model that sacrifices accuracy for recall on the DEFAULT class.
| | precision | recall | f1-score |
|---|---|---|---|
| PAID | 0.84 | 0.95 | 0.89 |
| DEFAULT | 0.68 | 0.37 | 0.48 |
| average | 0.81 | 0.82 | 0.80 |
All Models
Logistic Regression
Train score: 0.8105, improved to 0.8106 with HPO (+0.0001)
SVC
Train score: 0.8177, improved to 0.8183 with HPO (+0.0006)
Gradient Boosted Trees
Train score: 0.8190, decreased to 0.8189 with HPO (-0.0001)
CatBoost
Train score: 0.8175, improved to 0.8214 with HPO (+0.0039)
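The model comparison above could be reproduced with a simple cross-validation loop. A sketch using scikit-learn stand-ins on synthetic data (the original notebook isn't shown, and CatBoost is omitted since it may not be installed):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same 78/22 imbalance as the real data.
X, y = make_classification(n_samples=1000, weights=[0.78], random_state=330)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "Gradient Boosted Trees": GradientBoostingClassifier(),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.4f}")
```

Scaling matters for the linear model and SVC but not for the tree ensembles, hence the pipelines.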