Credit risk analysis as a BCA student: turning a Kaggle dataset into something a business person could read

This started as a class assignment — predict credit card defaults using the UCI dataset. It became something I actually cared about because of one comment from my professor: "a number is not a decision."

A model outputting 0.73 doesn't help anyone. A model saying "this customer is in the high-risk band, here is why, here is what changes that" does.

The boring-but-important parts

I split the work into three loops, and I think this is the right shape for any small ML project:

Data loop — load, clean, encode, check for leakage. Boring. Critical.
Model loop — baseline first (Logistic Regression), then something fancier (Random Forest), then compare.
Reporting loop — turn probabilities into bands a human can use.

I almost skipped the reporting loop the first time. The model "worked": ROC-AUC was 0.78 and F1 was decent. I was about to call it done, and then I asked Claude "would a credit analyst actually use this output?" The answer made me redo half the project.

Why ROC-AUC alone is a trap

ROC-AUC tells you how well the model ranks: it is the probability that a randomly chosen defaulter gets a higher score than a randomly chosen non-defaulter. It does not tell you:

Whether the model is calibrated (of the customers scored around 0.7, about 70% should actually default; 0.7 should not just mean "kinda likely").
Whether you'll have precision/recall that makes business sense at any specific threshold.
Whether the model is good for the populations that actually matter (small-balance customers, new customers, etc).
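
The calibration point is easy to check by hand. Below is a minimal sketch in plain Python, not the project's actual code: the function name `calibration_table`, the five equal-width bins, and the tiny made-up sample are all assumptions for illustration. It groups predictions by score and compares the average predicted probability with the default rate actually observed in each group.

```python
def calibration_table(y_true, y_prob, n_bins=5):
    """Compare mean predicted probability with the observed default rate
    inside equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # min() keeps a probability of exactly 1.0 inside the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        bins[idx].append((label, prob))

    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue  # skip empty bins instead of reporting a fake rate
        mean_pred = sum(prob for _, prob in members) / len(members)
        observed = sum(label for label, _ in members) / len(members)
        rows.append({
            "bin": f"{idx / n_bins:.1f}-{(idx + 1) / n_bins:.1f}",
            "count": len(members),
            "mean_predicted": round(mean_pred, 3),
            "observed_rate": round(observed, 3),
        })
    return rows


# Made-up labels (1 = default) and scores, for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.55, 0.62, 0.70, 0.85, 0.95]

for row in calibration_table(y_true, y_prob):
    print(row)
```

If `observed_rate` sits far from `mean_predicted` in a bin, the score may still rank customers well, but it should not be read as a probability. With only a handful of rows per bin the rates are very noisy, so on real data each bin needs enough customers to mean something.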

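I added PR-AUC, F1, precision, recall, plus a confusion matrix at the chosen threshold. The PR-AUC made me realize my Logistic Regression model was way worse on the minority class than ROC-AUC implied. ROC-AUC is built on the false positive rate, which divides by the large non-default class, so a lot of false alarms barely move it; precision divides by everything the model flagged, so the same false alarms show up immediately. That was the moment I understood why people complain about ROC-AUC.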
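
To make the gap between the two numbers concrete, here is a small dependency-free sketch. Everything in it is an assumption for illustration: the scores are invented, the 0.5 threshold is arbitrary, and PR-AUC is summarised as average precision, which is one common way to compute it. In a real project you would more likely call a library such as scikit-learn than hand-roll these.

```python
def roc_auc(y_true, y_prob):
    """Probability that a random defaulter outscores a random non-defaulter
    (ties count as half). Assumes both classes are present."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Mean of the precision measured at the rank of each true defaulter.
    Assumes at least one defaulter."""
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


def metrics_at_threshold(y_true, y_prob, threshold):
    """Confusion-matrix counts plus precision, recall and F1 at one cut-off."""
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        flagged = prob >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


# Invented scores: 18 non-defaulters and 2 defaulters.
neg_scores = [0.90, 0.80, 0.60, 0.50, 0.30, 0.28, 0.26, 0.24, 0.22,
              0.20, 0.18, 0.16, 0.14, 0.12, 0.10, 0.08, 0.06, 0.04]
pos_scores = [0.70, 0.40]
y_true = [0] * len(neg_scores) + [1] * len(pos_scores)
y_prob = neg_scores + pos_scores

print("ROC-AUC:", round(roc_auc(y_true, y_prob), 3))
print("Average precision:", round(average_precision(y_true, y_prob), 3))
print(metrics_at_threshold(y_true, y_prob, threshold=0.5))
```

On this toy data ROC-AUC comes out around 0.83 while average precision is around 0.33, and at the 0.5 threshold only one of the five flagged customers is a real defaulter. The ranking is the same in both cases; the metrics just weigh the false alarms differently.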

Risk bands: the part a human can act on

I bucketed predicted probabilities into five bands: Very Low → Low → Medium → High → Very High. Each band gets:

The count of customers in it.
The actual default rate observed in test data for that band.
A recommendation column (manual review, auto-decline, etc).

This was the part where the project clicked. A non-technical person can read that table. The probability column alone is just numbers.
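
A minimal sketch of how such a band table can be built, in plain Python. The cut-offs, the recommendation wording, and the sample numbers are placeholders chosen for illustration, not the values from the project; in practice the cut-offs are a business decision.

```python
from bisect import bisect_right

BAND_NAMES = ["Very Low", "Low", "Medium", "High", "Very High"]
# Assumed cut-offs between bands; a probability equal to a cut-off
# goes into the higher band.
BAND_CUTOFFS = [0.10, 0.25, 0.50, 0.75]
# Placeholder recommendations, one per band.
RECOMMENDATIONS = {
    "Very Low": "auto-approve",
    "Low": "approve",
    "Medium": "manual review",
    "High": "manual review",
    "Very High": "auto-decline",
}


def assign_band(prob):
    """Map a predicted default probability to a band name."""
    return BAND_NAMES[bisect_right(BAND_CUTOFFS, prob)]


def band_report(y_true, y_prob):
    """One row per band: count, observed default rate, recommendation."""
    stats = {name: {"count": 0, "defaults": 0} for name in BAND_NAMES}
    for label, prob in zip(y_true, y_prob):
        band = assign_band(prob)
        stats[band]["count"] += 1
        stats[band]["defaults"] += label

    report = []
    for name in BAND_NAMES:
        count = stats[name]["count"]
        # None marks an empty band; a rate of 0.0 would be misleading.
        rate = stats[name]["defaults"] / count if count else None
        report.append({
            "band": name,
            "count": count,
            "observed_default_rate": rate,
            "recommendation": RECOMMENDATIONS[name],
        })
    return report


# Made-up test-set labels (1 = default) and predicted probabilities.
y_true = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
y_prob = [0.03, 0.08, 0.12, 0.20, 0.30, 0.45, 0.55, 0.70, 0.80, 0.95]

for row in band_report(y_true, y_prob):
    print(row)
```

The observed default rate is the column that keeps the bands honest: if "High" does not actually default more often than "Medium" on test data, the cut-offs or the model need another look.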

Synthetic data and being honest about it

The repo ships with synthetic sample data so anyone cloning it can run things immediately without hunting for the UCI download. Big asterisk: any results you see on synthetic data are not real. I labeled the README, the notebook, and the output report with that disclaimer. AI was the one that pushed me to do this — I'd casually mentioned the synthetic dataset to ChatGPT and it said "be louder about which numbers are which." Fair point.
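
One way to make the label hard to forget is to have the report writer add it automatically. The sketch below is an assumption about how that could look, not the repo's code: `render_report`, the banner text, and the `data_source` flag are hypothetical names, and the metric value is a placeholder.

```python
SYNTHETIC_BANNER = (
    "*** SYNTHETIC SAMPLE DATA: these numbers are not real results ***"
)


def render_report(metrics, data_source):
    """Return the report text. Synthetic runs get a banner at the top
    and the bottom so a copied excerpt is likely to carry it."""
    if data_source not in ("synthetic", "real"):
        raise ValueError(f"unknown data_source: {data_source!r}")
    lines = [f"data_source: {data_source}"]
    lines += [f"{name}: {value:.3f}" for name, value in metrics.items()]
    if data_source == "synthetic":
        lines = [SYNTHETIC_BANNER] + lines + [SYNTHETIC_BANNER]
    return "\n".join(lines)


# Placeholder metric value, for illustration only.
report = render_report({"example_metric": 0.5}, data_source="synthetic")
print(report)
```

Making `data_source` a required argument means every run has to say which data it used, which is the whole point.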

What AI helped with

Asking the analyst question. "Would a real analyst use this?" is the single most valuable prompt I used. Not for code — for framing.
SHAP-style intuition. I haven't added SHAP plots yet (next step), but Claude walked me through what they would mean on this data, which helped me describe feature importance more honestly even with simpler tools.
Pytest patterns. I had no idea how to test a data pipeline. The answer turned out to be: test that the columns exist, test that nulls are handled, test that the bands sum to the total population. Three smoke tests. That's it. AI suggested the shape.
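
Those three smoke tests can be sketched as follows. This is an illustration under assumptions, not the project's test file: the column names, the `clean_rows` step, and the band cut-offs are all hypothetical stand-ins for whatever the real pipeline does.

```python
from bisect import bisect_right

REQUIRED_COLUMNS = {"limit_bal", "age", "default"}  # placeholder names
BAND_NAMES = ["Very Low", "Low", "Medium", "High", "Very High"]
BAND_CUTOFFS = [0.10, 0.25, 0.50, 0.75]  # assumed cut-offs

SAMPLE_ROWS = [
    {"limit_bal": 20000, "age": 24, "default": 1},
    {"limit_bal": 120000, "age": None, "default": 0},
    {"limit_bal": 90000, "age": 34, "default": 0},
]


def clean_rows(rows):
    """Hypothetical cleaning step: fill a missing age with a middle
    value of the known ages."""
    known = sorted(r["age"] for r in rows if r["age"] is not None)
    fill = known[len(known) // 2]
    return [dict(r, age=fill if r["age"] is None else r["age"]) for r in rows]


def band_counts(probs):
    """Count how many probabilities land in each band."""
    counts = {name: 0 for name in BAND_NAMES}
    for prob in probs:
        counts[BAND_NAMES[bisect_right(BAND_CUTOFFS, prob)]] += 1
    return counts


def test_required_columns_exist():
    for row in clean_rows(SAMPLE_ROWS):
        assert REQUIRED_COLUMNS.issubset(row)


def test_nulls_are_handled():
    for row in clean_rows(SAMPLE_ROWS):
        assert all(value is not None for value in row.values())


def test_bands_sum_to_population():
    # Edge values 0.0 and 1.0 are the ones most easily dropped.
    probs = [0.0, 0.05, 0.10, 0.30, 0.60, 0.75, 0.99, 1.0]
    assert sum(band_counts(probs).values()) == len(probs)
```

Saved in a file such as `test_pipeline.py`, these are picked up by running `pytest` in the same folder. None of them checks that the model is any good; they only catch the pipeline quietly breaking.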

What I learned that I'd tell another student

Make the first model dumb on purpose. Logistic Regression is your sanity check.
Report metrics at a specific operating point (the threshold you would actually use), not just the areas under the curves.
Translate to bands or buckets. A continuous probability is for the model. A category is for the human.
Label synthetic data three times — README, notebook header, output file. Future-you will not remember which run used the real data.

This was the project where I stopped thinking of myself as "someone who runs ML code" and started thinking of myself as "someone who can explain ML output to a human." That's a small mental shift and a big one at the same time.
