How I built a brain tumor detector (with a lot of AI help)

I should say this first: I am a BCA student. I am not a doctor. The thing I built is not a medical device. The README says it loudly, and I want to say it here too — this is a learning project.

With that out of the way: I wanted to build something that wasn't just another Kaggle notebook. Most "brain tumor" tutorials I read online stop at one classifier and call it done. Real clinical AI does way more than that. So I tried.

Why a pipeline and not just a model

The first thing I learned (from reading papers and asking Claude to explain them in plain English) was that a single classifier is the easy part. The hard part is everything around it:

What if someone uploads a photo of their cat? You need a safety gate that rejects non-brain images before they ever touch the diagnostic model.
What if the tumor is in the corner of the scan? You need localization — drawing a box around the suspicious region — not just a yes/no.
What if the model is just guessing confidently? You need multiple models voting, and you need Grad-CAM to actually see what the network is looking at.

So the pipeline became four stages: gate → localize → ensemble classify → explain.
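
Here is a minimal sketch of how those four stages could be chained. It is an illustration, not the project's actual code: `run_pipeline`, `PipelineResult`, and the four stage callables (`gate`, `localize`, `classify`, `explain`) are all hypothetical names, and the box format is an assumption. The point it shows is the ordering: the gate runs first, and a rejected image never reaches the later stages.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class PipelineResult:
    accepted: bool
    reason: str = ""
    box: Optional[Box] = None
    label: Optional[str] = None
    confidence: Optional[float] = None
    heatmap: Any = None


def run_pipeline(
    image: Any,
    gate: Callable[[Any], bool],
    localize: Callable[[Any], Optional[Box]],
    classify: Callable[[Any, Optional[Box]], Tuple[str, float]],
    explain: Callable[[Any, str], Any],
) -> PipelineResult:
    """Run gate -> localize -> ensemble classify -> explain, in that order."""
    # Stage 1: the safety gate. Anything that is not a brain scan stops here.
    if not gate(image):
        return PipelineResult(accepted=False, reason="rejected by safety gate")

    # Stage 2: find the suspicious region (may be None if nothing is found).
    box = localize(image)

    # Stage 3: the ensemble returns a label and a confidence score.
    label, confidence = classify(image, box)

    # Stage 4: produce an explanation (for example a Grad-CAM heatmap).
    heatmap = explain(image, label)

    return PipelineResult(
        accepted=True,
        box=box,
        label=label,
        confidence=confidence,
        heatmap=heatmap,
    )
```

Passing the stages in as plain callables keeps each one swappable and makes the gate's short-circuit easy to test with stand-in functions.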

The stack

YOLO (Ultralytics) for the localization stage. I picked YOLO because it was the most beginner-friendly object detector I could find with a real Python API.
SwinV2 + ConvNeXt + a MONAI branch, weighted as an ensemble. MONAI is the medical-imaging library from NVIDIA — it knows about DICOM and NIfTI formats so I didn't have to reinvent that.
timm for pretrained image backbones.
Gradio for the dashboard, because I needed something I could demo without learning React first.
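
To make "weighted as an ensemble" concrete, here is a small sketch of weighted soft voting in plain Python. The function name `weighted_ensemble`, the model keys, and the weights in the usage example are all made up for illustration; the real weights and combination rule in the project may differ. Each model contributes a list of class probabilities, and the result is their weighted average.

```python
from typing import Dict, List


def weighted_ensemble(
    probs_by_model: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Combine per-model class probabilities with a weighted average."""
    if not probs_by_model:
        raise ValueError("need at least one model's probabilities")

    # Normalize by the total weight so the output still sums to 1
    # when every input distribution sums to 1.
    total = sum(weights[name] for name in probs_by_model)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    n_classes = len(next(iter(probs_by_model.values())))
    combined = [0.0] * n_classes
    for name, probs in probs_by_model.items():
        if len(probs) != n_classes:
            raise ValueError(f"{name} returned {len(probs)} classes, expected {n_classes}")
        share = weights[name] / total
        for i, p in enumerate(probs):
            combined[i] += share * p
    return combined


# Usage with illustrative (not real) numbers: two classes, three models.
example = weighted_ensemble(
    {
        "swinv2": [0.9, 0.1],
        "convnext": [0.6, 0.4],
        "monai": [0.2, 0.8],
    },
    {"swinv2": 0.5, "convnext": 0.3, "monai": 0.2},
)
```

With those made-up inputs the first class gets 0.5 * 0.9 + 0.3 * 0.6 + 0.2 * 0.2 = 0.67, so one model disagreeing strongly pulls the combined score down without flipping the result.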

The reported validation accuracy on the dataset was 98.25%. I want to be honest — that number is on a curated public dataset and means almost nothing about real clinical use. If you take one thing from this post, take that.

What AI actually helped with

I used Claude and ChatGPT a lot. Not to write the code for me — they're not magic — but to do things I genuinely couldn't do alone yet:

Translating papers. Medical AI papers are dense. I'd paste a passage and ask "explain this like I'm a first-year CS student." Then I'd ask follow-ups until I got it.
Debugging tensor shapes. PyTorch shape mismatches at 1 AM are a special kind of pain. AI is really good at "here's my error, here's my model, what's wrong."
Naming things. Half my time was spent renaming variables until the code read like English. AI is great at this and humans are slow at it.
Catching my bad ideas. I once tried to train on a tiny dataset without any augmentation and Claude basically said "that will overfit, here is why." It did.

What AI did not do: pick the architecture, decide the pipeline stages, write the disclaimers, or test the thing. That part was me, sitting on my bed, reading docs.

The mistakes I made

I committed model weights to git once. They were 400 MB. Don't do this. .gitignore is your friend.
I had a Gradio demo that loaded all three ensemble models into RAM at startup. My laptop has 8 GB. You can guess how that went.
I called it a "diagnostic tool" in an early commit. I went back and rewrote everything to say "educational research pipeline." Words matter when they touch healthcare.
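
For the RAM problem, one possible fix is to load models lazily instead of at startup. The sketch below is an assumption about how that could look, not what the repo does: `make_lazy_loader` is a hypothetical helper, and `load_fn` stands in for whatever function actually reads a model's weights from disk. It uses `functools.lru_cache` from the standard library so nothing is loaded until first use, and only `max_loaded` models are kept cached at once.

```python
from functools import lru_cache
from typing import Any, Callable


def make_lazy_loader(load_fn: Callable[[str], Any], max_loaded: int = 1) -> Callable[[str], Any]:
    """Wrap a model-loading function so models load on first use only.

    load_fn is a hypothetical stand-in for the real loading code.
    max_loaded caps how many models stay cached; older ones are evicted,
    which lets Python free their memory once nothing else references them.
    """

    @lru_cache(maxsize=max_loaded)
    def get_model(name: str) -> Any:
        return load_fn(name)

    return get_model


# Usage with a fake loader that just records what was requested.
loaded_names = []


def fake_load(name: str) -> dict:
    loaded_names.append(name)
    return {"name": name}


get_model = make_lazy_loader(fake_load, max_loaded=1)
# Nothing has been loaded yet; the first get_model("swinv2") call triggers the load.
```

The trade-off is speed: with `max_loaded=1`, running all three ensemble members on one image means reloading models as they get evicted, so on a machine with more memory a larger cap makes sense.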

What's next

The honest next step is splitting training code from inference code. Right now they live in the same notebook and it makes the repo look way scarier than it should. I also want a hosted demo that doesn't expose model weights, because right now you basically have to clone the whole thing to try it.

If you're a student looking at clinical AI: build something, but be loud about the disclaimers. People believe what they see on the internet.

Related posts

Lung cancer prediction from survey data: what a small, imbalanced dataset taught me

Credit risk analysis as a BCA student: turning a Kaggle dataset into something a business person could read

Building LumaTorrent: a desktop app where I had to learn Rust and Tauri at the same time
