Data Engineering · Case study · 2026

Serving a model, not just training one

An MLOps project that classifies the severity of French road accidents. The point isn't the model — it's everything around it: a reproducible data pipeline, a clean FastAPI service with prediction, retraining and monitoring endpoints, Docker packaging, and a CI gate. This write-up walks through that engineering.

Python · scikit-learn · FastAPI · Pydantic · Docker · docker-compose · GitHub Actions · pytest

The problem

Emergency services need to triage. Given the circumstances of a road accident, how severe is it likely to be? This project frames that question as binary classification on the French BAAC 2021 dataset (the official annual injury-accident database, published on data.gouv.fr):

Class	Label	Meaning
1	prioritaire	Victim hospitalized or deceased
0	non-prioritaire	Victim unharmed or lightly injured

I'll be upfront: this is Phase 1 (Foundations) of a larger MLOps pipeline. The model is intentionally a strong baseline — the value on display is the engineering scaffold that lets that model be trained, served, monitored, and retrained reliably.

The data pipeline

BAAC ships as four separate tables — usagers (people), caractéristiques (circumstances), lieux (locations), and véhicules (vehicles). The pipeline pulls them from an S3 bucket, joins them on the accident id Num_Acc, and turns the mess of raw codes into a clean modeling table:

Recode the target grav into the binary priority/non-priority label.
Engineer features: extract the hour from the hrmn time field, compute the victim's age from birth year, and count victims and vehicles per accident.
Fix real-world data quirks: Corsica's department codes (2A → 201, 2B → 202), comma-decimal latitude/longitude, and -1 sentinels converted to NaN.
Impute selected columns by their mode, then split 70/30 into X_train / X_test / y_train / y_test — 28 features in all.
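
The steps above can be sketched in pandas. This is a minimal illustration, not the project's make_dataset.py: it joins only two of the four tables, and the function name build_modeling_table, the columns an_nais, dep, lat, long and nb_victim, the "HH:MM" format of hrmn, and the grav codes (2 and 3 treated as prioritaire) are my assumptions about the BAAC files rather than details stated here.

```python
import numpy as np
import pandas as pd


def build_modeling_table(usagers: pd.DataFrame, caract: pd.DataFrame,
                         year: int = 2021) -> pd.DataFrame:
    """Join two BAAC tables on Num_Acc and apply the cleaning steps (sketch)."""
    # Count victims per accident before joining (one usagers row per person).
    nb_victim = usagers.groupby("Num_Acc").size().rename("nb_victim").reset_index()

    df = usagers.merge(caract, on="Num_Acc", how="left")
    df = df.merge(nb_victim, on="Num_Acc", how="left")

    # -1 is a "not provided" sentinel in the numeric codes: turn it into NaN.
    num_cols = df.select_dtypes("number").columns
    df[num_cols] = df[num_cols].replace(-1, np.nan)

    # Recode the target. Assumed codes: 1 unharmed, 2 killed,
    # 3 hospitalized, 4 lightly injured.
    df["grav"] = df["grav"].map({1: 0, 2: 1, 3: 1, 4: 0})

    # Feature engineering: hour of day (assumes "HH:MM") and victim age.
    df["hour"] = df["hrmn"].astype(str).str.split(":").str[0].astype(int)
    df["victim_age"] = year - df["an_nais"]

    # Corsica's department codes are not numeric; map them as the article does.
    df["dep"] = df["dep"].astype(str).replace({"2A": "201", "2B": "202"}).astype(int)

    # Coordinates use a decimal comma in the raw CSVs.
    for col in ("lat", "long"):
        df[col] = df[col].astype(str).str.replace(",", ".", regex=False).astype(float)

    return df


# Tiny made-up rows, only to exercise the function.
usagers = pd.DataFrame({"Num_Acc": [1, 1, 2], "grav": [1, 3, 4],
                        "an_nais": [1961, 2000, -1]})
caract = pd.DataFrame({"Num_Acc": [1, 2], "hrmn": ["08:30", "23:05"],
                       "dep": ["2A", "75"], "lat": ["42,1", "48,85"],
                       "long": ["9,0", "2,35"]})
table = build_modeling_table(usagers, caract)
```

Counting victims before the join keeps the aggregation on the one-row-per-person table, so the count is not inflated by later merges.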

Two scripts make this reproducible end to end: import_raw_data.py downloads the four CSVs, and make_dataset.py produces the processed splits.

The model

The classifier is a RandomForestClassifier (n_jobs=-1, random_state=42) trained on ~54,000 accidents, reaching roughly 77% accuracy on the test set. The training script loads the processed splits, fits the model, logs the score, and serializes the result with joblib so the API can load it at startup:

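import joblib
from sklearn import ensemble

# X_train, X_test, y_train, y_test are the processed splits from make_dataset.py
model = ensemble.RandomForestClassifier(n_jobs=-1, random_state=42)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)   # mean accuracy, ~0.77
joblib.dump(model, MODEL_PATH)        # the API loads this file at startup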

Deliberately simple, deliberately reproducible. In an MLOps Phase 1, a dependable baseline you can serve and retrain beats a fancier model you can't operate.

The API

The service is a FastAPI app with a deliberately small, layered structure. The model is loaded once at startup via a lifespan hook; two routers split operational endpoints from inference; a tiny shared-state module lets them talk without dependency-injection ceremony:

src/api/
├── main.py           ← FastAPI app + lifespan (loads model once)
├── schemas.py        ← Pydantic request/response contracts
├── metrics.py        ← in-memory shared state (model, stats, lock)
└── routers/
    ├── monitoring.py ← /health  /stats  /model/info  /retrain
    └── inference.py  ← /predict

Method	Endpoint	Purpose
GET	/health	API and model status
GET	/stats	Prediction counters
GET	/model/info	Loaded model hyperparameters
POST	/predict	Severity prediction
POST	/retrain	Trigger a background retrain
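
To make the layering concrete, here is a rough sketch of how a lifespan hook and a shared-state module could fit together. It sticks to the standard library so it runs standalone; SharedState, make_lifespan, health and demo are hypothetical names, and the stand-in loader replaces what would presumably be a joblib.load call in the real service.

```python
import asyncio
import threading
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class SharedState:
    """In-memory state shared by both routers (the role metrics.py plays)."""
    model: Optional[Any] = None
    prediction_count: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)


state = SharedState()


def make_lifespan(load_model: Callable[[], Any]):
    """Build a lifespan hook that loads the model once, before requests are served."""

    @asynccontextmanager
    async def lifespan(app):
        with state.lock:
            state.model = load_model()   # runs once at startup
        yield                            # the app serves requests here
        with state.lock:
            state.model = None           # released at shutdown

    return lifespan


def health() -> dict:
    """What a /health handler could return by reading the shared state."""
    with state.lock:
        return {"status": "ok", "model_loaded": state.model is not None}


async def demo() -> dict:
    # Stand-in loader; the real one would be something like joblib.load(MODEL_PATH).
    lifespan = make_lifespan(lambda: "fake-model")
    async with lifespan(app=None):
        return health()
```

With FastAPI installed, the same hook would be wired in as FastAPI(lifespan=make_lifespan(...)), and each router would import state instead of receiving the model through dependency injection.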

/predict validates the 28-feature payload through a Pydantic schema, runs the forest, and returns not just a class but a confidence tier (high, medium, or low) derived from the predicted probability:

POST /predict
{ "place": 10, "catu": 3, "victim_age": 60, "vma": 50, ... }

→ {
    "prediction": 1,
    "label": "prioritaire",
    "probability": 0.8423,
    "confidence": "high"
  }
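
One plausible way to derive the tier is a pair of thresholds on the probability of the predicted class. The sketch below is an assumption throughout: the 0.8 and 0.6 cut-offs, the helper names, and the choice to report the predicted class's probability are mine, chosen only so that the example response above comes out the same.

```python
def confidence_tier(probability: float, high: float = 0.8, medium: float = 0.6) -> str:
    """Map the predicted-class probability to a coarse tier.

    The 0.8 / 0.6 cut-offs are illustrative assumptions, not the project's values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability >= high:
        return "high"
    if probability >= medium:
        return "medium"
    return "low"


def build_response(proba_prioritaire: float) -> dict:
    """Assemble a /predict-style response from P(class 1)."""
    prediction = int(proba_prioritaire >= 0.5)
    # Confidence is measured on the predicted class:
    # P(class 1) = 0.1 means the model is 0.9 sure of class 0.
    predicted_class_proba = (
        proba_prioritaire if prediction == 1 else 1.0 - proba_prioritaire
    )
    return {
        "prediction": prediction,
        "label": "prioritaire" if prediction == 1 else "non-prioritaire",
        "probability": round(predicted_class_proba, 4),
        "confidence": confidence_tier(predicted_class_proba),
    }
```

A random forest's raw probabilities are vote fractions, so tiers like these are best read as a rough ranking signal unless the model has been explicitly calibrated.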

The /retrain endpoint is the most interesting piece. It returns 202 Accepted immediately and runs training in a FastAPI background task: a subprocess re-runs the training script and, on success, the API hot-swaps the in-memory model with no restart and no downtime. Because a Docker volume persists the model file, the retrained model also survives container restarts.
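
The hot-swap pattern can be sketched with the standard library alone. ModelStore, retrain_and_swap and the TRAIN_CMD path are hypothetical names for illustration; the real service would pass a loader along the lines of joblib.load and schedule the function from the endpoint as a background task.

```python
import subprocess
import sys
import threading
from typing import Any, Callable, Sequence

# Hypothetical location of the training script; the project's path may differ.
TRAIN_CMD = [sys.executable, "src/models/train_model.py"]


class ModelStore:
    """Holds the live model behind a lock so readers never see a half-swap."""

    def __init__(self, model: Any) -> None:
        self._model = model
        self._lock = threading.Lock()

    def get(self) -> Any:
        with self._lock:
            return self._model

    def swap(self, new_model: Any) -> None:
        with self._lock:
            self._model = new_model


def retrain_and_swap(store: ModelStore, train_cmd: Sequence[str],
                     load_model: Callable[[], Any]) -> bool:
    """Run training in a subprocess; swap the model only if it succeeded."""
    result = subprocess.run(train_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Training failed: keep serving the current model.
        return False
    # Load the freshly written artifact outside the lock, then swap quickly.
    store.swap(load_model())
    return True
```

In a FastAPI handler this would typically be scheduled with background_tasks.add_task(retrain_and_swap, store, TRAIN_CMD, loader) while the endpoint itself returns 202. Loading the new model before taking the lock keeps the critical section to a single assignment, which is what keeps /predict responsive during the swap.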

Engineering & MLOps practices

What turns a script into an operable service is everything around the code:

Containerized — a slim Python image and a docker-compose stack with persistent volumes for data and the model, a /health healthcheck, and a restart policy.
Single source of truth — one config.py holds paths, the S3 URL, the feature list, and the split parameters. Change it once, it propagates everywhere (pipeline, training, API).
CI gate — GitHub Actions runs flake8 (blocking on real errors) then pytest, and fails the build below 60% coverage.
Tested — 11 tests on the data transformations and config, 13 on the API endpoints (mocked), so behavior is pinned before any change ships.
One-command workflows — a Makefile wraps install, lint, test, train, serve, predict, retrain, and the Docker lifecycle.
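
As an illustration of the single-source-of-truth idea, a config module might look roughly like this. Every concrete value except the 70/30 split, the seed of 42, the target grav and the four feature names seen in the /predict example is a placeholder: the directory layout, the model file name and the bucket URL are hypothetical, and the feature list is deliberately left incomplete.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Central settings imported by the pipeline, the training script and the API."""
    raw_dir: Path = Path("data/raw")                   # hypothetical layout
    processed_dir: Path = Path("data/processed")       # hypothetical layout
    model_path: Path = Path("models/model.joblib")     # hypothetical file name
    s3_base_url: str = "https://example.invalid/bucket/"  # placeholder, not the real bucket
    target: str = "grav"
    # Only the features visible in the /predict example; the full list has 28.
    features: tuple = ("place", "catu", "victim_age", "vma")
    test_size: float = 0.3      # the 70/30 split
    random_state: int = 42      # seed used for the model

    def raw_url(self, filename: str) -> str:
        """Build the download URL for one raw CSV."""
        return self.s3_base_url + filename


settings = Settings()
```

Freezing the dataclass means a stray assignment such as settings.test_size = 0.2 raises an error instead of silently diverging between the pipeline and the API.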

Takeaways

Component	Skill demonstrated
4-table join + cleaning pipeline	Reproducible data engineering
RandomForest baseline	Pragmatic modeling for an MLOps phase
FastAPI + Pydantic + routers	Clean, layered service design
/retrain with hot model swap	Zero-downtime operational thinking
Docker + compose + healthcheck	Containerization & deployment
GitHub Actions, coverage gate	CI/CD discipline
24 unit tests, Makefile	Testing and developer ergonomics

What I take away: a model in a notebook isn't a product. This project was about the gap between the two — turning a trained classifier into a service you can deploy, monitor, retrain without downtime, and trust because CI and tests guard it.