Project · Fine-tune and deploy a domain-adapted model

project

hard

module project

Ship something real. Submit your work when you're done.

Brief

Fine-tune a 7B model (Mistral-7B or Llama-3 8B) using LoRA on a domain of your choice (customer support, medical Q&A, code review, etc.). Use at least 500 training examples. Evaluate against the base model on a custom eval set of 20 examples using an LLM judge. Deploy the merged model behind a vLLM server. The fine-tuned model must win on your eval set by ≥60% win rate.

Deliverables

A `fine_tune.py` script using TRL + PEFT that runs the full SFT pipeline: dataset loading, LoRA config, training, checkpoint saving.
A `eval.py` that runs your 20-example eval set against both the base model and fine-tuned model, using an LLM judge (or a clear rule-based scorer), and prints win/tie/loss counts.
A `serve.py` or docker-compose that starts a vLLM server with the merged LoRA weights.
A `results.md` showing: LoRA config used (rank, alpha, target_modules), training time, GPU memory peak, and eval win rate vs base model.

How we grade it

LoRA config documented with clear rationale for rank/alpha/target_modules choice.
Training completes on a single GPU (any size — scale the model to fit).
Fine-tuned model achieves ≥60% win rate on the 20-example eval set vs the base model.
vLLM server starts cleanly and returns responses to curl in the OpenAI-compatible format.

Project · Fine-tune and deploy a domain-adapted model

Hints

Expected output

Stretch goals