Fine-tuning changes a model's weights so it behaves differently by default, without needing the instructions repeated in every prompt. Alongside prompting and RAG, it's the third tool for customising model behaviour. Most teams don't need it, but you need to know when you do.
| Approach | Use when |
|---|---|
| Prompting | Handles most tasks. Try it first. |
| RAG | Knowledge that changes. Don’t fine-tune facts into the model. |
| Fine-tuning | Model needs to consistently adopt a style, format, or domain behaviour that’s hard to prompt reliably — or you need to shrink a large model’s capability into a smaller, cheaper one. |
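Supervised fine-tuning data is typically a file of chat-formatted examples, one JSON object per line. The sketch below builds a tiny JSONL file in the chat format used by OpenAI's fine-tuning API; the support-bot persona and replies are invented for illustration, and a real run needs far more examples.

```python
import json

# Hedged sketch: chat-format JSONL training data (one example per line).
# The system prompt and messages here are hypothetical placeholders.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a terse support bot. Answer in one sentence."},
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Use the 'Forgot password' link on the sign-in page."},
        ]
    },
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every line must parse as JSON with a "messages" list.
with open("train.jsonl") as f:
    for line in f:
        record = json.loads(line)
        assert isinstance(record["messages"], list)
```

The point of the format: each example shows the model the *behaviour* you want by default, so the system prompt you'd otherwise repeat in every request gets baked into the weights instead.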
Full fine-tuning is expensive. LoRA (Low-Rank Adaptation) and QLoRA (quantised LoRA) train a small number of adapter weights instead of the full model; QLoRA in particular is what makes fine-tuning a 70B model on a single GPU possible. Learn these if you're working with open models.
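The core LoRA idea fits in a few lines. This is a minimal numpy sketch, not the PEFT library's API: instead of updating the full weight matrix `W`, you train two small matrices `A` and `B` so the effective weight is `W + (alpha / r) * B @ A`. Following the LoRA paper's initialisation, `B` starts at zero, so the adapter begins as a no-op.

```python
import numpy as np

# Minimal sketch of LoRA: freeze W, train low-rank A and B.
# Shapes, rank r, and alpha are illustrative, not tuned values.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16

W = rng.normal(size=(d_out, d_in))           # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable, low-rank
B = np.zeros((d_out, r))                     # trainable, zero-init

def forward(x):
    # Base layer plus scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B at zero, the adapted layer matches the frozen base exactly.
assert np.allclose(forward(x), W @ x)

# Why this is cheap: parameter counts for this one layer.
full_params = d_out * d_in            # full fine-tuning
lora_params = r * (d_in + d_out)      # LoRA adapter only
print(f"full: {full_params:,} params, LoRA: {lora_params:,} params")
```

At rank 8 the adapter is a small fraction of the full matrix, which is why adapters for large models fit where full fine-tuning doesn't; QLoRA adds 4-bit quantisation of the frozen base weights on top of this.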
!!! danger "Things that go wrong"

    - Bad data: makes the model confidently worse.
    - Overfitting on small datasets: the model memorises instead of generalising.
    - Catastrophic forgetting: the model loses general capabilities. Evaluate broadly, not just on your task.
    - Cost: training runs cost money and time. Budget for multiple iterations.
OpenAI fine-tuning · Hugging Face PEFT · QLoRA paper · Anthropic fine-tuning · Axolotl