Fine-tuning in one paragraph
Fine-tuning takes a pre-trained LLM and continues training it on a curated dataset specific to your domain or task. The result is a model whose weights are adjusted toward your target: it produces outputs more like the examples in your fine-tuning dataset, follows your instructions more reliably, or adopts your domain vocabulary. Fine-tuning is one of three primary mechanisms for adapting an LLM to a specific use, alongside prompt engineering and retrieval-augmented generation (RAG). It is rarely the right first answer.
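Concretely, the "curated dataset" is usually a file of input/output pairs in a chat-style format. The sketch below is a minimal, hypothetical example of building such a file as JSONL; the exact record schema varies by provider, so treat the field names here as assumptions, not any particular vendor's API.

```python
import json

# Illustrative fine-tuning records in a chat-style JSONL format.
# The "messages"/"role"/"content" schema is an assumption for illustration;
# check your provider's documentation for the real field names.
examples = [
    {
        "messages": [
            {"role": "user", "content": "Summarise this support ticket: ..."},
            {"role": "assistant", "content": "Priority: P2. Summary: ..."},
        ]
    },
    {
        "messages": [
            {"role": "user", "content": "Summarise this support ticket: ..."},
            {"role": "assistant", "content": "Priority: P1. Summary: ..."},
        ]
    },
]

# One JSON object per line — the usual shape for fine-tuning uploads.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

In practice the file would contain hundreds to thousands of such pairs, each reviewed for quality, since the model learns whatever patterns the examples contain, including their mistakes.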
When fine-tuning is the right answer
Fine-tuning is genuinely useful for:
- Style transfer — making the model adopt your brand voice or domain idioms reliably
- Structured output formats — ensuring the model always produces output in your specific JSON or XML shape
- Domain vocabulary — embedding deep medical, legal, or technical jargon into the model's preferred terminology
- Task-specific behaviour — when prompt engineering can't reliably get the behaviour at the volume and consistency you need
For most enterprise use cases, none of these apply; prompt engineering and RAG are sufficient.
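The structured-output case above is also the easiest to measure: you can check mechanically whether every response matches your required shape. A minimal sketch of such a conformance check, assuming a hypothetical three-key JSON schema:

```python
import json

# Hypothetical required output schema — the keys are placeholders for illustration.
REQUIRED_KEYS = {"intent", "entities", "confidence"}

def conforms(output: str) -> bool:
    """Return True if the model output parses as JSON with exactly the required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and set(data) == REQUIRED_KEYS

conforms('{"intent": "refund", "entities": [], "confidence": 0.9}')  # True
conforms('Sure! Here is the JSON you asked for: ...')                # False
```

Running a check like this over a sample of outputs gives a conformance rate; if prompting alone already yields a rate you can live with, that is evidence fine-tuning is unnecessary for this case.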
When fine-tuning is the wrong answer
Fine-tuning is the wrong tool when:
- You need to add new knowledge — RAG is better; fine-tuning bakes knowledge into weights, where it cannot be updated without retraining
- You haven't yet tried prompt engineering on the problem — try prompts first; they are faster and cheaper to iterate on
- You don't have curated training data — fine-tuning needs hundreds to thousands of high-quality examples
- You can't measure quality — fine-tuning without evals will produce a model whose behaviour you can't verify
- The base model already does the task well — fine-tuning on what the model can already do is a waste of effort
Most enterprises that ask for fine-tuning are better served by prompt engineering plus RAG. NINtec's Discovery phase produces an honest recommendation.
Fine-tuning Claude
Anthropic's enterprise programme supports fine-tuning Claude for specific customer workloads. The process is more involved than prompt engineering: constructing a curated training dataset, defining an evaluation methodology, validating that the fine-tuned model improves on the baseline, and maintaining model-version migration discipline as new base models ship. NINtec engages with Anthropic's fine-tuning programme on customer engagements where the use case justifies it; we have completed customer engagements where the recommendation was "do not fine-tune, use prompt engineering instead."