16. Observability and LLMOps

You can’t fix what you can’t see. Production LLM systems need the same observability as any backend service — plus LLM-specific signals.

What to Trace

What to Alert On
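This section's list was also lost in extraction. As a hedged sketch, teams commonly alert on tail latency, error rate, and cost per call; the thresholds below are illustrative defaults, not recommendations from the text:

```python
def alerts(traces, p95_latency_ms=5000, error_rate=0.05, cost_per_call_usd=0.10):
    """Return a list of fired alerts for a window of trace dicts.

    All thresholds are hypothetical defaults for illustration.
    """
    fired = []
    if not traces:
        return fired
    # p95 latency via sorted index (nearest-rank approximation)
    lat = sorted(t["latency_ms"] for t in traces)
    p95 = lat[int(0.95 * (len(lat) - 1))]
    if p95 > p95_latency_ms:
        fired.append(f"p95 latency {p95:.0f}ms > {p95_latency_ms}ms")
    # fraction of calls that errored
    errs = sum(1 for t in traces if t.get("error")) / len(traces)
    if errs > error_rate:
        fired.append(f"error rate {errs:.1%} > {error_rate:.0%}")
    # average spend per call
    avg_cost = sum(t["cost_usd"] for t in traces) / len(traces)
    if avg_cost > cost_per_call_usd:
        fired.append(f"avg cost ${avg_cost:.3f} > ${cost_per_call_usd:.2f}")
    return fired
```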

Prompt Management

Version prompts in code (same repo, same PR). Tag each LLM call with the prompt version. When quality drops, you need to know which prompt change caused it. Some teams use prompt registries — only adopt one if git isn’t enough.
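The pattern above, prompts in the repo, each call tagged with its version, can be sketched as follows. The task names and registry shape are assumptions for illustration:

```python
# Prompts live in code: edited, reviewed, and versioned in the same PR
# as the logic that uses them. Keys and templates here are hypothetical.
PROMPTS = {
    "summarize-v1": "Summarize the following text:\n{text}",
    "summarize-v2": "Summarize the text below in three bullet points:\n{text}",
}

# Bumping the active version happens in the same PR as the prompt edit.
ACTIVE = {"summarize": "summarize-v2"}

def build_request(task: str, **kwargs) -> dict:
    """Render the active prompt and tag the call with its version."""
    version = ACTIVE[task]
    return {
        "prompt": PROMPTS[version].format(**kwargs),
        "metadata": {"prompt_version": version},  # lands in the trace
    }
```

When quality drops, the `prompt_version` field in your traces points straight at the change, no registry needed until git stops being enough.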

A/B Testing

Route a percentage of traffic to a new prompt or model. Compare quality scores, latency, and cost. Don’t A/B test without evals — you’ll have data but no signal.
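Routing a fixed percentage of traffic can be sketched with deterministic hash-based bucketing, so a given user always lands in the same arm across requests. The function name and parameters are assumptions, not from the text:

```python
import hashlib

def ab_arm(user_id: str, experiment: str, treatment_pct: float = 0.10) -> str:
    """Deterministically assign a user to 'treatment' or 'control'.

    Hashing (experiment, user) keeps assignment stable per experiment
    while remaining independent across experiments.
    """
    h = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(h[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_pct else "control"
```

Log the arm alongside quality scores, latency, and cost so the comparison is per-arm, and run your evals on both arms before trusting the result.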

Tools

| Tool | Strengths |
| --- | --- |
| LangSmith | Tracing, evals, prompt playground |
| Langfuse | Open-source alternative |
| Arize Phoenix | Open-source, strong on retrieval analysis |
| Portkey | Gateway with built-in observability |
| Weights & Biases | Experiment tracking, extends to LLM eval |

Resources

LangSmith docs · Langfuse docs · Arize Phoenix · Portkey docs