Blog Note
Model Eval Checklist
Before shipping any workflow update, I run a lightweight evaluation pass to catch regressions before they reach users.
Baseline Steps
- Test against a fixed set of representative prompts.
- Track correctness, latency, and failure modes.
- Compare against the previous known-good version.
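The steps above can be sketched as a small harness. This is a minimal illustration, not a definitive implementation. I am assuming that a workflow is a plain callable from prompt string to output string and that correctness means an exact match with an expected answer. The names `EvalResult`, `run_eval`, `CASES`, `previous_version`, and `candidate_version` are hypothetical, and the toy workflows stand in for real ones.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class EvalResult:
    accuracy: float           # fraction of prompts answered correctly
    mean_latency_s: float     # average wall-clock seconds per prompt
    failures: Dict[str, int]  # failure mode -> number of occurrences


def run_eval(workflow: Callable[[str], str],
             cases: List[Tuple[str, str]]) -> EvalResult:
    """Run one workflow over a fixed prompt set and collect the three metrics."""
    if not cases:
        raise ValueError("the prompt set must not be empty")
    correct = 0
    total_latency = 0.0
    failures: Dict[str, int] = {}
    for prompt, expected in cases:
        start = time.perf_counter()
        try:
            output = workflow(prompt)
            mode = None if output == expected else "wrong_answer"
        except Exception as exc:
            # Assumption: the exception class name is a good-enough failure label.
            mode = type(exc).__name__
        total_latency += time.perf_counter() - start
        if mode is None:
            correct += 1
        else:
            failures[mode] = failures.get(mode, 0) + 1
    return EvalResult(correct / len(cases), total_latency / len(cases), failures)


# Hypothetical fixed prompt set: (prompt, expected output) pairs.
CASES = [("2+2", "4"), ("capital of France", "Paris"), ("", "")]


def previous_version(prompt: str) -> str:
    # Toy stand-in for the known-good workflow.
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


def candidate_version(prompt: str) -> str:
    # Toy stand-in for the updated workflow, with two deliberate regressions.
    if not prompt:
        raise ValueError("empty prompt")
    return {"2+2": "4", "capital of France": "Lyon"}[prompt]


baseline = run_eval(previous_version, CASES)
candidate = run_eval(candidate_version, CASES)
```

Running both versions over the same `CASES` list is what makes the comparison meaningful: any difference between `baseline` and `candidate` comes from the workflow change, not from a shifting prompt set.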
Rule
If a change improves one metric while harming another, document the tradeoff before adopting it.
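One way to make this rule mechanical is to flag any change where at least one metric improves while another gets worse. The sketch below assumes metrics arrive as plain dictionaries and that the direction of "better" is known for each one. The metric names, the `HIGHER_IS_BETTER` table, and both function names are my own illustrative choices, and the numbers in the example are made up.

```python
from typing import Dict, List, Tuple

# Assumption: for each metric, whether a larger value counts as an improvement.
HIGHER_IS_BETTER = {
    "accuracy": True,
    "mean_latency_s": False,
    "failure_count": False,
}


def split_changes(previous: Dict[str, float],
                  candidate: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (improved, regressed) metric names for the candidate version."""
    improved: List[str] = []
    regressed: List[str] = []
    for name, higher_is_better in HIGHER_IS_BETTER.items():
        delta = candidate[name] - previous[name]
        if not higher_is_better:
            delta = -delta  # normalize so that positive always means better
        if delta > 0:
            improved.append(name)
        elif delta < 0:
            regressed.append(name)
    return improved, regressed


def needs_tradeoff_note(previous: Dict[str, float],
                        candidate: Dict[str, float]) -> bool:
    """True when the change helps one metric and harms another."""
    improved, regressed = split_changes(previous, candidate)
    return bool(improved) and bool(regressed)


# Illustrative, made-up numbers: accuracy goes up, latency gets worse.
old_metrics = {"accuracy": 0.90, "mean_latency_s": 1.2, "failure_count": 3}
new_metrics = {"accuracy": 0.94, "mean_latency_s": 1.5, "failure_count": 3}

improved, regressed = split_changes(old_metrics, new_metrics)
if needs_tradeoff_note(old_metrics, new_metrics):
    print(f"Document tradeoff: improved {improved}, regressed {regressed}")
```

This version treats any nonzero difference as a change. On real runs, latency in particular is noisy, so a per-metric threshold would probably be needed before a difference counts as a regression.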