$39

Eval Kit

Ship with data, not vibes. A TypeScript framework, 60 production-tested seed cases, a CI workflow, and a 47-item pre-launch QA checklist.

A TypeScript-native LLM eval framework you can host yourself. It runs prompt and agent test cases against any provider via Vercel AI Gateway, emits a leaderboard, integrates with GitHub Actions, and ships with 60 production-tested seed cases.
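To make the run-and-rank loop concrete, here is a minimal sketch of how a suite could turn test cases into a leaderboard. It is not the kit's actual code: the `EvalCase`, `CallModel`, and `LeaderboardRow` types and the `runSuite` function are hypothetical names, and the only assertion shown is a substring check. The model call is injected so the sketch runs offline.

```typescript
// Hypothetical shapes for illustration; the kit's own types.ts may differ.
type EvalCase = { id: string; prompt: string; mustInclude: string };
type CallModel = (model: string, prompt: string) => Promise<string>;
type LeaderboardRow = { model: string; passed: number; total: number; passRate: number };

// Run every case against every model and rank models by pass rate.
async function runSuite(
  models: string[],
  cases: EvalCase[],
  callModel: CallModel,
): Promise<LeaderboardRow[]> {
  const rows: LeaderboardRow[] = [];
  for (const model of models) {
    let passed = 0;
    for (const c of cases) {
      const output = await callModel(model, c.prompt);
      // Simplest possible assertion: case-insensitive substring match.
      if (output.toLowerCase().includes(c.mustInclude.toLowerCase())) passed++;
    }
    const passRate = cases.length === 0 ? 0 : passed / cases.length;
    rows.push({ model, passed, total: cases.length, passRate });
  }
  // Highest pass rate first.
  return rows.sort((a, b) => b.passRate - a.passRate);
}
```

In real use, `callModel` would wrap a provider call (for example the `ai` SDK's `generateText`). Keeping it as a parameter also makes the runner easy to test with a stub.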

This is the unsexy, required member of The Builder's Stack. Without evals, you ship vibes. With them, you ship a regression-tested feature.

## What you get

- The framework itself: ~500 lines of TypeScript, zero dependencies beyond the `ai` SDK.
- 60 seed cases across 10 categories (factuality / hallucination / format-adherence / tool-use / refusal-calibration / code-quality / multi-turn-coherence / cost-latency / bias-fairness / custom-assertions).
- A CLI runner that emits a Markdown + HTML leaderboard with a diff vs. the previous run.
- A GitHub Actions workflow that posts the report as a PR comment and blocks merge on regression.
- A Slack/Discord webhook notifier for regression alerts.
- The *47-item Pre-launch QA Checklist* PDF: the literal checklist to run before shipping any AI feature.
- The *Eval Patterns* PDF (32 pages): methodology for designing eval cases, choosing assertions, and operating an eval suite at scale.
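As an illustration of the webhook notifier item, the sketch below builds and posts a regression alert. It is an assumption about how such a notifier could look, not the contents of the kit's `slack-notifier.ts`: `Regression`, `formatAlert`, `buildPayload`, and `notify` are hypothetical names. The one detail taken from the platforms themselves is the payload key: Slack incoming webhooks read the message from `text`, while Discord webhooks read it from `content`.

```typescript
// Hypothetical shape: pass rates are fractions in [0, 1].
type Regression = { category: string; previous: number; current: number };

// Format one line per regressed category, with the drop in percentage points.
function formatAlert(regressions: Regression[]): string {
  const lines = regressions.map((r) => {
    const dropPp = (r.previous - r.current) * 100;
    const before = (r.previous * 100).toFixed(1);
    const after = (r.current * 100).toFixed(1);
    return `- ${r.category}: ${before}% -> ${after}% (-${dropPp.toFixed(1)}pp)`;
  });
  const noun = regressions.length === 1 ? "category" : "categories";
  return [`Eval regression in ${regressions.length} ${noun}`, ...lines].join("\n");
}

// Slack expects { text }, Discord expects { content }.
function buildPayload(target: "slack" | "discord", message: string): Record<string, string> {
  return target === "slack" ? { text: message } : { content: message };
}

// POST the alert. Relies on the global fetch available in Node 18+.
async function notify(
  webhookUrl: string,
  target: "slack" | "discord",
  regressions: Regression[],
): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPayload(target, formatAlert(regressions))),
  });
  if (!res.ok) throw new Error(`Webhook returned HTTP ${res.status}`);
}
```

Separating message formatting from delivery keeps the formatting testable without a network call, and lets one formatter serve both chat platforms.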

What's inside

  • Real TypeScript framework — clone-and-run, ~500 LOC
  • 60 production-tested seed cases across 10 categories
  • GitHub Actions: block PRs on >5pp regression
  • Slack/Discord regression alerts via webhook
  • 47-item Pre-launch QA Checklist (PDF)
  • Auto-discovers Prompt Foundry + Agent Blueprints evals
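The ">5pp regression" gate can be sketched as a comparison of per-category pass rates between two runs. This is a minimal illustration under stated assumptions, not the kit's implementation: `CategoryScores` and `findRegressions` are hypothetical names, and scores are assumed to be pass rates between 0 and 1.

```typescript
// Hypothetical shape: category name -> pass rate in [0, 1].
type CategoryScores = Record<string, number>;

// Return the categories whose pass rate fell by more than `thresholdPp`
// percentage points between the previous run and the current run.
function findRegressions(
  previous: CategoryScores,
  current: CategoryScores,
  thresholdPp = 5,
): string[] {
  const regressed: string[] = [];
  for (const [category, prevRate] of Object.entries(previous)) {
    const currRate = current[category];
    // A category missing from the current run is skipped in this sketch.
    if (currRate === undefined) continue;
    const dropPp = (prevRate - currRate) * 100;
    // Small epsilon so a drop of exactly 5pp is not flagged by float noise.
    if (dropPp > thresholdPp + 1e-9) regressed.push(category);
  }
  return regressed;
}
```

In a CI step, a non-empty result would typically translate into a non-zero exit code, which is what lets a required status check block the merge.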
Compatible with
  • Claude (Opus 4.7 / Sonnet)
  • GPT-5 (OpenAI)
  • Gemini 2.5 (Google)
  • Vercel AI SDK (TS)
  • GitHub Actions (CI)

eval-kit/

  • run.ts (2.6KB)
  • runner.ts (5.4KB)
  • assertions.ts (4.3KB)
  • judges.ts (2.2KB)
  • report.ts (3.0KB)
  • types.ts (2.1KB)
  • package.json
  • tests/: basic.jsonl, refusal.jsonl, json.jsonl, basic.jsonl, safety.jsonl, basic.jsonl, basic.jsonl, budgets.jsonl, probes.jsonl, escape-hatch.jsonl
  • eval.yml
  • slack-notifier.ts
  • qa-checklist.md (4.6KB)
  • eval-patterns.md (11.0KB)
  • CHANGELOG.md
  • LICENSE.md
  • README.md

Common questions