Evaluation, experimentation and optimization layer for TypeScript AI agents.
The goal is not to replace LangChain, LangGraph, Mastra or the Vercel AI SDK. The goal is to make any existing TypeScript agent measurable, comparable and improvable.
North Star: improve agent behavior through context engineering, rewards and experiments — without fine-tuning the base LLM first.
Most production AI teams can build a RAG chatbot or a tool-calling agent. The hard part starts after the demo:
- How do we know the agent works?
- Which prompt, workflow or retrieval config is best?
- Did a new change introduce regressions?
- Should this request use simple RAG, reranking, verification or a multi-step agent?
- How do we optimize quality while controlling latency and cost?
Ignition Agent Trainer answers these questions by wrapping existing agents in a standard loop:
Dataset
↓
Agent adapter
↓
Trace collection
↓
Rewards / scorers
↓
Leaderboard
↓
Optimization
The LLM does not need to be retrained in the first versions. We optimize the system around it:
Prompts
Retrieval parameters
Tool choice
Tool order
Verification rules
Context assembly
Workflow routing
Stop conditions
This is context engineering via reinforcement signals.
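One way to picture this optimization surface is as a typed search space: each knob has a few candidate values, and every combination becomes a variant to evaluate. The sketch below is illustrative only. `SearchSpace`, `enumerateVariants` and the knob names (`promptId`, `topK`, `rerank`) are hypothetical and are not taken from any `@ignitionai` package.

```typescript
// Hypothetical search space: each key is a knob, each value lists its candidates.
type SearchSpace = Record<string, readonly unknown[]>;
type VariantConfig = Record<string, unknown>;

// Cartesian product of all knob values: one config per combination.
function enumerateVariants(space: SearchSpace): VariantConfig[] {
  let configs: VariantConfig[] = [{}];
  for (const [knob, values] of Object.entries(space)) {
    configs = configs.flatMap((config) =>
      values.map((value) => ({ ...config, [knob]: value })),
    );
  }
  return configs;
}

// Illustrative knobs only (assumed names, not a real config schema).
const space: SearchSpace = {
  promptId: ["concise", "cite-sources"],
  topK: [3, 5, 10],
  rerank: [false, true],
};

const variants = enumerateVariants(space);
console.log(variants.length); // 12
```

A full grid grows multiplicatively with every knob, so in practice the candidate set would likely need to be sampled or pruned rather than evaluated exhaustively.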
apps / SaaS
└─ IgnitionRAG
   ├─ Evaluation Center
   ├─ Experiments
   ├─ Optimization Lab
   └─ Agent Training
open-source framework
├─ @ignitionai/core
├─ @ignitionai/evals
├─ @ignitionai/experiments
├─ @ignitionai/trainer
├─ @ignitionai/environment
├─ @ignitionai/rl
├─ @ignitionai/adapter-langchain
├─ @ignitionai/adapter-langgraph
├─ @ignitionai/adapter-mastra
└─ @ignitionai/adapter-vercel-ai
bun install
bun run dev
Run all checks:
bun run ci
Run the basic example:
bun run --filter './examples/basic-eval' dev
Run a typed experiment through the local CLI:
bun run --filter '@ignitionai/cli' dev -- eval run ./examples/context-engineering/experiment.ts
Write a timestamped local report bundle:
bun run --filter '@ignitionai/cli' dev -- eval run ./examples/context-engineering/experiment.ts --bundle reports
Run the sample CI regression gate:
bun run --filter './examples/ci-regression-gate' dev
Run the alpha dogfood document-assistant experiment:
bun run --filter './examples/alpha-dogfood' dev
Run the alpha dogfood experiment through the local CLI and write a report bundle:
bun run --filter '@ignitionai/cli' dev -- eval run ./examples/alpha-dogfood/experiment.ts --bundle reports/alpha-dogfood
Run the IgnitionRAG evaluation bridge prototype:
bun run --filter './examples/ignitionrag-evaluation-bridge' dev
import { createDataset } from "@ignitionai/core";
import { containsText, costPenalty, latencyPenalty } from "@ignitionai/evals";
import { createExperiment } from "@ignitionai/experiments";

// One dataset item: the input plus what a good answer should contain and cite.
const dataset = createDataset({
  name: "contract-risk-demo",
  items: [
    {
      id: "case-001",
      input: "Find the termination clause and cite the source.",
      expected: {
        contains: ["termination", "notice"],
        citations: ["contract.pdf#p12"],
      },
    },
  ],
});

// A stub variant that returns a canned answer, a one-step trace and usage numbers.
const simpleRag = {
  id: "simple-rag",
  name: "Simple RAG",
  async run(item) {
    return {
      output: "The termination clause requires 30 days notice. [contract.pdf#p12]",
      trace: {
        steps: [
          { type: "tool_call", name: "search", input: { topK: 5 }, output: "contract.pdf#p12" },
        ],
      },
      usage: { inputTokens: 800, outputTokens: 80, costUsd: 0.002, latencyMs: 1200 },
    };
  },
};

// Score every variant on answer content, latency and cost.
const experiment = createExperiment({
  name: "rag-strategy-comparison",
  dataset,
  variants: [simpleRag],
  rewards: [
    containsText({ weight: 0.4 }),
    latencyPenalty({ maxLatencyMs: 3000, weight: 0.2 }),
    costPenalty({ maxCostUsd: 0.01, weight: 0.2 }),
  ],
});

const report = await experiment.run();
console.table(report.leaderboard);
- Evals: datasets, scorers, traces, reports.
- Experiments: compare prompts, agents, workflows and RAG configurations.
- Trainer: generate/evaluate/select loops for prompt and workflow optimization.
- Environment: state/action/reward loop for agentic policies.
- RL runtime: bandits first, then GRPO-style group optimization, PPO later.
- IgnitionRAG integration: Evaluation Center, Experiment Lab, Optimization Lab, Agent Trainer.
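To make the "bandits first" step of the RL runtime concrete, here is a minimal epsilon-greedy bandit that treats each agent variant as an arm and each reward score as feedback. It is a sketch under stated assumptions: `EpsilonGreedyBandit` and its methods are hypothetical names, not the `@ignitionai/rl` API, and the variant ids and rewards are made up for illustration.

```typescript
// Hypothetical epsilon-greedy bandit; each arm is a variant id.
class EpsilonGreedyBandit {
  private readonly arms: readonly string[];
  private readonly epsilon: number;
  private readonly random: () => number;
  private readonly counts = new Map<string, number>();
  private readonly means = new Map<string, number>();

  constructor(arms: readonly string[], epsilon: number, random: () => number = Math.random) {
    if (arms.length === 0) throw new Error("at least one arm is required");
    this.arms = arms;
    this.epsilon = epsilon;
    this.random = random;
  }

  // Explore with probability epsilon, otherwise exploit the best mean so far.
  select(): string {
    if (this.random() < this.epsilon) {
      const index = Math.floor(this.random() * this.arms.length);
      return this.arms[Math.min(index, this.arms.length - 1)]!;
    }
    return this.best();
  }

  // Incremental mean update: mean += (reward - mean) / n.
  update(arm: string, reward: number): void {
    const n = (this.counts.get(arm) ?? 0) + 1;
    const mean = this.means.get(arm) ?? 0;
    this.counts.set(arm, n);
    this.means.set(arm, mean + (reward - mean) / n);
  }

  // Arm with the highest observed mean reward; ties go to the earliest arm.
  best(): string {
    let bestArm = this.arms[0]!;
    for (const arm of this.arms) {
      if ((this.means.get(arm) ?? 0) > (this.means.get(bestArm) ?? 0)) bestArm = arm;
    }
    return bestArm;
  }
}

// Made-up variant ids and rewards, for illustration only.
const bandit = new EpsilonGreedyBandit(["simple-rag", "rerank-rag"], 0.1);
bandit.update("simple-rag", 0.6);
bandit.update("rerank-rag", 0.8);
bandit.update("rerank-rag", 0.9);
console.log(bandit.best()); // rerank-rag
```

The random source is injectable so that selection can be made deterministic in tests. Unvisited arms start with a mean of 0 here, which is a simplification; an optimistic initial value would encourage more early exploration.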
See ROADMAP.md for the full implementation plan.
The initial foundation backlog is complete through PR #20.
The next phase focuses on alpha readiness, IgnitionRAG bridge work, and cautious RL exploration.
See:
- Autonomous implementation plan
- Codex runbook
- PR playbook
- Backlog
- Post-#20 roadmap
- Project audit
- Alpha readiness checklist
- Alpha release readiness
- v0.1.0-alpha.0 release notes
- Alpha validation plan
- Definition of Done
- Milestones
- Strategic vision in French
- Feasibility
- Architecture
- Concepts: RL for agents
- ADR 0003: RL foundation sequence
- IgnitionRAG integration plan
- IgnitionRAG integration design
- IgnitionRAG Evaluation Center checklist
- MVP specification
- Competitive landscape
- Implementation checklist
Future implementation sessions should start from docs/CODEX_RUNBOOK.md.
LangChain / LangGraph / Mastra / Vercel AI SDK
↓
Ignition Agent Trainer
↓
Evals + rewards + experiments + optimization
We do not compete on agent construction first. We compete on agent measurement and improvement.
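The layering above implies a thin adapter between an existing agent and the measurement layer. The following sketch shows one possible shape for it, assuming a plain text-in/text-out agent. `wrapAgent`, `AgentAdapter` and the trace fields are hypothetical and may differ from the actual `@ignitionai/adapter-*` packages.

```typescript
// Hypothetical adapter shape; the real adapter packages may differ.
interface TraceStep { type: "agent_call"; input: string; output: string; latencyMs: number }
interface AdaptedResult { output: string; trace: { steps: TraceStep[] } }
interface AgentAdapter { id: string; run(input: string): Promise<AdaptedResult> }

// Wraps any async text-in/text-out agent so that every call yields a timed trace.
// The clock is injectable so latency can be controlled in tests.
function wrapAgent(
  id: string,
  agent: (input: string) => Promise<string>,
  now: () => number = Date.now,
): AgentAdapter {
  return {
    id,
    async run(input) {
      const startedAt = now();
      const output = await agent(input);
      const latencyMs = now() - startedAt;
      return { output, trace: { steps: [{ type: "agent_call", input, output, latencyMs }] } };
    },
  };
}

// Stand-in for an agent built with any framework.
const echoAgent = async (input: string): Promise<string> => `echo: ${input}`;
const adapter = wrapAgent("echo", echoAgent);
adapter.run("hello").then((result) => console.log(result.output)); // echo: hello
```

The agent itself is untouched; only the call boundary is instrumented, which is what lets the measurement layer stay independent of the framework used to build the agent.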
This repository is an initial scaffold. The code compiles as a design skeleton and the APIs are intentionally small. The first real milestone is a local evaluation runner that can compare two fake or real agent variants over a dataset and output a leaderboard.
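As a rough illustration of that first milestone, the sketch below compares two fake variants over a one-item dataset and produces a leaderboard. It is self-contained and does not use the `@ignitionai` packages: `runExperiment`, `containsExpected`, `latencyBudget` and all types are hypothetical, and the scoring rule (weighted average of rewards per item, averaged over the dataset) is an assumption rather than the project's defined behavior.

```typescript
// All names below are hypothetical; this is not the @ignitionai API.
interface DatasetItem { id: string; input: string; expected: string[] }
interface RunResult { output: string; latencyMs: number }
interface Variant { id: string; run(item: DatasetItem): Promise<RunResult> }
interface Reward { weight: number; score(item: DatasetItem, result: RunResult): number }
interface LeaderboardRow { variant: string; score: number }

// Fraction of expected phrases found in the output (case-insensitive).
const containsExpected = (weight: number): Reward => ({
  weight,
  score: (item, result) => {
    if (item.expected.length === 0) return 1;
    const text = result.output.toLowerCase();
    const hits = item.expected.filter((phrase) => text.includes(phrase.toLowerCase()));
    return hits.length / item.expected.length;
  },
});

// 1 at zero latency, falling linearly to 0 at maxLatencyMs and beyond.
const latencyBudget = (maxLatencyMs: number, weight: number): Reward => ({
  weight,
  score: (_item, result) => Math.max(0, 1 - result.latencyMs / maxLatencyMs),
});

async function runExperiment(
  dataset: DatasetItem[],
  variants: Variant[],
  rewards: Reward[],
): Promise<LeaderboardRow[]> {
  const totalWeight = rewards.reduce((sum, r) => sum + r.weight, 0);
  const rows: LeaderboardRow[] = [];
  for (const variant of variants) {
    let total = 0;
    for (const item of dataset) {
      const result = await variant.run(item);
      // Weighted average of the reward scores for this item.
      const weighted = rewards.reduce((sum, r) => sum + r.weight * r.score(item, result), 0);
      total += totalWeight > 0 ? weighted / totalWeight : 0;
    }
    rows.push({ variant: variant.id, score: dataset.length > 0 ? total / dataset.length : 0 });
  }
  // Highest mean score first.
  return rows.sort((a, b) => b.score - a.score);
}

const demoDataset: DatasetItem[] = [
  { id: "case-001", input: "Find the termination clause.", expected: ["termination", "notice"] },
];
// Two fake variants with canned outputs and latencies.
const fast: Variant = {
  id: "fast",
  run: async () => ({ output: "See the termination clause.", latencyMs: 500 }),
};
const thorough: Variant = {
  id: "thorough",
  run: async () => ({ output: "Termination requires 30 days notice.", latencyMs: 1500 }),
};

const leaderboard = runExperiment(
  demoDataset,
  [fast, thorough],
  [containsExpected(0.8), latencyBudget(3000, 0.2)],
);
leaderboard.then((rows) => console.table(rows));
```

With these weights the slower but more complete `thorough` variant ranks first, which shows how the reward mix, not any single metric, decides the ordering.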