
IgnitionAI/ignition-agent-trainer


Ignition Agent Trainer

Evaluation, experimentation and optimization layer for TypeScript AI agents.

The goal is not to replace LangChain, LangGraph, Mastra or the Vercel AI SDK. The goal is to make any existing TypeScript agent measurable, comparable and improvable.

North Star: improve agent behavior through context engineering, rewards and experiments, without fine-tuning the base LLM first.


Why this project exists

Most production AI teams can build a RAG chatbot or a tool-calling agent. The hard part starts after the demo:

  • How do we know the agent works?
  • Which prompt, workflow or retrieval config is best?
  • Did a new change introduce regressions?
  • Should this request use simple RAG, reranking, verification or a multi-step agent?
  • How do we optimize quality while controlling latency and cost?

Ignition Agent Trainer answers this by wrapping existing agents in a standard loop:

Dataset
  ↓
Agent adapter
  ↓
Trace collection
  ↓
Rewards / scorers
  ↓
Leaderboard
  ↓
Optimization
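A minimal sketch of the loop above. The type names (`Item`, `Variant`, `Reward`, `LeaderboardRow`) and the `runLoop` function are hypothetical and are not the `@ignitionai/*` API. Rewards are assumed to return a score in [0, 1], and the optimization step is reduced to picking the top-ranked variant.

```typescript
// Hypothetical types: not the @ignitionai/* API, just enough to show the loop.
interface Item {
  id: string;
  input: string;
  expected?: { contains?: string[] };
}

interface RunResult {
  output: string;
  trace: { steps: { type: string; name: string }[] };
}

interface Variant {
  id: string;
  run(item: Item): Promise<RunResult>;
}

// A reward maps (item, result) to a score, assumed to lie in [0, 1].
type Reward = (item: Item, result: RunResult) => number;

interface LeaderboardRow {
  variantId: string;
  meanScore: number;
}

// Hypothetical keyword reward: fraction of expected words found in the output.
const containsAll: Reward = (item, result) => {
  const words = item.expected?.contains ?? [];
  if (words.length === 0) return 1;
  const text = result.output.toLowerCase();
  return words.filter((w) => text.includes(w.toLowerCase())).length / words.length;
};

// Assumes non-empty dataset, variants and rewards.
async function runLoop(
  dataset: Item[],
  variants: Variant[],
  rewards: Reward[],
): Promise<{ leaderboard: LeaderboardRow[]; best: string }> {
  const rows: LeaderboardRow[] = [];
  for (const variant of variants) {
    let total = 0;
    for (const item of dataset) {
      // Agent adapter + trace collection: the variant returns output and trace.
      const result = await variant.run(item);
      // Rewards / scorers: average the reward scores for this item.
      const scores = rewards.map((reward) => reward(item, result));
      total += scores.reduce((a, b) => a + b, 0) / scores.length;
    }
    rows.push({ variantId: variant.id, meanScore: total / dataset.length });
  }
  // Leaderboard: highest mean score first.
  rows.sort((a, b) => b.meanScore - a.meanScore);
  // Optimization, in its simplest form: select the best-scoring variant.
  return { leaderboard: rows, best: rows[0].variantId };
}
```

Real optimization would feed the leaderboard back into new variants rather than stopping at the winner; this sketch only closes the loop once.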

The LLM does not need to be retrained in the first versions. We optimize the system around it:

Prompts
Retrieval parameters
Tool choice
Tool order
Verification rules
Context assembly
Workflow routing
Stop conditions
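One way to picture "optimizing the system around" the model is as a search space over these knobs. The sketch below is hypothetical: `ContextConfig`, `SearchSpace` and `expandGrid` are illustrative names, not part of the packages, and only a few of the knobs listed above are modeled.

```typescript
// Hypothetical configuration covering a subset of the knobs above.
interface ContextConfig {
  prompt: string; // Prompts (e.g. a named prompt template)
  topK: number; // Retrieval parameters
  rerank: boolean; // Retrieval parameters (reranking on/off)
  maxSteps: number; // Stop conditions
}

// Each knob gets a list of candidate values.
type SearchSpace = { [K in keyof ContextConfig]: ContextConfig[K][] };

// Enumerate every combination (a full grid); each one is a candidate variant.
function expandGrid(space: SearchSpace): ContextConfig[] {
  const configs: ContextConfig[] = [];
  for (const prompt of space.prompt)
    for (const topK of space.topK)
      for (const rerank of space.rerank)
        for (const maxSteps of space.maxSteps)
          configs.push({ prompt, topK, rerank, maxSteps });
  return configs;
}
```

A full grid grows multiplicatively with each knob (2 prompts × 3 `topK` values × 2 rerank settings already give 12 variants), so larger spaces usually call for sampling or adaptive search instead.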

This is context engineering via reinforcement signals.


Target architecture

apps / SaaS
└─ IgnitionRAG
   ├─ Evaluation Center
   ├─ Experiments
   ├─ Optimization Lab
   └─ Agent Training

open-source framework
├─ @ignitionai/core
├─ @ignitionai/evals
├─ @ignitionai/experiments
├─ @ignitionai/trainer
├─ @ignitionai/environment
├─ @ignitionai/rl
├─ @ignitionai/adapter-langchain
├─ @ignitionai/adapter-langgraph
├─ @ignitionai/adapter-mastra
└─ @ignitionai/adapter-vercel-ai
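The adapter packages sit between an existing agent and the evaluation loop. A minimal sketch of that idea, assuming an agent that can be reduced to an async function from input text to answer text. `wrapAgent`, `AgentFn` and the `Variant`/`RunResult` shapes are hypothetical, loosely modeled on the `simpleRag` object in Minimal usage, and do not reflect the real adapter API.

```typescript
// Hypothetical shapes, loosely modeled on the README's simpleRag variant.
interface RunResult {
  output: string;
  trace: { steps: { type: string; name: string }[] };
  usage: { latencyMs: number };
}

interface Variant {
  id: string;
  name: string;
  run(item: { input: string }): Promise<RunResult>;
}

// Any agent reduced to "input text in, answer text out"; the caller is
// responsible for wrapping its framework-specific invocation in this shape.
type AgentFn = (input: string) => Promise<string>;

// Wrap an existing agent so the evaluation loop can run and time it.
function wrapAgent(id: string, name: string, agent: AgentFn): Variant {
  return {
    id,
    name,
    async run(item) {
      const start = Date.now();
      const output = await agent(item.input);
      return {
        output,
        // A real adapter would record tool calls; this sketch logs one step.
        trace: { steps: [{ type: "agent_call", name }] },
        usage: { latencyMs: Date.now() - start },
      };
    },
  };
}
```

Keeping the adapter this thin is what lets the same dataset and rewards score agents built with different frameworks.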

Quick start

bun install
bun run dev

Run all checks:

bun run ci

Run the basic example:

bun run --filter './examples/basic-eval' dev

Run a typed experiment through the local CLI:

bun run --filter '@ignitionai/cli' dev -- eval run ./examples/context-engineering/experiment.ts

Write a timestamped local report bundle:

bun run --filter '@ignitionai/cli' dev -- eval run ./examples/context-engineering/experiment.ts --bundle reports

Run the sample CI regression gate:

bun run --filter './examples/ci-regression-gate' dev

Run the alpha dogfood document-assistant experiment:

bun run --filter './examples/alpha-dogfood' dev

Run the alpha dogfood experiment through the local CLI and write a report bundle:

bun run --filter '@ignitionai/cli' dev -- eval run ./examples/alpha-dogfood/experiment.ts --bundle reports/alpha-dogfood

Run the IgnitionRAG evaluation bridge prototype:

bun run --filter './examples/ignitionrag-evaluation-bridge' dev
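The CI regression gate example's internals are not shown in this README. As a hypothetical sketch of what such a gate typically checks: compare each variant's current score with a stored baseline and fail when any score drops by more than a tolerance. `checkRegression`, the `Scores` shape and the default tolerance of 0.02 are assumptions.

```typescript
// Hypothetical: mean score per variant id, e.g. read from a leaderboard.
type Scores = Record<string, number>;

interface GateResult {
  passed: boolean;
  regressions: { variantId: string; baseline: number; current: number }[];
}

// Flag any variant whose score fell more than `tolerance` below its baseline.
// A variant missing from the current run is treated as scoring 0.
function checkRegression(baseline: Scores, current: Scores, tolerance = 0.02): GateResult {
  const regressions: GateResult["regressions"] = [];
  for (const [variantId, base] of Object.entries(baseline)) {
    const now = current[variantId] ?? 0;
    if (base - now > tolerance) {
      regressions.push({ variantId, baseline: base, current: now });
    }
  }
  return { passed: regressions.length === 0, regressions };
}
```

In a CI job, the script would exit with a non-zero status when `passed` is false so the pipeline blocks the change.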

Minimal usage

import { createDataset } from "@ignitionai/core";
import { containsText, costPenalty, latencyPenalty } from "@ignitionai/evals";
import { createExperiment } from "@ignitionai/experiments";

const dataset = createDataset({
  name: "contract-risk-demo",
  items: [
    {
      id: "case-001",
      input: "Find the termination clause and cite the source.",
      expected: {
        contains: ["termination", "notice"],
        citations: ["contract.pdf#p12"],
      },
    },
  ],
});

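// A stub variant: it returns the same canned answer, trace and usage for every
// item, so the experiment can run without calling a real model.
const simpleRag = {
  id: "simple-rag",
  name: "Simple RAG",
  async run(_item: unknown) {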
    return {
      output: "The termination clause requires 30 days notice. [contract.pdf#p12]",
      trace: {
        steps: [
          { type: "tool_call", name: "search", input: { topK: 5 }, output: "contract.pdf#p12" },
        ],
      },
      usage: { inputTokens: 800, outputTokens: 80, costUsd: 0.002, latencyMs: 1200 },
    };
  },
};

const experiment = createExperiment({
  name: "rag-strategy-comparison",
  dataset,
  variants: [simpleRag],
  rewards: [
    containsText({ weight: 0.4 }),
    latencyPenalty({ maxLatencyMs: 3000, weight: 0.2 }),
    costPenalty({ maxCostUsd: 0.01, weight: 0.2 }),
  ],
});

const report = await experiment.run();
console.table(report.leaderboard);
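This README does not specify how `containsText`, `latencyPenalty` and `costPenalty` compute their scores or how weights combine. A hypothetical sketch under stated assumptions: each reward yields a score in [0, 1], the penalties fall off linearly to 0 at their limits, and the total is a weight-normalized sum. The weights above add up to 0.8, so normalizing keeps totals in [0, 1]. `containsScore`, `linearPenalty` and `combine` are illustrative names, not functions from `@ignitionai/evals`.

```typescript
// Hypothetical re-implementations; not the @ignitionai/evals source.

// Fraction of expected substrings present in the output (case-insensitive).
function containsScore(output: string, expected: string[]): number {
  if (expected.length === 0) return 1;
  const text = output.toLowerCase();
  const hits = expected.filter((s) => text.includes(s.toLowerCase())).length;
  return hits / expected.length;
}

// 1 at zero usage, falling linearly to 0 at `max`, clamped at 0 beyond it.
function linearPenalty(value: number, max: number): number {
  return Math.max(0, 1 - value / max);
}

interface Weighted {
  score: number;
  weight: number;
}

// Weight-normalized sum, so the weights need not add up to 1.
function combine(parts: Weighted[]): number {
  const totalWeight = parts.reduce((sum, p) => sum + p.weight, 0);
  const weighted = parts.reduce((sum, p) => sum + p.score * p.weight, 0);
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}
```

Under these assumptions, the `simpleRag` result scores 1 on keywords, 1 − 1200/3000 = 0.6 on latency and 1 − 0.002/0.01 = 0.8 on cost, for a total of (0.4·1 + 0.2·0.6 + 0.2·0.8) / 0.8 = 0.85.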

Roadmap summary

  1. Evals — datasets, scorers, traces, reports.
  2. Experiments — compare prompts, agents, workflows and RAG configurations.
  3. Trainer — generate/evaluate/select loops for prompt and workflow optimization.
  4. Environment — state/action/reward loop for agentic policies.
  5. RL runtime — bandits first, then GRPO-style group optimization, PPO later.
  6. IgnitionRAG integration — Evaluation Center, Experiment Lab, Optimization Lab, Agent Trainer.
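Step 5 starts with bandits. To illustrate the technique, not the planned `@ignitionai/rl` API, here is a minimal UCB1 sketch that treats each variant as an arm and each evaluated request as a pull with a reward assumed to lie in [0, 1]. The `Ucb1` class is hypothetical.

```typescript
// UCB1 multi-armed bandit over a fixed set of arms (e.g. agent variants).
class Ucb1 {
  readonly arms: string[];
  private counts: number[];
  private sums: number[];
  private total = 0;

  constructor(arms: string[]) {
    this.arms = arms;
    this.counts = arms.map(() => 0);
    this.sums = arms.map(() => 0);
  }

  // Try every arm once, then maximize mean + sqrt(2 ln t / n_i).
  select(): number {
    const untried = this.counts.indexOf(0);
    if (untried !== -1) return untried;
    let best = 0;
    let bestValue = -Infinity;
    for (let i = 0; i < this.arms.length; i++) {
      const mean = this.sums[i] / this.counts[i];
      const bonus = Math.sqrt((2 * Math.log(this.total)) / this.counts[i]);
      if (mean + bonus > bestValue) {
        bestValue = mean + bonus;
        best = i;
      }
    }
    return best;
  }

  // Record the reward observed after routing a request to `arm`.
  update(arm: number, reward: number): void {
    this.counts[arm] += 1;
    this.sums[arm] += reward;
    this.total += 1;
  }

  // Empirical mean reward per arm (0 for arms never pulled).
  means(): number[] {
    return this.sums.map((s, i) => (this.counts[i] === 0 ? 0 : s / this.counts[i]));
  }
}
```

The exploration bonus shrinks as an arm is pulled more, so traffic shifts toward the best-scoring variant while weaker ones are still sampled occasionally.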

See ROADMAP.md for the full implementation plan.


Post-#20 roadmap

The initial foundation backlog is complete through PR #20.

The next phase focuses on alpha readiness, IgnitionRAG bridge work, and cautious RL exploration.

See:


Key docs


Autonomous implementation

Future implementation sessions should start from docs/CODEX_RUNBOOK.md.


Positioning

LangChain / LangGraph / Mastra / Vercel AI SDK
        ↓
Ignition Agent Trainer
        ↓
Evals + rewards + experiments + optimization

We do not compete on agent construction first. We compete on agent measurement and improvement.


Repo status

This repository is an initial scaffold. The code compiles as a design skeleton and the APIs are intentionally small. The first real milestone is a local evaluation runner that can compare two fake or real agent variants over a dataset and output a leaderboard.
