Skip to content

Jiang020609/coderl-lite

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CodeRL-Lite

A lightweight rollout, evaluation, and diagnostics framework for code-agent RL.

Code-agent RL needs reproducible rollout, execution feedback, reward logging, and error diagnostics. CodeRL-Lite is an early-stage scaffold for that loop:

dataset loading -> model sampling -> code execution -> reward/eval result -> pass@k -> trajectory logging -> report.

This is not a full training framework. PPO, GRPO, DPO, Docker isolation, and UI work are out of scope for v0.1.

Quick Start

pip install -e ".[dev]"
pytest
python -m coderl_lite.cli run-toy --out runs/toy_rollouts.jsonl
python -m coderl_lite.cli report --input runs/toy_rollouts.jsonl --out benchmarks/results/report.md

The toy command uses DummyBackend, so no API keys are needed.

Toy Output

run-toy writes JSONL trajectories like:

{"task_id": "toy/add", "prompt": "Write a Python function add(a, b) that returns a + b.", "completion": "def add(a, b):\n    return a + b", "passed": true, "error_type": "passed", "reward": 1.0}

The report includes:

Number of tasks: 1
Number of samples: 2
pass@1: 1.0000
pass@2: 1.0000

OpenAI-Compatible Backend

OpenAIBackend is optional and uses the official openai Python package.

pip install -e ".[openai]"
set OPENAI_API_KEY=...
set OPENAI_BASE_URL=...
set OPENAI_MODEL=...

OPENAI_BASE_URL and OPENAI_MODEL are optional and useful for OpenAI-compatible endpoints.

Safety

The local judge executes generated Python code on your machine. This is unsafe for untrusted model output. The current judge is meant for trusted toy tasks only. Real usage should run generated code in Docker or another sandbox.

Roadmap

  • v0.1: rollout/eval/diagnostics
  • v0.2: vLLM backend + MBPP adapter + parallel execution
  • v0.3: RL data filtering + rejection sampling baseline

This project is currently early-stage.

About

A lightweight rollout, evaluation, and diagnostics framework for code-agent RL.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages