Lumen — A Model-Agnostic AI Coding Agent SDK

English | 中文

Lumen — A Model-Agnostic AI Coding Agent SDK

One command turns any LLM into your code architect.

Lumen is a model-agnostic, production-grade AI coding agent framework (Python 3.11+). Give it an API key and a model name, and you get an AI assistant that can deeply read, write, and analyze code. Works with OpenAI, Anthropic, DeepSeek, Ollama, and any OpenAI-compatible LLM.

Examples

Pomodoro timer — generated end-to-end in Review Mode

A pure HTML/CSS/JS Pomodoro timer with an animated SVG countdown ring, auto-switching work/break sessions, a WebAudio beep, and a localStorage-backed task list — built end-to-end by Lumen running gpt-4.1 under Review Mode (design → pipeline → implementation, gated by request_approval between phases).

Source: examples/pomodoro/

Features at a glance

Model-agnostic — OpenAI / Anthropic / DeepSeek / Ollama / any OpenAI-compatible API, swap with a single line
13 built-in tools — file read/write, code search, shell execution, web search/fetch, LSP code intelligence, sub-agents
Extended Thinking — Anthropic thinking blocks / OpenAI reasoning_effort / generic CoT, with adaptive budgeting
Prompt Cache — native Anthropic cache_control plus a generic hash-based cache; automatically reduces API cost
Structured output — JSON Schema-enforced output for OpenAI / Anthropic / Gemini, validated via Pydantic
Smart permission control — feature-scored command classifier plus regex dual engine, five risk levels, fully explainable
Dynamic session memory — LLM auto-extracts key facts, TF-IDF retrieval, no vector database required
Automatic context compaction — compresses conversation history as the window fills up, continues seamlessly
Skill system — reusable task templates, defined in JSON/YAML with fuzzy search
Persistent retry — tiered model escalation plus webhook alerts, designed for CI / batch workloads
Lifecycle hooks — pre/post tool hooks for interception, mutation, and remediation
Auto-persisted transcripts — JSONL append-only, crash-safe, resume past sessions with /resume
Sub-agent system — parent agents can spawn sub-agents for independent tasks, sync/async modes, linked aborts

Quick start

# Clone, create an env, and run
git clone https://github.com/boots-coder/lumen.git
cd lumen
conda create -n lumen python=3.11 -y && conda activate lumen
pip install httpx pydantic tiktoken rich prompt_toolkit
# Optional: web search tool, YAML-defined skills
pip install duckduckgo_search pyyaml
python chat.py

Or skip the wizard and launch directly:

# OpenAI
python chat.py --model gpt-4o --api-key sk-proj-...

# Anthropic
python chat.py --model claude-sonnet-4-6 --api-key sk-ant-...

# Local Ollama (no key needed)
python chat.py --model llama3.1 --base-url http://localhost:11434/v1

Core capabilities

Tool suite (invoked autonomously by the model)

Tool	Purpose
`read_file`	Read file content by line, with `offset` + `limit` pagination
`write_file`	Create a new file
`edit_file`	Edit an existing file (line-level replacement)
`tree`	Render the project directory (auto-filters `node_modules` etc.)
`definitions`	Extract every class / function / method from a file with line numbers
`glob`	Match file paths by pattern, sorted by modification time
`grep`	ripgrep-powered regex search with three output modes
`bash`	Execute shell commands
`web_search`	DuckDuckGo search (no API key required)
`web_fetch`	Fetch a web page and extract readable text
`lsp`	LSP code intelligence: go-to-definition / find-references / hover / symbol search
`sub_agent`	Spawn a sub-agent for an independent task (requires `enable_subagents()`)

Extended Thinking

Three strategies automatically adapt to different models:

Model family	Mechanism
Anthropic	`thinking: {"type": "enabled", "budget_tokens": N}`
OpenAI o-series	`reasoning_effort: "low"/"medium"/"high"`
Other models	CoT instructions injected via the system prompt

The thinking budget adapts to context usage — as the window fills up, it shrinks automatically to leave room for the actual response.

Structured output

from pydantic import BaseModel

class CodeAnalysis(BaseModel):
    complexity: str
    issues: list[str]
    suggestions: list[str]

result = await agent.query("Analyze this code", schema=CodeAnalysis)
# result is a validated CodeAnalysis instance

Supports OpenAI json_schema / Anthropic tool-use trick / Gemini / generic JSON mode.

Command safety classifier

Replaces pure regex with feature scoring — a weighted analysis across four dimensions:

Executable (weight 0.4) — rm 0.7, sudo 0.9, ls 0.0
Flags (weight 0.2) — -rf 0.8, --force 0.3, --help 0.0
Argument paths (weight 0.1) — / 0.9, /etc 0.4, ./src 0.0
Command composition (weight 0.3) — curl | bash 0.9, subshell 0.3

Context-aware: rm -f is much riskier than -f on its own.

Auto-persisted transcripts

Every message is appended to a JSONL file — crash-safe:

~/.lumen/projects/{project-path}/{session-id}.jsonl

Append-only writes — 100 ms buffer, async flush
Fast recovery — reads metadata from the tail 64 KB, /resume restores in one step
Session ID — UUID auto-generated, shown at startup

Sub-agent system

A parent agent can spawn an isolated sub-agent to parallelize work:

agent.enable_subagents()  # Register the sub_agent tool

# Or use it programmatically
from lumen import SubAgentConfig
result = await agent.subagent_manager.spawn(SubAgentConfig(
    prompt="Audit the security of the auth module",
    description="Security audit",
    run_in_background=True,  # Run asynchronously in the background
))

Isolation policy:

File cache — cloned (no cross-contamination)
Abort — parent/child linked (parent abort cascades to child; child abort does not affect parent)
Session — independent conversation history
Tools — inherited or customized

Dynamic session memory

Every three turns, key facts (preferences / project / patterns / corrections) are auto-extracted
TF-IDF keyword retrieval — no vector database required
Jaccard similarity deduplication, persisted to a JSON file
Relevant memories are automatically injected on the first turn

Context management

Automatic token counting: precise per-turn counts with a live progress bar
Automatic compaction: when the window is nearly full, history is compacted into a structured summary
Manual compaction /compact: trigger any time, keeping the most recent N messages

Memory system (ENGRAM.md)

Four priority levels, auto-discovered and loaded:

/etc/lumen/ENGRAM.md      # System level (lowest priority)
~/.engram/ENGRAM.md       # User level
./ENGRAM.md               # Project level (team-shared)
./ENGRAM.local.md         # Local private (gitignored)

SDK usage

Basic usage

import asyncio
from lumen import Agent
from lumen.tools import (
    FileReadTool, FileWriteTool, FileEditTool,
    GlobTool, GrepTool, TreeTool, DefinitionsTool, BashTool,
)

async def main():
    agent = Agent(
        api_key="sk-...",
        model="gpt-4o",
        tools=[
            TreeTool(), DefinitionsTool(), FileReadTool(),
            GlobTool(), GrepTool(), BashTool(),
            FileWriteTool(), FileEditTool(),
        ],
        auto_compact=True,
        inject_git_state=True,
    )

    response = await agent.chat("Explain the overall architecture of this project")
    print(response.content)

asyncio.run(main())

Advanced: structured output + Thinking

from lumen import Agent, ThinkingConfig, ThinkingMode

agent = Agent(
    api_key="sk-ant-...",
    model="claude-sonnet-4-6",
    tools=[...],
    thinking=ThinkingConfig(
        enabled=True,
        budget_tokens=10000,
        mode=ThinkingMode.AUTO,
    ),
)

# Structured query
result = await agent.query("List every API endpoint", schema=APIEndpoints)

Advanced: persistent retry (CI scenario)

from lumen import Agent, PersistentRetryConfig

agent = Agent(
    api_key="sk-...",
    model="gpt-4o",
    tools=[...],
    persistent_retry=PersistentRetryConfig(
        enabled=True,
        escalation_threshold=5,
        fallback_models=["gpt-4o-mini", "claude-sonnet-4-6"],
        total_timeout_seconds=3600,
    ),
)

Slash commands

Command	Description
`/help`	Show help
`/status`	Token usage and session info
`/mode code`	Enter deep code-reading mode
`/mode general`	Return to general mode
`/compact`	Manually compact the context
`/reset`	Clear conversation history
`/save`	Save the session to a JSON file (transcripts are already auto-persisted)
`/load`	Restore a session from a JSON file
`/resume`	List recent sessions and pick one to resume
`/config`	Reconfigure the model and API key
`/quit`	Exit (or Ctrl+D)

Supported models

Provider	Example models	Notes
OpenAI	`gpt-4o`, `gpt-4o-mini`, `o3-mini`, `o1`	Auto-handles API differences for reasoning models
Anthropic	`claude-sonnet-4-6`, `claude-opus-4-6`, `claude-haiku-4-5`	Native API + thinking blocks
DeepSeek	`deepseek-chat`, `deepseek-reasoner`	OpenAI-compatible protocol
Ollama	`llama3.1`, `qwen2.5`, `mistral`, `deepseek-r1`	Run locally, no key needed
Other	Any OpenAI-compatible API	Custom `base_url`

Requirements

Python 3.11+
ripgrep (optional, used by the grep tool; brew install ripgrep)
An API key for OpenAI / Anthropic / DeepSeek, or a local Ollama

Optional dependencies:

duckduckgo_search — required for the web search tool
pyyaml — required for YAML-formatted skill definitions

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
docs		docs
examples		examples
lumen		lumen
tests		tests
.gitignore		.gitignore
README.md		README.md
README.zh.md		README.zh.md
chat.py		chat.py
chat_commands.py		chat_commands.py
pyproject.toml		pyproject.toml
展示.gif		展示.gif

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lumen — A Model-Agnostic AI Coding Agent SDK

Examples

Pomodoro timer — generated end-to-end in Review Mode

Features at a glance

Quick start

Core capabilities

Tool suite (invoked autonomously by the model)

Extended Thinking

Structured output

Command safety classifier

Auto-persisted transcripts

Sub-agent system

Dynamic session memory

Context management

Memory system (ENGRAM.md)

SDK usage

Basic usage

Advanced: structured output + Thinking

Advanced: persistent retry (CI scenario)

Slash commands

Supported models

Requirements

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Lumen — A Model-Agnostic AI Coding Agent SDK

Examples

Pomodoro timer — generated end-to-end in Review Mode

Features at a glance

Quick start

Core capabilities

Tool suite (invoked autonomously by the model)

Extended Thinking

Structured output

Command safety classifier

Auto-persisted transcripts

Sub-agent system

Dynamic session memory

Context management

Memory system (ENGRAM.md)

SDK usage

Basic usage

Advanced: structured output + Thinking

Advanced: persistent retry (CI scenario)

Slash commands

Supported models

Requirements

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages