diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 7035124..805d334 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -45,6 +45,7 @@ jobs: python examples/basic_cli.py python examples/billing_demo.py python examples/http_driver_demo.py + python examples/tutorial.py conformance_stub: name: "Weaver Spec Conformance Stub (v0.1.0)" diff --git a/CHANGELOG.md b/CHANGELOG.md index 6ecc620..9f11768 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,24 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added +- "Secure your first MCP tool in 5 minutes" tutorial: new + [`docs/tutorial.md`](docs/tutorial.md) walks a new reader from install to a + working invocation, covering registration, principals, grants, the three + LLM-safe response modes (`summary` / `table` / `handle_only`), handle + expansion, policy denial with stable `reason_code`, and `explain()` + audit. The admin-only `raw` mode is described but not exercised by the + walkthrough. Companion runnable example + [`examples/tutorial.py`](examples/tutorial.py) uses `InMemoryDriver` + (offline, zero external deps) and is exercised by `make example` and CI; + it now `assert`s that no PII field leaks into the LLM-safe Frame so a + firewall regression fails the build. (#46) +- README "How this relates to neighboring projects" section: a neutral + boundaries table covering `AgentFence` (external CLI/proxy gate), + `contextweaver` (context compilation library), `ChainWeaver` + (deterministic flow orchestrator), and `weaver-spec` (specification + + conformance suite), plus a "When *not* to use this" callout. 
(#71) + ## [0.7.0] - 2026-05-20 ### Added diff --git a/Makefile b/Makefile index 3a079f6..5ae72f3 100644 --- a/Makefile +++ b/Makefile @@ -16,5 +16,6 @@ example: python examples/basic_cli.py python examples/billing_demo.py python examples/http_driver_demo.py + python examples/tutorial.py ci: fmt lint type test example diff --git a/README.md b/README.md index fa9ef15..82d0f39 100644 --- a/README.md +++ b/README.md @@ -43,6 +43,8 @@ pip install weaver-kernel > **Note:** The PyPI package is `weaver-kernel` (Weaver ecosystem), but the Python import remains `agent_kernel`. +> **New here?** [docs/tutorial.md](docs/tutorial.md) walks through register → grant → invoke → expand → explain in five minutes. + ```python import asyncio, os os.environ["AGENT_KERNEL_SECRET"] = "my-secret" @@ -110,6 +112,45 @@ asyncio.run(main()) `agent-kernel` sits **above** `contextweaver` (context compilation) and **above** raw tool execution. It provides the authorization, execution, and audit layer. +## How this relates to neighboring projects + +`agent-kernel` is the embeddable runtime layer of the **Weaver ecosystem**. The +projects below solve adjacent problems and are designed to compose, not to +overlap. + +| Project | Role | Where it runs | Use it when… | +|---|---|---|---| +| **agent-kernel** *(this repo)* | Embeddable library/runtime: capability registry, policy, HMAC tokens, context firewall, audit trace. | In-process inside your agent host. | You need authorization, redaction, and audit between an LLM loop and a large tool ecosystem. | +| [**AgentFence**](https://github.com/dgenio/AgentFence) | External CLI / local proxy that intercepts tool calls and applies a policy gate. | Out-of-process, alongside your agent. | You want a policy boundary without changing your agent code, or you need to gate a third-party agent host you can't modify. | +| [**contextweaver**](https://github.com/dgenio/contextweaver) | Library that selects and compiles the context an LLM receives. 
| In-process, before the LLM call. | You need to assemble relevant context for a prompt. It sits *under* the LLM loop; agent-kernel sits *between* the LLM and tools. | +| **ChainWeaver** | Orchestrator for deterministic tool chains. | In-process or as a separate service. | You need to run a multi-step deterministic flow rather than free-form LLM tool use. | +| [**weaver-spec**](https://github.com/dgenio/weaver-spec) | Specification: invariants, capability/token/frame contracts, conformance suite. | Not a runtime — it's docs + a contract test suite. | You're building another Weaver-compatible implementation, or you want to verify an existing one. | + +A minimal architecture using `agent-kernel` as the central runtime: + +``` +LLM / agent loop + │ + ▼ +contextweaver ─► agent-kernel ─► driver ─► MCP / HTTP / A2A / internal API + │ + ▼ + ActionTrace +``` + +### When *not* to use this + +- You only need a process-level policy gate around an existing agent host — + reach for `AgentFence` instead. +- You only need to compile context for a prompt — use `contextweaver`. +- You want a deterministic, scripted workflow with no LLM in the inner loop — + use `ChainWeaver`. +- You're writing a static analyzer or one-shot CLI scanner with no + per-invocation runtime — `agent-kernel` would be overkill. + +See [docs/tutorial.md](docs/tutorial.md) for an end-to-end "secure your first +MCP tool in 5 minutes" walkthrough. + ## Weaver Spec Compatibility: v0.1.0 agent-kernel is a compliant implementation of [weaver-spec v0.1.0](https://github.com/dgenio/weaver-spec). diff --git a/docs/tutorial.md b/docs/tutorial.md new file mode 100644 index 0000000..e815088 --- /dev/null +++ b/docs/tutorial.md @@ -0,0 +1,280 @@ +# Secure your first MCP tool in 5 minutes + +This walkthrough takes a brand-new reader from `pip install` to a working, +authorized, audited tool invocation in roughly five minutes. 
Every code block +is copy-pasteable; the runnable companion is +[`examples/tutorial.py`](../examples/tutorial.py) (covered by CI). + +> The PyPI package is **`weaver-kernel`** but the Python import is +> **`agent_kernel`**. We use both names in this document. + +## What you'll learn + +By the end of this page you will have seen, in this order: + +1. How to register a **capability** and how its `safety_class`, + `sensitivity`, and `allowed_fields` shape authorization. +2. How a **principal** is created and why some attributes (like `tenant`) + are required for PII-tagged capabilities. +3. How to issue a signed **token** with `kernel.get_token(...)`. +4. How `kernel.invoke(...)` returns a bounded **Frame** in `summary`, + `table`, or `handle_only` modes — and why `email` never appears in any + of them. +5. How to retrieve filtered raw rows by expanding a **Handle**. +6. What a **policy denial** looks like and how to branch on its stable + `reason_code`. +7. How `kernel.explain(action_id)` returns an audit **ActionTrace**. +8. How to swap the in-process driver for a real **MCP** server. + +## 0. Install + +```bash +pip install weaver-kernel +``` + +For the MCP section near the end, also install the optional extra: + +```bash +pip install "weaver-kernel[mcp]" +``` + +Set a stable HMAC secret for the process. In production this should come +from a real secret store; the example uses a fixed value so the output is +reproducible: + +```python +import os +os.environ["AGENT_KERNEL_SECRET"] = "tutorial-secret-do-not-use-in-prod" +``` + +## 1. Register a capability + +A capability is the unit of authorization. The `safety_class` controls +which roles may call it. The `sensitivity` tag tells the policy and +firewall how to treat the data. `allowed_fields` is the projection the +firewall applies before any row reaches the LLM. 
+ ```python +from agent_kernel import ( + Capability, + CapabilityRegistry, + ImplementationRef, + SafetyClass, + SensitivityTag, +) + +registry = CapabilityRegistry() +registry.register( + Capability( + capability_id="billing.invoices.list", + name="List Invoices", + description="List recent invoices", + safety_class=SafetyClass.READ, + sensitivity=SensitivityTag.PII, + allowed_fields=["id", "customer_name", "amount", "status"], + tags=["billing", "invoices", "list"], + impl=ImplementationRef(driver_id="memory", operation="list_invoices"), + ) +) +``` + +> `email`, `phone`, and other non-listed columns will never reach the LLM +> even if the driver returns them. + +## 2. Wire a driver and the kernel + +`InMemoryDriver` keeps the tutorial offline. The same pattern works with +`HTTPDriver` or `MCPDriver` (see step 9). + +```python +from agent_kernel import HMACTokenProvider, InMemoryDriver, Kernel, StaticRouter +from agent_kernel.drivers.base import ExecutionContext + +INVOICES = [ + {"id": "INV-001", "customer_name": "Alice", "email": "alice@example.com", "amount": 120.0, "status": "paid"}, + {"id": "INV-002", "customer_name": "Bob", "email": "bob@example.com", "amount": 540.0, "status": "unpaid"}, + {"id": "INV-003", "customer_name": "Carol", "email": "carol@example.com", "amount": 75.0, "status": "paid"}, +] + +driver = InMemoryDriver() +driver.register_handler("list_invoices", lambda ctx: list(INVOICES)) + +kernel = Kernel( + registry=registry, + token_provider=HMACTokenProvider(secret="tutorial-secret-do-not-use-in-prod"), + router=StaticRouter(routes={"billing.invoices.list": ["memory"]}), +) +kernel.register_driver(driver) +``` + +## 3. Create a principal + +The `DefaultPolicyEngine` requires a `tenant` attribute on the principal +for any PII-tagged capability. Without it, the grant is denied with +`reason_code="missing_tenant_attribute"`. 
+ +```python +from agent_kernel import Principal + +alice = Principal(principal_id="alice", roles=["reader"], attributes={"tenant": "acme"}) +``` + +## 4. Grant a token + +`get_token` runs the policy engine and returns a signed +`CapabilityToken`. No token, no invocation. + +```python +from agent_kernel.models import CapabilityRequest + +request = CapabilityRequest(capability_id="billing.invoices.list", goal="list recent invoices") +token = kernel.get_token(request, alice, justification="") +print(token.token_id, token.expires_at) +``` + +## 5. Invoke and observe the Frame + +The default `response_mode` is `"summary"`. The Frame holds compact +facts about the data plus a Handle the LLM can expand later. + +```python +import asyncio + +frame = asyncio.run(kernel.invoke(token, principal=alice, args={"operation": "list_invoices"})) +for fact in frame.facts: + print("•", fact) +print("handle:", frame.handle and frame.handle.handle_id) +``` + +Try `response_mode="table"` to get a row preview that respects +`allowed_fields`. Try `response_mode="handle_only"` to skip the preview +entirely — the LLM gets only a reference. In every mode, **`email` is +absent** from the Frame, because it is not in `allowed_fields`. + +```python +table_frame = asyncio.run( + kernel.invoke( + kernel.get_token(request, alice, justification=""), + principal=alice, + args={"operation": "list_invoices"}, + response_mode="table", + ) +) +assert all("email" not in row for row in table_frame.table_preview) +``` + +## 6. Expand a Handle + +Handles let the LLM stay inside its context budget while still pulling +specific rows or fields on demand. The expand query supports `offset`, +`limit`, `fields`, and an equality `filter`. 
+ +```python +handle_frame = asyncio.run( + kernel.invoke( + kernel.get_token(request, alice, justification=""), + principal=alice, + args={"operation": "list_invoices"}, + response_mode="handle_only", + ) +) +expanded = kernel.expand( + handle_frame.handle, + query={"offset": 0, "limit": 2, "fields": ["id", "amount"]}, +) +print(expanded.table_preview) +# [{'id': 'INV-001', 'amount': 120.0}, {'id': 'INV-002', 'amount': 540.0}] +``` + +> **Where the security boundary is today.** The `Firewall` enforces +> `allowed_fields` when it builds the `summary` and `table` previews, so +> disallowed columns never reach the LLM-safe Frame. `HandleStore.expand()` +> currently filters by whatever `fields` the caller passes in the query +> against the stored raw rows — it does **not yet** re-apply the +> capability's `allowed_fields` projection. Until the in-flight grant +> constraint work lands (tracking issue +> [#76](https://github.com/dgenio/agent-kernel/issues/76), PR +> [#79](https://github.com/dgenio/agent-kernel/pull/79)), treat handle +> expansion as authorized-but-field-unconstrained: only request `fields` +> the caller is allowed to see. + +## 7. Watch policy enforcement + +Add a WRITE capability and try to call it as the reader principal. The +denial carries both a human-readable `reason` and a stable +`reason_code` your code can branch on. 
+ +```python +from agent_kernel.errors import PolicyDenied + +registry.register( + Capability( + capability_id="billing.invoices.create", + name="Create Invoice", + description="Create a new invoice", + safety_class=SafetyClass.WRITE, + tags=["billing", "invoices", "create"], + impl=ImplementationRef(driver_id="memory", operation="create_invoice"), + ) +) + +try: + kernel.get_token( + CapabilityRequest(capability_id="billing.invoices.create", goal="create an invoice"), + alice, + justification="reader trying a write — should fail", + ) +except PolicyDenied as exc: + print(exc.reason_code) # 'missing_role' + print(str(exc)) # "WRITE capabilities require the 'writer' or 'admin' role..." +``` + +Stable reason codes come from `agent_kernel.policy_reasons.DenialReason`. +Tests should assert on the code, not on the human-readable string. + +## 8. Audit with `explain()` + +Every successful invocation creates an `ActionTrace` keyed by +`frame.action_id`. The trace records who, what, when, and which driver +served the request — the auditable half of weaver-spec invariant I-02. + +```python +trace = kernel.explain(frame.action_id) +print(trace.action_id, trace.capability_id, trace.principal_id, trace.driver_id) +``` + +## 9. Swap the driver for an MCP server + +The kernel doesn't care whether the driver lives in-process, behind +HTTP, or behind an MCP server — capabilities, policy, tokens, and +firewall behave identically. To talk to a real MCP server, replace +`InMemoryDriver` with `MCPDriver` (full transport details, including +Streamable HTTP, live in [`docs/integrations.md`](integrations.md)): + +```python +from agent_kernel.drivers.mcp import MCPDriver + +driver = MCPDriver.from_stdio( + command="python", + args=["-m", "my_mcp_server"], + server_name="local-tools", +) +kernel.register_driver(driver) + +# Discover the MCP server's tools and register each as an agent-kernel +# capability under a namespace. 
Set safety_class/sensitivity/allowed_fields +# on the resulting Capability objects to apply policy and the firewall. +capabilities = asyncio.run(driver.discover(namespace="billing")) +registry.register_many(capabilities) +``` + +That's the whole tutorial. From here: + +- [`docs/security.md`](security.md) — threat model, what HMAC tokens do + and do not protect against. +- [`docs/context_firewall.md`](context_firewall.md) — redaction, + summarization, and budget details. +- [`docs/capabilities.md`](capabilities.md) — designing capabilities + for large tool ecosystems. +- [`docs/integrations.md`](integrations.md) — full MCP and HTTP driver + integration patterns. diff --git a/examples/tutorial.py b/examples/tutorial.py new file mode 100644 index 0000000..efd33b8 --- /dev/null +++ b/examples/tutorial.py @@ -0,0 +1,220 @@ +"""tutorial.py — "Secure your first MCP tool in 5 minutes" (offline edition). + +The full written walkthrough lives in ``docs/tutorial.md``. This script is the +runnable companion: it covers every step a new reader will see, using only +the in-process :class:`InMemoryDriver` so it has zero external dependencies +and runs in CI. + +What this demo proves end-to-end: + 1. Registering a capability with a sensitivity tag and ``allowed_fields``. + 2. Issuing a signed token for a principal that satisfies policy. + 3. Invoking the capability and observing the Frame in three response modes + (``summary`` / ``table`` / ``handle_only``) — PII never appears in any + of them. + 4. Expanding a Handle to retrieve filtered raw data on demand. + 5. A policy denial: the same token model rejects a writer call from a + reader principal, and the denial carries a stable ``reason_code``. + 6. Auditability: ``explain()`` returns the full :class:`ActionTrace`. 
+ +Run with: ``python examples/tutorial.py`` +""" + +from __future__ import annotations + +import asyncio +import os + +os.environ.setdefault("AGENT_KERNEL_SECRET", "tutorial-secret-do-not-use-in-prod") + +from agent_kernel import ( + Capability, + CapabilityRegistry, + HMACTokenProvider, + InMemoryDriver, + Kernel, + Principal, + SafetyClass, + SensitivityTag, + StaticRouter, +) +from agent_kernel.drivers.base import ExecutionContext +from agent_kernel.errors import PolicyDenied +from agent_kernel.models import CapabilityRequest, ImplementationRef + +# A tiny, deterministic dataset that mixes safe and PII-bearing fields. +# Email is present on purpose: the firewall must keep it out of the LLM-safe +# Frame unless the capability declared it under ``allowed_fields``. +INVOICES: list[dict[str, object]] = [ + { + "id": "INV-001", + "customer_name": "Alice", + "email": "alice@example.com", + "amount": 120.0, + "status": "paid", + }, + { + "id": "INV-002", + "customer_name": "Bob", + "email": "bob@example.com", + "amount": 540.0, + "status": "unpaid", + }, + { + "id": "INV-003", + "customer_name": "Carol", + "email": "carol@example.com", + "amount": 75.0, + "status": "paid", + }, +] + + +def build_registry() -> CapabilityRegistry: + """Register one READ capability (PII-tagged) and one WRITE capability.""" + registry = CapabilityRegistry() + registry.register( + Capability( + capability_id="billing.invoices.list", + name="List Invoices", + description="List recent invoices", + safety_class=SafetyClass.READ, + sensitivity=SensitivityTag.PII, + # The Firewall will drop every column that isn't on this list. 
+ allowed_fields=["id", "customer_name", "amount", "status"], + tags=["billing", "invoices", "list"], + impl=ImplementationRef(driver_id="memory", operation="list_invoices"), + ) + ) + registry.register( + Capability( + capability_id="billing.invoices.create", + name="Create Invoice", + description="Create a new invoice", + safety_class=SafetyClass.WRITE, + tags=["billing", "invoices", "create"], + impl=ImplementationRef(driver_id="memory", operation="create_invoice"), + ) + ) + return registry + + +def build_driver() -> InMemoryDriver: + """A driver that returns the synthetic invoices dataset on read.""" + driver = InMemoryDriver() + + def list_invoices(_ctx: ExecutionContext) -> list[dict[str, object]]: + return list(INVOICES) + + def create_invoice(_ctx: ExecutionContext) -> dict[str, object]: + return {"id": "INV-004", "status": "draft"} + + driver.register_handler("list_invoices", list_invoices) + driver.register_handler("create_invoice", create_invoice) + return driver + + +async def main() -> None: + print("=== Step 1: Register capabilities ===") + registry = build_registry() + for cap in registry.list_all(): + print(f" • {cap.capability_id} ({cap.safety_class.value})") + + print("\n=== Step 2: Wire the kernel ===") + router = StaticRouter( + routes={ + "billing.invoices.list": ["memory"], + "billing.invoices.create": ["memory"], + } + ) + kernel = Kernel( + registry=registry, + token_provider=HMACTokenProvider(secret="tutorial-secret-do-not-use-in-prod"), + router=router, + ) + kernel.register_driver(build_driver()) + + # PII-tagged capabilities require a ``tenant`` attribute on the principal. 
+ reader = Principal(principal_id="alice", roles=["reader"], attributes={"tenant": "acme"}) + + print(f" principal: {reader.principal_id} roles={reader.roles}") + + print("\n=== Step 3: Grant a token ===") + list_req = CapabilityRequest( + capability_id="billing.invoices.list", goal="list recent invoices" + ) + token = kernel.get_token(list_req, reader, justification="") + print(f" token_id: {token.token_id}") + print(f" capability: {token.capability_id}") + print(f" expires_at: {token.expires_at.isoformat()}") + + print("\n=== Step 4: Invoke in summary mode ===") + frame = await kernel.invoke(token, principal=reader, args={"operation": "list_invoices"}) + print(f" mode: {frame.response_mode}") + print(" facts:") + for fact in frame.facts: + print(f" • {fact}") + if frame.handle: + print(f" handle: {frame.handle.handle_id}") + + print("\n=== Step 5: Invoke in table mode (allowed_fields enforced) ===") + table_token = kernel.get_token(list_req, reader, justification="") + table_frame = await kernel.invoke( + table_token, + principal=reader, + args={"operation": "list_invoices"}, + response_mode="table", + ) + print(f" mode: {table_frame.response_mode}") + print(f" rows shown: {len(table_frame.table_preview)}") + print(" preview:") + for row in table_frame.table_preview[:2]: + print(f" {row}") + leaked = [row for row in table_frame.table_preview if "email" in row] + assert leaked == [], ( + f"firewall regression: 'email' is not in allowed_fields but reached " + f"the table-mode Frame in {len(leaked)} row(s): {leaked}" + ) + print(f" PII fields leaked into Frame: {len(leaked)} (asserted == 0)") + + print("\n=== Step 6: Invoke in handle_only mode and expand ===") + handle_token = kernel.get_token(list_req, reader, justification="") + handle_frame = await kernel.invoke( + handle_token, + principal=reader, + args={"operation": "list_invoices"}, + response_mode="handle_only", + ) + assert handle_frame.handle is not None, "handle_only mode must return a Handle" + expanded 
= kernel.expand( + handle_frame.handle, + query={"offset": 0, "limit": 2, "fields": ["id", "amount"]}, + ) + print(f" expanded rows: {len(expanded.table_preview)}") + for row in expanded.table_preview: + print(f" {row}") + + print("\n=== Step 7: Watch policy enforcement deny a writer call ===") + create_req = CapabilityRequest( + capability_id="billing.invoices.create", goal="create an invoice" + ) + try: + kernel.get_token(create_req, reader, justification="reader trying a write — should fail") + except PolicyDenied as exc: + print(f" denied: {exc}") + print(f" reason_code: {exc.reason_code}") + else: # pragma: no cover - defensive + raise SystemExit("Expected PolicyDenied for reader on a WRITE capability") + + print("\n=== Step 8: Audit the read with explain() ===") + trace = kernel.explain(frame.action_id) + print(f" action_id: {trace.action_id}") + print(f" capability: {trace.capability_id}") + print(f" principal: {trace.principal_id}") + print(f" driver: {trace.driver_id}") + print(f" invoked_at: {trace.invoked_at.isoformat()}") + + print("\n✓ tutorial.py complete.") + + +if __name__ == "__main__": + asyncio.run(main())
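Review note on the handle-expansion caveat in section 6 of `docs/tutorial.md`: the callout there says `HandleStore.expand()` does not yet re-apply the capability's `allowed_fields`. The plain-Python sketch below shows one way such a constraint could look, under stated assumptions. `project_rows` and `constrained_expand` are hypothetical helpers written for illustration and are not part of `agent_kernel`; the sample rows only mirror the shape of the tutorial's `INVOICES`.

```python
from typing import Any, Optional

Row = dict[str, Any]

# Hypothetical sample data shaped like the tutorial's INVOICES.
ROWS: list[Row] = [
    {"id": "INV-001", "customer_name": "Alice", "email": "alice@example.com",
     "amount": 120.0, "status": "paid"},
    {"id": "INV-002", "customer_name": "Bob", "email": "bob@example.com",
     "amount": 540.0, "status": "unpaid"},
]
ALLOWED = ["id", "customer_name", "amount", "status"]


def project_rows(rows: list[Row], fields: list[str]) -> list[Row]:
    """Keep only the listed keys, in list order; keys absent from a row are skipped."""
    return [{key: row[key] for key in fields if key in row} for row in rows]


def constrained_expand(
    rows: list[Row],
    allowed_fields: list[str],
    fields: Optional[list[str]] = None,
    offset: int = 0,
    limit: Optional[int] = None,
) -> list[Row]:
    """Intersect the caller's field selection with the allow-list, then page.

    Assumption: this is an illustrative stand-in, not the library's expand().
    """
    allowed = set(allowed_fields)
    # Fall back to the full allow-list when the caller names no fields.
    wanted = [name for name in (fields or allowed_fields) if name in allowed]
    end = None if limit is None else offset + limit
    return project_rows(rows[offset:end], wanted)


if __name__ == "__main__":
    # Asking for "email" does not widen the projection.
    print(constrained_expand(ROWS, ALLOWED, fields=["id", "email"], limit=1))
```

With this ordering, a query that names `email` gets back only the fields that are both requested and allow-listed, so the caller cannot widen the projection through `fields`.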