70 changes: 70 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,76 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- Capability namespaces and hierarchical discovery in `CapabilityRegistry`:
dot-notation `capability_id`s now expose `list_namespaces()` /
`list_namespace(prefix)` operations; `register_namespace(prefix, loader=...)`
enables deferred registration for large tool ecosystems (the loader runs
at most once on first access). `search()` gained an `offset` kwarg for
pagination, strips a small stop-word set, and now scores with a
BM25-flavoured ranker that weights `capability_id`/`tags` matches above
`description`. Flat (un-namespaced) capability IDs continue to work
unchanged. (#45)
- Capability marketplace, part 1 — manifest format & local registry: new
`CapabilityDescriptor` and `CapabilityManifest` dataclasses (both
JSON-round-trippable via `to_dict`/`from_dict`), new
`agent_kernel.federation` module with `build_manifest()`,
`import_manifest()`, and `merge_sensitivity()`, and new `Kernel.advertise()`
/ `Kernel.import_remote()` methods. `Kernel` gained a `kernel_id`
argument used as the manifest publisher identity. Three trust policies
are honoured at import time (`most_restrictive` (default), `local_only`,
`remote_deferred`); imported capabilities are routed through a
caller-supplied driver and flow through the full local policy → token →
firewall pipeline. HMAC tokens remain kernel-scoped — a token issued by
one kernel cannot be verified by another with a different secret. New
errors `NamespaceNotFound`, `FederationError`, `ManifestError`,
`TrustPolicyError`. (#52)
- New docs: [`docs/federation.md`](docs/federation.md) for the marketplace
protocol and a namespace section in
[`docs/capabilities.md`](docs/capabilities.md).
- Capability marketplace, part 2 — federated discovery: new
`agent_kernel.federation_discovery` module with `discover_peers()`,
`sign_manifest()`, `verify_manifest()`, `serve_manifest_payload()`, and
`DiscoveryRateLimiter`. `Kernel.discover_peers()` fetches one or more
manifests over HTTP from peer URLs or a registry URL. Signed envelopes
(HMAC-SHA256) detect tampering and let importers refuse unsigned
manifests when a verification secret is in play (and vice versa). New
errors `ManifestSignatureError` and `DiscoveryError`. (#51, closes #49)
- OpenTelemetry integration: new `agent_kernel.otel` module with
`instrument_kernel(kernel)` that wraps `Kernel.invoke` and
`Kernel.grant_capability` with OTel spans + metrics (invocation count,
latency histogram, denial counter). No-op when the optional `[otel]`
extra is not installed (`OTEL_AVAILABLE` reports the runtime status).
Idempotent — repeat calls on the same kernel are no-ops. (#38)
- Streaming firewall: new `Firewall.apply_stream()` async-iterator method
that processes driver chunks one-at-a-time, applying PII redaction
per-chunk. New `StreamingDriver` Protocol in `drivers/base.py` extends
`Driver` with an optional `execute_stream()`. New `Kernel.invoke_stream()`
yields `Frame` chunks; the last chunk carries `is_final=True`. Drivers
without `execute_stream` automatically fall back to a single-chunk stream
via `execute()`. `Frame` gained an `is_final: bool` field. (#47)

### Changed
- Tech debt: `policy_dsl.py` decomposed (was 661 lines). Parsing and
schema dataclasses now live in `policy_dsl_parser.py`
(`PolicyMatch`, `PolicyRule`, `parse_engine_data`, `parse_rule`,
YAML/TOML loaders), and the denial-explanation traversal in
`policy_dsl_explain.py`. The public import surface
(`DeclarativePolicyEngine`, `PolicyMatch`, `PolicyRule`) is unchanged.
`RateLimiter` and rate-limit constants extracted from `policy.py` into
a new `rate_limit.py` module; `policy.py` continues to re-export them
under their original names. (#68)
- Tech debt: `kernel.py` split into the `agent_kernel.kernel` sub-package
to honour AGENTS.md's ≤ 300-line module bar. The `Kernel` class lives
in `kernel/__init__.py`; heavy methods (invoke pipeline, dry-run,
federation, streaming) delegate to sibling modules. Existing
`from agent_kernel import Kernel` / `from agent_kernel.kernel import Kernel`
imports are unchanged. (#68)

### Tests
- Added explicit dry-run regression tests for `HTTPDriver` and `MCPDriver`,
pinning the kernel's driver-agnostic dry-run short-circuit. (#68)

## [0.7.0] - 2026-05-20

### Added
51 changes: 51 additions & 0 deletions docs/capabilities.md
@@ -3,9 +3,60 @@
## Naming conventions

- Use `domain.verb_noun` format: `billing.list_invoices`, `users.get_profile`.
- Prefer fully namespaced IDs (`billing.invoices.list`) over flat ones —
the registry will infer namespace operations from the dot-segments and
large ecosystems benefit from being able to list/search per namespace.
- Be specific: prefer `billing.cancel_invoice` over `billing.update`.
- Avoid generic names like `billing.execute` or `api.call`.

## Namespaces and discovery

`CapabilityRegistry` recognises dot-notation namespaces automatically. No
extra registration step is required — `register(Capability(capability_id=
"billing.invoices.list", ...))` is enough to populate the `billing` and
`billing.invoices` namespaces.

```python
registry.list_namespaces()
# ['billing', 'crm']

registry.list_namespace("billing")
# [Capability('billing.invoices.list'), Capability('billing.payments.refund'), …]
```
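
As an illustration of how namespaces can fall out of the dot-segments alone, here is a minimal sketch. The helpers `namespaces_of` and `build_index` are hypothetical names made up for this example and are not part of `CapabilityRegistry`; the registry's real bookkeeping may differ.

```python
from collections import defaultdict


def namespaces_of(capability_id: str) -> list[str]:
    """Return every proper dot-prefix of an ID, shortest first.

    A flat ID such as "ping" has no prefixes and therefore no namespace.
    """
    parts = capability_id.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def build_index(capability_ids: list[str]) -> dict[str, list[str]]:
    """Map each inferred namespace to the sorted IDs that live under it."""
    index: dict[str, list[str]] = defaultdict(list)
    for cid in sorted(capability_ids):
        for namespace in namespaces_of(cid):
            index[namespace].append(cid)
    return dict(index)


index = build_index(
    ["billing.invoices.list", "billing.payments.refund", "crm.contacts.get", "ping"]
)
print(sorted(index))
```

Under these assumptions, registering `billing.invoices.list` is enough to make both `billing` and `billing.invoices` appear, while the flat ID `ping` stays outside any namespace.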

For large tool ecosystems where eagerly registering hundreds of
capabilities is wasteful, declare a deferred loader. The loader runs at
most once, the first time the namespace is searched, listed, or any
capability under it is fetched via `get()`:

```python
def load_billing() -> list[Capability]:
    return [
        Capability(capability_id="billing.invoices.list", …),
        Capability(capability_id="billing.invoices.create", …),
        Capability(capability_id="billing.payments.refund", …),
    ]

registry.register_namespace(
    "billing",
    description="Billing and invoicing tools",
    loader=load_billing,
)
```
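
The "at most once" guarantee can be pictured with a small cache around the loader. This is only a sketch: `DeferredNamespace` is a hypothetical class invented for the example, it stores plain ID strings instead of `Capability` objects, and it makes no attempt at thread safety.

```python
from typing import Callable, Optional


class DeferredNamespace:
    """Hypothetical holder that runs its loader on first access only."""

    def __init__(self, prefix: str, loader: Callable[[], list[str]]) -> None:
        self.prefix = prefix
        self._loader = loader
        self._items: Optional[list[str]] = None  # None means "not loaded yet"

    def items(self) -> list[str]:
        # The first call triggers the loader; later calls reuse the cache.
        if self._items is None:
            self._items = list(self._loader())
        return self._items


calls: list[str] = []


def load_billing_ids() -> list[str]:
    calls.append("loaded")  # record each loader run so it can be counted
    return ["billing.invoices.list", "billing.payments.refund"]


billing = DeferredNamespace("billing", load_billing_ids)
billing.items()
billing.items()
print(len(calls))
```

Nothing is loaded when the namespace is declared; the cost is paid by whichever lookup touches it first.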

Search ranks matches with a BM25-flavoured scorer that weights
`capability_id` and `tags` higher than `description`, strips a small
stop-word set (`a`, `the`, `please`, …), and offers `offset` for
pagination:

```python
results = registry.search("list invoices", max_results=10, offset=0)
```

Search is deterministic: equal-scoring capabilities are returned in
`capability_id` order. It also triggers any deferred namespace loader whose
prefix shares a token with the query.
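
To make the ranking behaviour concrete, the sketch below uses a simplified weighted term-overlap score, not a real BM25 implementation. The field weights, the three-word stop list, and the dict-based capability records are all assumptions chosen for illustration; only the general shape (field weighting, stop-word stripping, `capability_id` tie-breaking, `offset` slicing) follows the description above.

```python
import re

STOP_WORDS = {"a", "the", "please"}  # illustrative subset, not the library's set
FIELD_WEIGHTS = {"capability_id": 3.0, "tags": 2.0, "description": 1.0}  # assumed


def tokenize(text: str) -> list[str]:
    """Lower-case, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t and t not in STOP_WORDS]


def score(query_tokens: list[str], capability: dict) -> float:
    """Sum weighted query-token hits across the three fields."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = capability.get(field, "")
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        field_tokens = set(tokenize(value))
        total += weight * sum(1 for t in query_tokens if t in field_tokens)
    return total


def search(capabilities: list[dict], query: str, max_results: int = 10, offset: int = 0) -> list[str]:
    """Return matching IDs, best score first, ties broken by capability_id."""
    query_tokens = tokenize(query)
    hits = []
    for capability in capabilities:
        s = score(query_tokens, capability)
        if s > 0:
            hits.append((s, capability["capability_id"]))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [cid for _, cid in hits[offset:offset + max_results]]
```

Sorting on the pair `(-score, capability_id)` is what keeps the output stable across runs, which in turn makes `offset`-based paging predictable.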

## Granularity

Each capability should map to a single, auditable action with clear side-effects.
56 changes: 56 additions & 0 deletions docs/context_firewall.md
@@ -118,3 +118,59 @@ manager = BudgetManager(total_budget=128_000, token_counter=tiktoken_counter)

The default counter (`default_token_counter`) is a character-based
`len(json.dumps(value)) // 4` approximation with no extra dependencies.

## Streaming

For large results that arrive incrementally (e.g. SSE-style HTTP responses,
chunked database cursors, line-by-line tool output), `Firewall.apply_stream()`
lets you process chunks one at a time. PII redaction and per-chunk budget
caps apply on every yielded Frame — secrets cannot leak just because they
arrived in chunk N rather than the final aggregate.

```python
from agent_kernel.drivers.base import ExecutionContext, StreamingDriver

class MyStreamingDriver:
    driver_id = "stream"

    async def execute(self, ctx: ExecutionContext):
        # one-shot fallback, called when StreamingDriver isn't used.
        ...

    async def execute_stream(self, ctx: ExecutionContext):
        async for row in some_async_cursor(ctx):
            yield {"row": row}
        yield {"__is_final__": True}  # explicit sentinel (optional)


# isinstance(driver, StreamingDriver) is runtime-checkable.
assert isinstance(MyStreamingDriver(), StreamingDriver)

async for frame in kernel.invoke_stream(token, principal=p, args={}):
    handle_chunk(frame)
    if frame.is_final:
        break
```

When the resolved driver does **not** implement `StreamingDriver`,
`Kernel.invoke_stream` falls back to a single `Driver.execute()` call and
yields exactly one `Frame` with `is_final=True`. Each invocation produces
one `ActionTrace` covering the whole stream.
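
The fallback and the per-chunk redaction can be sketched together in a few lines. Everything here is hypothetical scaffolding for illustration: `Chunk` stands in for `Frame`, `stream_chunks` stands in for the kernel's streaming path, payloads are plain strings, and the email regex is a toy pattern rather than the firewall's real PII rules.

```python
import asyncio
import re
from dataclasses import dataclass
from typing import Any, AsyncIterator

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # toy pattern, illustration only


@dataclass
class Chunk:  # hypothetical stand-in for Frame
    payload: str
    is_final: bool = False


def redact(text: str) -> str:
    return EMAIL.sub("[REDACTED]", text)


async def stream_chunks(driver: Any, ctx: Any) -> AsyncIterator[Chunk]:
    """Redact every chunk; mark the last one final via one-item lookahead."""
    if hasattr(driver, "execute_stream"):
        pending, have_pending = "", False
        async for raw in driver.execute_stream(ctx):
            if have_pending:
                yield Chunk(redact(pending))
            pending, have_pending = raw, True
        if have_pending:
            yield Chunk(redact(pending), is_final=True)
    else:
        # No streaming support: one execute() call, one final chunk.
        yield Chunk(redact(await driver.execute(ctx)), is_final=True)


class OneShotDriver:
    async def execute(self, ctx: Any) -> str:
        return "contact: a@example.com"


class LineDriver:
    async def execute(self, ctx: Any) -> str:
        return "unused"

    async def execute_stream(self, ctx: Any):
        for line in ("row 1", "mail bob@example.com", "row 3"):
            yield line


async def collect(driver: Any) -> list[Chunk]:
    return [chunk async for chunk in stream_chunks(driver, ctx=None)]


streamed = asyncio.run(collect(LineDriver()))
single = asyncio.run(collect(OneShotDriver()))
```

Note that this sketch redacts each chunk independently, so a pattern split across two chunks would not match here, and a streaming driver that yields nothing produces no chunks at all.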

## Observability

`agent_kernel.instrument_kernel(kernel)` installs OpenTelemetry spans and
metric emission on `Kernel.invoke` and `Kernel.grant_capability`:

```python
from agent_kernel import Kernel, instrument_kernel, OTEL_AVAILABLE

kernel = Kernel(registry=...)
if OTEL_AVAILABLE:
    instrument_kernel(kernel)  # no-op when [otel] extra not installed
```

Spans: `agent_kernel.invoke`, `agent_kernel.grant`. Metrics:
`agent_kernel.invocations` (counter), `agent_kernel.invocation_duration`
(histogram, ms), `agent_kernel.policy_denials` (counter). The call is
idempotent — repeat invocations on the same kernel are no-ops.
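
One common way to get idempotent instrumentation is to leave a marker on the object after wrapping it. The sketch below shows that pattern with a synchronous method and a plain callback in place of OpenTelemetry; `instrument`, `_MARKER`, and `FakeKernel` are hypothetical names, and this is not a description of how `instrument_kernel` is actually written.

```python
import functools
import time
from typing import Any, Callable

_MARKER = "_sketch_instrumented"  # hypothetical attribute name


def instrument(obj: Any, method_name: str, record_ms: Callable[[float], None]) -> bool:
    """Wrap obj.method_name with a latency recorder; return False if already wrapped."""
    already = getattr(obj, _MARKER, frozenset())
    if method_name in already:
        return False  # repeat call: leave the existing wrapper alone

    original = getattr(obj, method_name)

    @functools.wraps(original)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record latency in milliseconds whether the call succeeded or raised.
            record_ms((time.perf_counter() - start) * 1000.0)

    setattr(obj, method_name, wrapper)
    setattr(obj, _MARKER, already | {method_name})
    return True


class FakeKernel:
    def invoke(self, value: int) -> int:
        return value * 2


durations: list[float] = []
kernel = FakeKernel()
first = instrument(kernel, "invoke", durations.append)
second = instrument(kernel, "invoke", durations.append)
result = kernel.invoke(3)
```

Because the second call returns early, the method is wrapped once and each invocation is recorded once rather than twice.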