diff --git a/README.md b/README.md index 5b7d78fc5..32f346ae2 100644 --- a/README.md +++ b/README.md @@ -65,3 +65,4 @@ distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. + diff --git a/rfcs/text/0054-llm-tool-fields.md b/rfcs/text/0054-llm-tool-fields.md new file mode 100644 index 000000000..f8fc23b22 --- /dev/null +++ b/rfcs/text/0054-llm-tool-fields.md @@ -0,0 +1,151 @@ +# 0054: LLM Tool Call Fields + +- Stage: **Proposal** +- Date: **TBD** +- Target maturity: **alpha** + +## Summary + +This RFC proposes a new top-level `llm_tool.*` field set to capture tool (function) calls made by Large Language Models during agentic workflows. As AI agents increasingly invoke external tools — APIs, code interpreters, retrieval systems, and MCP servers — there is no standardized way to observe, audit, and alert on these interactions in Elastic. This field set enables security teams to monitor tool usage, detect policy violations, and trace multi-step agent reasoning chains. + +## Usage + +Organizations deploying LLM-based agents (e.g., customer support bots, code assistants, autonomous workflows) need visibility into what tools agents invoke, what parameters they pass, whether execution succeeded, and whether safety guardrails were respected. + +**Security use case:** A detection rule alerts when `llm_tool.safety.classification == "blocked"` AND `llm_tool.result.status != "denied"` — indicating a guardrail disagreement where a tool was flagged but still executed. + +**Observability use case:** Dashboard panels show p95 `llm_tool.result.duration_ms` by `llm_tool.name`, revealing slow tool backends. Alerts fire when `llm_tool.chain.step` exceeds `llm_tool.chain.max_steps * 0.8` — an agent nearing its loop limit. + +**Compliance use case:** Audit logs filter on `llm_tool.approval.required == true AND llm_tool.approval.status == "auto_approved"` to find tool calls that bypassed human review in regulated environments. + +## Fields + +All proposed fields are defined in [`schemas/llm_tool.yml`](../../schemas/llm_tool.yml) and summarized in [`rfcs/text/0054/llm_tool.yml`](0054/llm_tool.yml). + +Key field groups: + +| Group | Fields | Purpose | +|-------|--------|---------| +| Identity | `llm_tool.id`, `llm_tool.name`, `llm_tool.type` | Identify which tool was called | +| Input | `llm_tool.parameters` (flattened) | Capture heterogeneous tool inputs | +| Output | `llm_tool.result.*` | Execution outcome, errors, timing | +| Safety | `llm_tool.safety.*` | Guardrail classifications and policy | +| Chain | `llm_tool.chain.*` | Position in multi-step reasoning | +| Approval | `llm_tool.approval.*` | Human-in-the-loop decisions | + +### Justification for `flattened` type on `llm_tool.parameters` + +Tool parameters are heterogeneous — each tool defines its own schema. A `get_weather` tool takes `{"location": "...", "days": 5}` while a `run_sql` tool takes `{"query": "SELECT ..."}`. Defining explicit leaf fields is not feasible because the parameter shapes are unbounded and tool-specific. The `flattened` type allows indexing leaf values for search while accommodating arbitrary schemas. + +## Source data + +### OpenAI Function Calling Response + +```json +{ + "@timestamp": "2026-04-30T10:15:30.000Z", + "gen_ai.operation.name": "chat", + "gen_ai.model.id": "gpt-4o", + "llm_tool.id": "call_abc123def456", + "llm_tool.name": "get_weather_forecast", + "llm_tool.type": "function", + "llm_tool.parameters": {"location": "San Francisco", "days": 5, "units": "celsius"}, + "llm_tool.result.status": "success", + "llm_tool.result.content": "Current temperature: 18°C, partly cloudy. 5-day high: 22°C.", + "llm_tool.result.duration_ms": 245, + "llm_tool.chain.id": "chain_session_001", + "llm_tool.chain.step": 2, + "llm_tool.safety.classification": "safe" +} +``` + +### Anthropic MCP Tool Use (Blocked) + +```json +{ + "@timestamp": "2026-04-30T10:20:00.000Z", + "gen_ai.operation.name": "chat", + "gen_ai.model.id": "claude-opus-4-6", + "llm_tool.id": "toolu_01A2B3C4D5", + "llm_tool.name": "filesystem_read", + "llm_tool.type": "mcp", + "llm_tool.description": "Read a file from the local filesystem", + "llm_tool.parameters": {"path": "/etc/shadow"}, + "llm_tool.result.status": "denied", + "llm_tool.result.error.message": "Access denied: path outside sandbox boundary", + "llm_tool.result.error.type": "permission_denied", + "llm_tool.result.duration_ms": 2, + "llm_tool.safety.classification": "blocked", + "llm_tool.safety.policy_id": "prod-tool-policy-v3", + "llm_tool.safety.reason": "Tool attempts to access filesystem outside sandbox.", + "llm_tool.chain.id": "chain_mcp_session_042", + "llm_tool.chain.step": 5, + "llm_tool.chain.max_steps": 10, + "llm_tool.approval.required": true, + "llm_tool.approval.status": "rejected", + "llm_tool.approval.reviewer": "security-team@example.com" +} +``` + +### Code Interpreter with Timeout + +```json +{ + "@timestamp": "2026-04-30T10:25:00.000Z", + "gen_ai.operation.name": "chat", + "gen_ai.model.id": "gpt-4o", + "llm_tool.id": "call_xyz789", + "llm_tool.name": "python_executor", + "llm_tool.type": "code_interpreter", + "llm_tool.parameters": {"code": "import time; time.sleep(300); print('done')"}, + "llm_tool.result.status": "timeout", + "llm_tool.result.error.message": "Execution exceeded 60s time limit", + "llm_tool.result.error.type": "timeout", + "llm_tool.result.duration_ms": 60000, + "llm_tool.result.tokens_used": 0, + "llm_tool.chain.id": "chain_code_007", + "llm_tool.chain.step": 1, + "llm_tool.chain.max_steps": 5, + "llm_tool.safety.classification": "suspicious", + "llm_tool.safety.reason": "Long-running code execution pattern detected" +} +``` + +## Scope of impact + +**Ingestion mechanisms:** Beats/Elastic Agent integrations for LLM providers (OpenAI, Anthropic, Azure OpenAI) and agent frameworks (LangChain, CrewAI, AutoGen) would emit these fields. Custom ingest pipelines parsing LLM API response logs would map tool_calls arrays to `llm_tool.*` events. + +**Usage mechanisms (Kibana/Security):** Detection rules can alert on safety violations, unauthorized tool use, and chain runaway scenarios. Dashboards can visualize tool call patterns, latency distributions, and approval workflows. The Security app could surface tool-call timelines alongside existing process and network events. + +**ECS project:** New field set YAML, generated artifacts, and field reference documentation. Alignment with OpenTelemetry GenAI semantic conventions for tool interactions (currently in development in OTel semconv). + +## Concerns + +**Concern 1: Overlap with gen_ai.* fields** +Resolution: `gen_ai.*` captures model-level request/response metadata (tokens, model ID, operation). `llm_tool.*` captures individual tool invocations within a single model interaction. They are complementary — a single `gen_ai` operation may produce multiple `llm_tool` events. The relationship is 1:N. + +**Concern 2: `flattened` type for parameters limits aggregation** +Resolution: `flattened` is intentional — tool parameters are unbounded and tool-specific. Users needing aggregation on specific parameter values can use runtime fields or ingest-time extraction to promoted fields. The alternative (nested object per tool) would explode mapping complexity. + +**Concern 3: High cardinality on llm_tool.name** +Resolution: Tool names are bounded by an organization's tool registry (typically 10-100 tools per agent system). This is comparable to `event.action` cardinality. Index template settings can apply `eager_global_ordinals` if needed. + +**Concern 4: OTel alignment unclear** +Resolution: OTel GenAI semconv is actively developing tool-call attributes (gen_ai.tool.call.id, gen_ai.tool.name). We align naming where possible and will add `otel:` mappings once the OTel spec stabilizes. The `alpha` maturity allows us to adjust before GA. + +## People + +* @bhapas | author +* TBD | subject matter expert (GenAI observability) +* TBD | security detection engineering + +## References + +- [OpenTelemetry GenAI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) +- [OpenAI Function Calling API](https://platform.openai.com/docs/guides/function-calling) +- [Anthropic Tool Use](https://docs.anthropic.com/en/docs/build-with-claude/tool-use) +- [Model Context Protocol Specification](https://modelcontextprotocol.io/) + +### RFC Pull Requests + +* Proposal: https://github.com/elastic/ecs/pull/TBD diff --git a/rfcs/text/0054/llm_tool.yml b/rfcs/text/0054/llm_tool.yml new file mode 100644 index 000000000..b80234bdf --- /dev/null +++ b/rfcs/text/0054/llm_tool.yml @@ -0,0 +1,96 @@ +--- +# Standalone YAML for RFC 0054 — LLM Tool Call fields +# This mirrors the schema in schemas/llm_tool.yml for reviewer reference. + +- name: llm_tool + title: LLM Tool + short: Fields describing LLM tool/function calls and their execution results + description: > + Fields that describe tool (function) calls made by Large Language Models + during agentic workflows. + type: group + fields: + - name: id + type: keyword + level: extended + alpha: true + - name: name + type: keyword + level: extended + alpha: true + - name: type + type: keyword + level: extended + alpha: true + allowed_values: [function, code_interpreter, retrieval, api, browser, mcp] + - name: description + type: text + level: extended + alpha: true + - name: parameters + type: flattened + level: extended + alpha: true + - name: result.content + type: text + level: extended + alpha: true + - name: result.status + type: keyword + level: extended + alpha: true + allowed_values: [success, error, timeout, denied, rate_limited] + - name: result.error.message + type: keyword + level: extended + alpha: true + - name: result.error.type + type: keyword + level: extended + alpha: true + - name: result.duration_ms + type: long + level: extended + alpha: true + - name: result.tokens_used + type: long + level: extended + alpha: true + - name: safety.classification + type: keyword + level: extended + alpha: true + allowed_values: [safe, suspicious, blocked, unclassified] + - name: safety.policy_id + type: keyword + level: extended + alpha: true + - name: safety.reason + type: keyword + level: extended + alpha: true + - name: chain.id + type: keyword + level: extended + alpha: true + - name: chain.step + type: integer + level: extended + alpha: true + - name: chain.max_steps + type: integer + level: extended + alpha: true + - name: approval.required + type: boolean + level: extended + alpha: true + - name: approval.status + type: keyword + level: extended + alpha: true + allowed_values: [approved, rejected, pending, auto_approved] + - name: approval.reviewer + type: keyword + level: extended + alpha: true diff --git a/schemas/llm_tool.yml b/schemas/llm_tool.yml new file mode 100644 index 000000000..042a0e13d --- /dev/null +++ b/schemas/llm_tool.yml @@ -0,0 +1,218 @@ +# Licensed to Elasticsearch B.V. under one or more contributor +# license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright +# ownership. Elasticsearch B.V. licenses this file to you under +# the Apache License, Version 2.0 (the "License"); you may +# not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +--- +- name: llm_tool + title: LLM Tool + group: 2 + short: Fields describing LLM tool/function calls and their execution results + description: > + Fields that describe tool (function) calls made by Large Language Models + during agentic workflows. Captures the tool invocation, parameters, + execution outcome, and safety classification. + + These fields support observability of AI agent systems where models + invoke external tools (APIs, databases, code interpreters, etc.) as part + of multi-step reasoning chains. + type: group + fields: + - name: id + type: keyword + description: Unique identifier for this tool call, typically assigned by the LLM provider. + example: call_abc123def456 + level: extended + alpha: This field is alpha and subject to change. + + - name: name + type: keyword + description: The name of the tool or function being invoked. + example: get_weather_forecast + level: extended + alpha: This field is alpha and subject to change. + + - name: type + type: keyword + description: The category of tool being invoked. + example: function + level: extended + alpha: This field is alpha and subject to change. + allowed_values: + - name: function + description: A named function with structured parameters. + - name: code_interpreter + description: An inline code execution environment. + - name: retrieval + description: A vector/document retrieval system. + - name: api + description: An external HTTP API call. + - name: browser + description: A web browsing or scraping tool. + - name: mcp + description: A Model Context Protocol server tool. + + - name: description + type: text + description: Human-readable description of the tool's purpose, as provided in the tool schema. + example: Retrieves the weather forecast for a given location and date range. + level: extended + alpha: This field is alpha and subject to change. + + - name: parameters + type: flattened + short: Input parameters passed to the tool call as a JSON object. + description: > + The input parameters passed to the tool call, as a JSON object. + Uses flattened type because tool parameters are heterogeneous and + schema-less across different tools — each tool defines its own + parameter shape. Leaf values are indexable for search. + example: '{"location": "San Francisco", "days": 5, "units": "celsius"}' + level: extended + alpha: This field is alpha and subject to change. + + - name: result.content + type: text + description: The textual content returned by the tool execution. + example: "Current temperature in San Francisco: 18°C, partly cloudy." + level: extended + alpha: This field is alpha and subject to change. + + - name: result.status + type: keyword + description: The execution outcome status of the tool call. + example: success + level: extended + alpha: This field is alpha and subject to change. + allowed_values: + - name: success + description: Tool executed successfully and returned a result. + - name: error + description: Tool execution failed with an error. + - name: timeout + description: Tool execution exceeded its time limit. + - name: denied + description: Tool execution was blocked by a safety or policy check. + - name: rate_limited + description: Tool execution was rejected due to rate limiting. + + - name: result.error.message + type: keyword + description: Error message returned when tool execution fails. + example: "API rate limit exceeded. Retry after 30 seconds." + level: extended + alpha: This field is alpha and subject to change. + + - name: result.error.type + type: keyword + description: Classification of the error encountered during tool execution. + example: rate_limit + level: extended + alpha: This field is alpha and subject to change. + + - name: result.duration_ms + type: long + description: Duration of the tool execution in milliseconds. + example: 245 + level: extended + alpha: This field is alpha and subject to change. + + - name: result.tokens_used + type: long + description: Number of tokens consumed by the tool result content when fed back to the model. + example: 128 + level: extended + alpha: This field is alpha and subject to change. + + - name: safety.classification + type: keyword + description: Safety classification assigned to this tool call by a guardrail or policy engine. + example: safe + level: extended + alpha: This field is alpha and subject to change. + allowed_values: + - name: safe + description: Tool call passed all safety checks. + - name: suspicious + description: Tool call flagged for review but allowed to proceed. + - name: blocked + description: Tool call was blocked by safety policy. + - name: unclassified + description: No safety classification was performed. + + - name: safety.policy_id + type: keyword + description: Identifier of the safety policy or guardrail that evaluated this tool call. + example: prod-tool-policy-v3 + level: extended + alpha: This field is alpha and subject to change. + + - name: safety.reason + type: keyword + description: Human-readable reason for the safety classification decision. + example: "Tool attempts to access filesystem outside sandbox." + level: extended + alpha: This field is alpha and subject to change. + + - name: chain.id + type: keyword + description: Identifier for the reasoning chain or agent loop this tool call belongs to. + example: chain_789xyz + level: extended + alpha: This field is alpha and subject to change. + + - name: chain.step + type: integer + description: The sequential step number within the reasoning chain (1-based). + example: 3 + level: extended + alpha: This field is alpha and subject to change. + + - name: chain.max_steps + type: integer + description: Maximum number of tool-call steps allowed in this chain before forced termination. + example: 10 + level: extended + alpha: This field is alpha and subject to change. + + - name: approval.required + type: boolean + description: Whether this tool call required human approval before execution. + example: true + level: extended + alpha: This field is alpha and subject to change. + + - name: approval.status + type: keyword + description: The human approval decision for this tool call. + example: approved + level: extended + alpha: This field is alpha and subject to change. + allowed_values: + - name: approved + description: Human approved the tool call. + - name: rejected + description: Human rejected the tool call. + - name: pending + description: Awaiting human decision. + - name: auto_approved + description: Automatically approved by policy (no human in loop). + + - name: approval.reviewer + type: keyword + description: Identity of the human who approved or rejected the tool call. + example: ops-team@example.com + level: extended + alpha: This field is alpha and subject to change.