diff --git a/.mcp.json b/.mcp.json index b03271a..9f5a0a8 100644 --- a/.mcp.json +++ b/.mcp.json @@ -3,6 +3,13 @@ "codebase-memory-mcp": { "command": "codebase-memory-mcp", "args": [] + }, + "puppeteer": { + "command": "npx", + "args": [ + "-y", + "@modelcontextprotocol/server-puppeteer@2025.5.12" + ] } } } diff --git a/apps/api/README.md b/apps/api/README.md index 990ecf7..eeba2f9 100644 --- a/apps/api/README.md +++ b/apps/api/README.md @@ -4,9 +4,9 @@ Gatekeeper is a control layer that sits between your LLM agent, your security ru --- -## The architecture +### The architecture -When an agent requests a tool execution, the request flows through these components before reaching the target MCP server: +When an agent requests a tool execution (either a single tool or multiple parallel tools), the request flows through these components before reaching the target MCP server: ```text Client request @@ -15,16 +15,20 @@ Client request Express API (/agent/run) │ ├── memory.ts (tracks chat history and active approval IDs as read-only data) - ├── llm.ts (handles system prompts and validates tool input schemas) + ├── llm.ts (handles system prompts, routes to fallback models, and validates input schemas) ▼ Orchestration loop (loop.ts) │ + ├── Parses either single 'tool_call' or parallel 'tool_calls' ▼ Policy engine (rules/*.ts) ──► Decision engine (decision.ts) │ + ├── Intercepts parallel steps as 'multiple_tool_calls' + ├── Checks each individual tool policy ▼ MCP executor (bootstrap.ts) │ + ├── Runs allowed tools in parallel (Promise.all) ▼ MCP registry ──► External MCP servers ``` @@ -32,8 +36,8 @@ Policy engine (rules/*.ts) ──► Decision engine (decision.ts) ### Key files and their jobs * **[memory.ts](file:///home/yb175/projects/gate-keeper/apps/api/src/agent/memory.ts)**: Tracks the active chat history, tool execution results, and approval identifiers. It exposes these collections as read-only arrays to protect against accidental session corruption. 
-* **[llm.ts](file:///home/yb175/projects/gate-keeper/apps/api/src/agent/llm.ts)**: Validates input schemas, builds system prompts, and handles connection details with the language model. -* **[loop.ts](file:///home/yb175/projects/gate-keeper/apps/api/src/agent/loop.ts)**: Runs the main orchestration loop. It manages token budgets, keeps track of tool approvals, and enforces a hard limit of 30 steps to stop runaway processes. +* **[llm.ts](file:///home/yb175/projects/gate-keeper/apps/api/src/agent/llm.ts)**: Validates input schemas, builds system prompts, and handles connection details with the language model. Automatically handles fallback routing (Gemini primary -> Grok/Groq fallback) and request timeouts. +* **[loop.ts](file:///home/yb175/projects/gate-keeper/apps/api/src/agent/loop.ts)**: Runs the main orchestration loop. It manages token budgets, handles parallel tool execution triggers using `Promise.all`, checks approvals, and enforces a hard limit of 30 steps. --- @@ -48,6 +52,9 @@ Policy evaluation Check if blocked (isBlocked) ────[Blocked]────► Deny │ ▼ [Allowed] +Check if path is within sandbox (withinSandboxPath) ────[Escaped]────► Deny + │ + ▼ [Safe] Check if budget exceeded (budgetExceeded) ────[Exceeded]────► Deny │ ▼ [Under Budget] @@ -65,71 +72,277 @@ Check if approval required (needsApproval) --- -## How decisions and approvals work +## Sandbox path enforcement + +Each policy record has an optional `sandbox_path` field. When set, the **path rule** ([`pathRule.ts`](file:///home/yb175/projects/gate-keeper/apps/api/src/policy/rules/pathRule.ts)) validates every string-valued argument in the tool call before it reaches the MCP server. 
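A minimal sketch of the syntactic part of the sandbox check described in this README section, assuming Node's built-in `path` module. The helper names `escapesSandbox` and `findEscapingArguments` are hypothetical and are not taken from `pathRule.ts`. The sketch leaves out the symlink check, the empty-string denial, and the sandbox-prefix stripping that the README also lists.

```typescript
import * as path from "node:path";

// Hypothetical helper (not the real pathRule.ts code): returns true when
// `candidate`, resolved against the sandbox root, lands outside that root.
export function escapesSandbox(sandboxRoot: string, candidate: string): boolean {
  const root = path.resolve(sandboxRoot);
  const resolved = path.resolve(root, candidate);
  const relative = path.relative(root, resolved);
  // ".." or "../..." means the target sits above the root. An absolute
  // result can occur on Windows when the target is on another drive.
  return (
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative)
  );
}

// Checks every string-valued argument independently and returns the names
// of the arguments that escape the sandbox. Non-string values are ignored.
export function findEscapingArguments(
  sandboxRoot: string,
  args: Record<string, unknown>
): string[] {
  return Object.entries(args)
    .filter(([, value]) => typeof value === "string" && escapesSandbox(sandboxRoot, value))
    .map(([name]) => name);
}
```

A caller would deny the tool call when `findEscapingArguments` returns a non-empty list. Comparing against `".." + path.sep` rather than a bare `".."` prefix keeps legitimate names such as `..notes.txt` from being rejected.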
-The decision engine connects static checks with dynamic approval states stored in the database: +### How it works ```text -Policy result +Tool call arguments + │ + ▼ +For each string argument: + Resolve path relative to sandbox_path root + │ + ├── Syntactic traversal check (path.relative starts with "..") + │ └── Deny immediately + │ + ├── Absolute path that escapes root (path.isAbsolute) + │ └── Deny immediately + │ + └── Symlink traversal check (getRealAncestor check) + └── Deny if real ancestor lands outside sandbox root │ ▼ -Decision engine - ├───[Static Denied]──────────────────────────────────────────────► Return DENY - └───[Requires Review] ──► Check if approvalId exists in request - │ - ├───[No]──► Create approval row ──► Return PENDING & approvalId - │ - └───[Yes]──► Fetch approval from database - │ - ▼ - Is status APPROVED? - ├───[Yes]──► Return ALLOW & delete approval row - ├───[No / Pending]──► Return PENDING - └───[Rejected]──► Return DENY +All arguments safe → proceed to budget check ``` -### The approval lifecycle +### Edge cases handled -When a policy flags a tool execution, the engine logs the parameters to the database as `PENDING`, pauses the execution loop, and returns a unique `approvalId` to your application. +| Scenario | Behaviour | +|---|---| +| No `sandbox_path` on policy | Rule is skipped — no restriction | +| Tool has no string arguments (e.g. 
`list_files`) | Rule is skipped — nothing to check | +| Path prefixed with the sandbox directory name (`sandbox/file.txt`) | Prefix is stripped before resolving, same as the file-manager-mcp itself | +| Relative traversal (`../../etc/passwd`) | Caught by `path.relative` starting with `..` | +| Absolute path outside root (`/etc/passwd`) | Caught by `path.isAbsolute(relative)` | +| Symlink inside sandbox pointing outside | Caught by resolving the real ancestor and re-checking | +| Sandbox root is itself a symlink | Root is canonicalised with `fs.realpathSync` before all checks | +| Empty string argument | Denied with a descriptive error | +| Database error | Fail-closed: `success:false` → engine returns `DENY` | +| Multiple path arguments (e.g. `move_file` with `source` + `destination`) | Every string argument is checked independently | -```text -Client Application loop.ts SQLite DB Admin Dashboard - │ │ │ │ - 1 │── Prompt / Resume ──────►│ │ │ - │ │── Run policy check │ │ - │ │ │ │ - │ │── [If needs review] ───►│ │ - 2 │ │ Create PENDING │ │ - │ │◄── Return approvalId ───│ │ - 3 │◄─ Return PENDING ────────│ │ │ - │ │ │ │ - │ [Execution Suspended] │ │ │ - 4 │ │ │◄── Approve/Reject ──│ - │ │ │ │ - 5 │── Resume execution ─────►│ │ │ - 6 │ (with approvalId) │── Query approval status ─────►│ │ - │ │◄─ Return status ────────│ │ - │ │ │ │ - │ │── [If APPROVED] │ │ - 7 │ │ Delete approval row ──►│ │ - │ │ Execute MCP tool │ │ - 8 │◄─ Return results ────────│ │ │ - │ │ │ │ - │ │── [If PENDING/REJECTED] │ │ - 9 │◄─ Return PENDING/DENY ───│ │ │ -``` +### Configuring a sandbox path + +Use the `PATCH /policies/:toolName` endpoint to update an existing policy, optionally configuring its `sandbox_path`: + + ```bash + curl -X PATCH http://localhost:3001/policies/write_file \ + -H 'Content-Type: application/json' \ + -d '{ "action": "ALLOW", "sandbox_path": "/home/user/sandbox" }' + ``` + + The file-manager-mcp already enforces its own sandbox internally — the policy-level path rule 
provides an additional defence-in-depth layer controlled centrally by the admin dashboard. + + --- + + ## How decisions and approvals work + + The decision engine connects static checks with dynamic approval states stored in the database. + + ### Parallel Tool Call Batching + When the agent loop generates multiple parallel tool calls (a `tool_calls` step), the decision engine intercepts the request under a virtual composite tool name `"multiple_tool_calls"`. + - The engine runs individual `PolicyEngine` evaluations on every tool in the parallel list. + - If **any** tool is blocked (`DENY`), the entire parallel step is immediately denied. + - If **any** tool requires approval (and none are blocked), a single `multiple_tool_calls` approval request is logged in the database, batching all tool calls together so the user can approve or reject the entire step as a single action. + + ```text + Policy result + │ + ▼ + Decision engine + ├───[Static Denied (Single or Parallel)] ────────────────────────► Return DENY + └───[Requires Review] ──► Check if approvalId exists in request + │ + ├───[No]──► Create approval row ──► Return PENDING & approvalId + │ (Batches parallel tools under 'multiple_tool_calls') + │ + └───[Yes]──► Fetch approval from database + │ + ▼ + Is status APPROVED? + ├───[Yes]──► Return ALLOW & delete approval row + ├───[No / Pending]──► Return PENDING + └───[Rejected]──► Return DENY + ``` + + ### The approval lifecycle + + When a policy flags a tool execution, the engine logs the parameters to the database as `PENDING`, pauses the execution loop, and returns a unique `approvalId` to your application. 
+ + ```text + Client Application loop.ts SQLite DB Admin Dashboard + │ │ │ │ + 1 │── Prompt / Resume ──────►│ │ │ + │ │── Run policy check │ │ + │ │ │ │ + │ │── [If needs review] ───►│ │ + 2 │ │ Create PENDING │ │ + │ │◄── Return approvalId ───│ │ + 3 │◄─ Return PENDING ────────│ │ │ + │ │ │ │ + │ [Execution Suspended] │ │ │ + 4 │ │ │◄── Approve/Reject ──│ + │ │ │ │ + │ [Real-time Polling] │ │ │ + 5 │── Poller detects APPROVED│ │ │ + 6 │── Resume execution ─────►│ │ │ + │ (with approvalId) │── Query approval status ─────►│ │ + │ │◄─ Return status ────────│ │ + │ │ │ │ + │ │── [If APPROVED] │ │ + 7 │ │ Delete approval row ──►│ │ + │ │ Execute MCP tool(s) │ │ + 8 │◄─ Return results ────────│ │ │ + ``` + + ### Safety & Concurrency protections + + * **Strict status checks**: The orchestrator checks the approval record status before resuming. It only runs the tool if the status is explicitly `APPROVED`. A `PENDING` status prompts the client to poll again, and a `REJECTED` status cancels the request. + * **Single-use approvals**: We look up approvals by their unique identifier (`approvalId`) rather than the tool name. This binds each approval to a specific tool call, preventing replay attacks where a previously approved tool runs again without authorization. + * **Idempotency and Delete Protection**: + - The manual approval and rejection endpoints (`/policies/approvals/:id/approve` and `/policies/approvals/:id/reject`) are fully idempotent. Re-submitting an already approved or rejected request returns success (`200`) instead of failing. + - To prevent database exceptions when the client's automated real-time polling detects an approval status transition and resumes execution at the exact same split-second that the user manually clicks "Resume Execution", all `db.approval.delete()` operations are wrapped in safe catch blocks. If a concurrent thread has already deleted the single-use record, the request ignores the missing record error and continues execution. 
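The delete protection described in the last bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `InMemoryApprovals` and `NotFoundError` are hypothetical stand-ins for the database client and its missing-record error, and narrowing the catch to that one error type is a choice made here rather than something the README specifies.

```typescript
// Hypothetical error type standing in for the database's "record not found" error.
class NotFoundError extends Error {}

// Hypothetical in-memory stand-in for the approval table.
class InMemoryApprovals {
  private rows = new Map<string, { status: string }>();

  insert(id: string, status: string): void {
    this.rows.set(id, { status });
  }

  // Mirrors a delete that rejects when the row no longer exists.
  async delete(id: string): Promise<void> {
    if (!this.rows.delete(id)) {
      throw new NotFoundError(`approval ${id} not found`);
    }
  }
}

// Deletes a single-use approval. Returns true if this caller removed the row,
// false if another request had already removed it. Any other error is rethrown.
export async function consumeApproval(
  store: InMemoryApprovals,
  approvalId: string
): Promise<boolean> {
  try {
    await store.delete(approvalId);
    return true;
  } catch (err) {
    if (err instanceof NotFoundError) {
      return false; // Lost the race: the record is already gone, so carry on.
    }
    throw err;
  }
}
```

With this shape, the poller and a manual "Resume Execution" click can both call `consumeApproval` for the same `approvalId` without either request failing on the missing row.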
+ + --- + + ## How we protect API boundaries + + * **Server-side token tracking**: The backend calculates and tracks token budgets in the database. You cannot bypass limits by altering client payloads. + * **Automatic budget window resets**: Token budgets are tracked per conversation. If a 3-minute inactivity window is exceeded during sequential agent execution, the conversation's accumulated token count automatically resets. + * **Message history sanitization**: The system strips out any messages with the `"system"` role from incoming history payloads, preventing clients from injecting override prompts. + * **Timeout limits on model requests**: We wrap connections to the model API in an `AbortSignal.timeout(timeoutMs)`. If the upstream service freezes or runs slow, the connection terminates cleanly instead of stalling your server thread. The timeout duration is safely parsed and falls back to 30 seconds if config variables are invalid. + + --- + + ## Decision Logging & Auditing + + Every evaluation made by the policy decision engine writes a detailed audit entry to the SQLite database. This ensures developers and administrators have a clear, immutable record of what the agent tried to do and why it was allowed or blocked. + + ### Log Database Schema (`Log` Model) + ```prisma + model Log { + id String @id @default(uuid()) + tool_name String + decision Decision // ALLOW | DENY | PENDING | FAILED + reason String? + createdAt DateTime @default(now()) + } + ``` + + ### When Logs are Written + 1. **`ALLOW`**: + - Written immediately when a tool execution is approved naturally by policy. + - Written when the orchestrator resumes and executes a tool call that was manually `APPROVED` by an administrator. + - For parallel tool executions (`multiple_tool_calls`), an `ALLOW` log is written for each constituent tool run. + 2. **`PENDING`**: + - Written when a tool execution (single or parallel) requires manual administrator review, capturing the unique `approvalId`. + 3. 
**`DENY`**: + - Written when a tool is blocked by policy configuration. + - Written when an administrator rejects a pending approval request. + - Written when a critical failure occurs inside the decision engine (logged as `Decision engine failure`). + + ### Audit Management + Administrators can inspect logs in real-time on the **Decision Logs** tab of the dashboard and reset/clear all logs via a single action (`DELETE /logs`), which truncates the log table for clean developer iteration. + + --- + + ## REST API Reference + + The Express backend serves the following REST endpoints: + + ### 🤖 Agent Orchestration + #### `POST /agent/run` + Runs the main LLM orchestration loop for a conversation. + * **Payload**: + ```json + { + "message": "Create a file named hello.txt", + "conversationId": "conv-uuid-123", + "approvalId": "approval-uuid-abc" // Optional. Pass to resume a paused execution. + } + ``` + * **Response (SUCCESS)**: + ```json + { + "status": "SUCCESS", + "answer": "File hello.txt successfully created." + } + ``` + * **Response (PENDING)**: + ```json + { + "status": "PENDING", + "approvalId": "approval-uuid-abc" + } + ``` + * **Response (DENY)**: + ```json + { + "status": "DENY", + "reason": "Tool execution blocked: write_file - path not allowed." + } + ``` + +--- + +### 🛡️ Policy Configurations +#### `GET /policies` +Returns a list of all configured policy actions. +* **Response**: + ```json + [ + { "tool_name": "read_file", "action": "ALLOW" }, + { "tool_name": "write_file", "action": "APPROVAL" } + ] + ``` + +#### `GET /policies/:toolName` +Retrieves the policy for a specific tool. If no rule exists, it defaults to a fail-closed `APPROVAL` response. +* **Response**: + ```json + { + "tool_name": "read_file", + "action": "ALLOW" + } + ``` + +#### `POST /policies` +Creates a new policy rule. +* **Payload**: + ```json + { + "tool_name": "delete_file", + "action": "DENY" + } + ``` +* **Response**: `201 Created` with the newly created policy object. 
-### Safety protections +#### `PATCH /policies/:toolName` +Updates an existing policy action. +* **Payload**: + ```json + { + "action": "ALLOW" + } + ``` -* **Strict status checks**: The orchestrator checks the approval record status before resuming. It only runs the tool if the status is explicitly `APPROVED`. A `PENDING` status prompts the client to poll again, and a `REJECTED` status cancels the request. -* **Single-use approvals**: We look up approvals by their unique identifier (`approvalId`) rather than the tool name. This binds each approval to a specific tool call, preventing replay attacks where a previously approved tool runs again without authorization. +#### `DELETE /policies/:toolName` +Deletes a policy configuration, reverting it to the default fail-closed behavior. --- -## How we protect API boundaries +### 📥 Manual Approvals +#### `GET /approvals` +Retrieves a list of all manual approval records ordered by creation date (newest first). -* **Server-side token tracking**: The backend calculates and tracks token budgets in the database. You cannot bypass limits by altering client payloads. -* **Message history sanitization**: The system strips out any messages with the `"system"` role from incoming history payloads, preventing clients from injecting override prompts. -* **Timeout limits on model requests**: We wrap connections to the model API in an `AbortSignal.timeout(timeoutMs)`. If the upstream service freezes or runs slow, the connection terminates cleanly instead of stalling your server thread. The timeout duration is safely parsed and falls back to 30 seconds if config variables are invalid. +#### `POST /policies/approvals/:id/approve` +Approves a pending request. +* **Response**: `{ "id": "uuid", "status": "APPROVED" }`. +* **Idempotency**: Returns `200` with the approved state if the request is already approved. + +#### `POST /policies/approvals/:id/reject` +Rejects a pending request. +* **Response**: `{ "id": "uuid", "status": "REJECTED" }`. 
+* **Idempotency**: Returns `200` with the rejected state if the request is already rejected. --- + +### 📜 Audit Logs +#### `GET /logs` +Retrieves all recorded policy decisions. + +#### `DELETE /logs` +Clears/resets all decision log records from the database. Returns `204 No Content`. + diff --git a/apps/api/mcp/bootstrap.ts b/apps/api/mcp/bootstrap.ts index 69886d5..915baea 100644 --- a/apps/api/mcp/bootstrap.ts +++ b/apps/api/mcp/bootstrap.ts @@ -1,14 +1,17 @@ import { PluginRegistry } from "./registry.js"; import { ToolsDiscovery } from "./discovery.js"; import { ToolExecutor } from "./execute.js"; +import { fileURLToPath } from "url"; import { fileManagerPlugin } from "./plugins/filemanager/manifest.js"; import { context7Plugin } from "./plugins/context7/manifest.js"; +import { puppeteerPlugin } from "./plugins/puppeteer/manifest.js"; export const mcpRegistry = new PluginRegistry(); // Register plugins from modular layout manifests mcpRegistry.registerPlugin(fileManagerPlugin); mcpRegistry.registerPlugin(context7Plugin); +mcpRegistry.registerPlugin(puppeteerPlugin); export const mcpDiscovery = new ToolsDiscovery(mcpRegistry); export const mcpExecutor = new ToolExecutor(mcpDiscovery); diff --git a/apps/api/mcp/plugins/context7/manifest.ts b/apps/api/mcp/plugins/context7/manifest.ts index 3807966..27e172b 100644 --- a/apps/api/mcp/plugins/context7/manifest.ts +++ b/apps/api/mcp/plugins/context7/manifest.ts @@ -1,4 +1,4 @@ -import "../../../../src/utils/env.js"; +import "../../../src/utils/env.js"; import { StdioMCPServer } from "../../stdio-server.js"; const context7Env: Record = {}; diff --git a/apps/api/mcp/plugins/puppeteer/manifest.ts b/apps/api/mcp/plugins/puppeteer/manifest.ts new file mode 100644 index 0000000..c751de8 --- /dev/null +++ b/apps/api/mcp/plugins/puppeteer/manifest.ts @@ -0,0 +1,9 @@ +import "../../../src/utils/env.js"; +import { StdioMCPServer } from "../../stdio-server.js"; + +export const puppeteerPlugin = new StdioMCPServer( + "puppeteer", 
+ "npx", + ["-y", "@modelcontextprotocol/server-puppeteer"], + {} +); diff --git a/apps/api/package.json b/apps/api/package.json index e05c4d5..ca34475 100644 --- a/apps/api/package.json +++ b/apps/api/package.json @@ -14,6 +14,7 @@ }, "dependencies": { "@modelcontextprotocol/sdk": "^1.4.0", + "@modelcontextprotocol/server-puppeteer": "^2025.5.12", "@repo/db": "*", "@repo/shared": "*", "cors": "^2.8.5", diff --git a/apps/api/src/agent/agent.test.ts b/apps/api/src/agent/agent.test.ts index 823bfe1..58745fc 100644 --- a/apps/api/src/agent/agent.test.ts +++ b/apps/api/src/agent/agent.test.ts @@ -1,7 +1,7 @@ import { vi, describe, it, expect, beforeEach, afterEach } from "vitest"; import { runAgent } from "./loop.js"; import { createMemory } from "./memory.js"; -import { llmClient } from "./llm.js"; +import { llmClient, nextStep } from "./llm.js"; // Mock @repo/db vi.mock("@repo/db", () => { @@ -425,6 +425,91 @@ describe("Agent Module & Execution Loop", () => { expect(result.reason).toBe("Approval not approved"); }); + // 14) Parallel Tool Execution tests + describe("Parallel Tool Execution", () => { + it("should parse type: tool_calls from LLM and validate schemas", async () => { + vi.spyOn(llmClient, "callModel").mockResolvedValue( + JSON.stringify({ + type: "tool_calls", + tool_calls: [ + { tool_name: "test_tool", arguments: { arg1: "val1" } }, + { tool_name: "test_tool", arguments: { arg1: "val2" } } + ] + }) + ); + + const memory = createMemory(); + const tools = [ + { + name: "test_tool", + description: "A test tool", + inputSchema: { type: "object", properties: { arg1: { type: "string" } } }, + execute: vi.fn(), + } + ]; + + const res = await nextStep(memory, tools); + expect(res.step.type).toBe("tool_calls"); + if (res.step.type === "tool_calls") { + expect(res.step.tool_calls).toHaveLength(2); + expect(res.step.tool_calls[0]?.tool_name).toBe("test_tool"); + } + }); + + it("should execute parallel tool calls successfully when allowed", async () => { + let 
callCount = 0; + vi.spyOn(llmClient, "callModel").mockImplementation(async () => { + callCount++; + if (callCount === 1) { + return JSON.stringify({ + type: "tool_calls", + tool_calls: [ + { tool_name: "test_tool", arguments: { arg1: "val1" } }, + { tool_name: "test_tool", arguments: { arg1: "val2" } } + ] + }); + } + return JSON.stringify({ + type: "final_answer", + answer: "Finished parallel work.", + }); + }); + + vi.mocked(decide).mockResolvedValue({ + decision: "ALLOW", + }); + + vi.mocked(mcpExecutor.execute).mockResolvedValue("mockResult"); + + const result = await runAgent("Do parallel tasks", "conv-parallel-1"); + expect(result.status).toBe("SUCCESS"); + expect(result.answer).toBe("Finished parallel work."); + expect(mcpExecutor.execute).toHaveBeenCalledTimes(2); + expect(result.memory.toolResults).toContain("mockResult"); + }); + + it("should request approval for parallel tool calls when pending", async () => { + vi.spyOn(llmClient, "callModel").mockResolvedValue( + JSON.stringify({ + type: "tool_calls", + tool_calls: [ + { tool_name: "test_tool", arguments: { arg1: "val1" } } + ] + }) + ); + + vi.mocked(decide).mockResolvedValue({ + decision: "PENDING", + reason: "approval-parallel-123", + }); + + const result = await runAgent("Do parallel task requiring approval", "conv-parallel-2"); + expect(result.status).toBe("PENDING"); + expect(result.approvalId).toBe("approval-parallel-123"); + expect(result.memory.approvalId).toBe("approval-parallel-123"); + }); + }); + describe("Gemini API Client Timeout", () => { afterEach(() => { vi.unstubAllGlobals(); diff --git a/apps/api/src/agent/llm.ts b/apps/api/src/agent/llm.ts index db4bd7f..3c63c71 100644 --- a/apps/api/src/agent/llm.ts +++ b/apps/api/src/agent/llm.ts @@ -3,11 +3,74 @@ import { Memory, Tool, ToolCall, FinalAnswer, AgentStep } from "../../types.js"; export const llmClient = { async callModel(prompt: string): Promise<string> { - const apiKey = process.env.GEMINI_API_KEY; - if (!apiKey) { - throw new 
Error("GEMINI_API_KEY environment variable is not defined"); + if (process.env.MOCK_LLM === "true") { + const lines = prompt.split("\n"); + let lastUserIndex = -1; + let hasToolAfterUser = false; + let lastUserLine = ""; + + for (let i = lines.length - 1; i >= 0; i--) { + const line = lines[i]?.trim() || ""; + if (line.startsWith("USER:")) { + lastUserIndex = i; + lastUserLine = line; + break; + } + } + + if (lastUserIndex !== -1) { + for (let i = lastUserIndex + 1; i < lines.length; i++) { + const line = lines[i]?.trim() || ""; + if (line.startsWith("TOOL:") || line.startsWith("tool:")) { + hasToolAfterUser = true; + break; + } + } + } + + if (lastUserLine.includes("sandbox/test.txt")) { + if (hasToolAfterUser) { + return JSON.stringify({ + type: "final_answer", + answer: "Successfully wrote sandbox/test.txt" + }); + } + return JSON.stringify({ + type: "tool_call", + tool_name: "write_file", + arguments: { + path: "sandbox/test.txt", + content: "Hello GateKeeper" + } + }); + } + + if (lastUserLine.includes("sandbox/allowed.txt")) { + if (hasToolAfterUser) { + return JSON.stringify({ + type: "final_answer", + answer: "Successfully wrote sandbox/allowed.txt" + }); + } + return JSON.stringify({ + type: "tool_call", + tool_name: "write_file", + arguments: { + path: "sandbox/allowed.txt", + content: "Auto approved content" + } + }); + } + + return JSON.stringify({ + type: "final_answer", + answer: "Mock response." 
+ }); } + const geminiKey = process.env.GEMINI_API_KEY; + const grokKey = process.env.GROK_API_KEY ; + let timeoutMs = 30000; if (process.env.GEMINI_TIMEOUT_MS) { const parsed = parseInt(process.env.GEMINI_TIMEOUT_MS, 10); @@ -16,38 +79,120 @@ export const llmClient = { } } - const response = await fetch( - "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent", - { - method: "POST", - headers: { - "Content-Type": "application/json", - "x-goog-api-key": apiKey, - }, - body: JSON.stringify({ - contents: [{ - parts: [{ text: prompt }] - }], - generationConfig: { - responseMimeType: "application/json" + let geminiError: Error = new Error("Unknown error"); + + // 1. Try Gemini first (as first preference) + if (geminiKey && geminiKey.trim() !== "") { + try { + const response = await fetch( + "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent", + { + method: "POST", + headers: { + "Content-Type": "application/json", + "x-goog-api-key": geminiKey, + }, + body: JSON.stringify({ + contents: [{ + parts: [{ text: prompt }] + }], + generationConfig: { + responseMimeType: "application/json" + } + }), + signal: AbortSignal.timeout(timeoutMs) } - }), - signal: AbortSignal.timeout(timeoutMs) - } - ); + ); - if (!response.ok) { - const errorText = await response.text(); - throw new Error(`Gemini API request failed with status ${response.status}: ${errorText}`); + if (!response.ok) { + const errorText = await response.text(); + throw new Error(`Gemini API returned status ${response.status}: ${errorText}`); + } + + const json: any = await response.json(); + const text = json.candidates?.[0]?.content?.parts?.[0]?.text; + if (!text) { + throw new Error("Invalid response received from Gemini API"); + } + return text; + } catch (err: any) { + // If it is a client timeout/abort error, propagate it directly without fallback + if (err.name === "AbortError" || err.message.includes("aborted")) { + throw err; + } + 
geminiError = err; + console.warn("Gemini API call failed, attempting fallback to Grok:", err.message); + } + } else { + geminiError = new Error("GEMINI_API_KEY environment variable is not defined"); } - const json: any = await response.json(); - const text = json.candidates?.[0]?.content?.parts?.[0]?.text; - if (!text) { - throw new Error("Invalid response received from Gemini API"); + // 2. Fallback to Grok (xAI API) or Groq if Gemini failed + if (grokKey && grokKey.trim() !== "") { + const isGroq = grokKey.trim().startsWith("gsk_"); + const endpoint = isGroq + ? "https://api.groq.com/openai/v1/chat/completions" + : "https://api.x.ai/v1/chat/completions"; + + const defaultModel = isGroq ? "llama-3.3-70b-versatile" : "grok-2"; + const model = process.env.GROK_MODEL || process.env.XAI_MODEL || defaultModel; + const providerName = isGroq ? "Groq" : "Grok"; + + try { + const response = await fetch( + endpoint, + { + method: "POST", + headers: { + "Content-Type": "application/json", + "Authorization": `Bearer ${grokKey.trim()}`, + }, + body: JSON.stringify({ + model: model, + messages: [{ role: "user", content: prompt }], + response_format: { type: "json_object" } + }), + signal: AbortSignal.timeout(timeoutMs) + } + ); + + if (!response.ok) { + const errorText = await response.text(); + throw new Error(`${providerName} API returned status ${response.status}: ${errorText}`); + } + + const json: any = await response.json(); + // Support choices (Grok/Groq) and candidates (for stub testing) + if (json.choices) { + const text = json.choices?.[0]?.message?.content; + if (!text) { + throw new Error(`Invalid response received from ${providerName} API`); + } + return text; + } else if (json.candidates) { + const text = json.candidates?.[0]?.content?.parts?.[0]?.text; + if (!text) { + throw new Error("Invalid response received from fallback model"); + } + return text; + } else { + throw new Error(`Unknown response format received from ${providerName} API`); + } + } catch (err: 
any) { + console.error(`${providerName} fallback API call failed:`, err.message); + throw new Error( + `Security Agent Service Error: Both primary (Gemini) and fallback (${providerName}) models failed to respond.\n` + + `• Gemini Error: ${geminiError.message}\n` + + `• ${providerName} Error: ${err.message}` + ); + } } - return text; + // 3. Display user-friendly message if both failed or fallback API key is missing + throw new Error( + `Security Agent Service Error: Primary model (Gemini) failed to respond, and no fallback model is configured.\n` + + `• Gemini Error: ${geminiError.message}` + ); } }; @@ -126,13 +271,22 @@ Conversation history: ${messagesContext} Output your next step as a single JSON object. Do not include any other text, markdown formatting, or code blocks. -If you need to call a tool, output: +If you need to call a single tool, output: { "type": "tool_call", "tool_name": "name_of_tool", "arguments": { ... } } +If you need to call multiple independent tools in parallel, output: +{ + "type": "tool_calls", + "tool_calls": [ + { "tool_name": "name_of_tool_1", "arguments": { ... } }, + { "tool_name": "name_of_tool_2", "arguments": { ... 
} } + ] +} + If you are done and have a final answer, output: { "type": "final_answer", @@ -179,6 +333,40 @@ If you are done and have a final answer, output: }, tokens }; + } else if (parsed.type === "tool_calls") { + const { tool_calls } = parsed; + if (!Array.isArray(tool_calls)) { + throw new Error("Invalid LLM output structure for parallel tool calls"); + } + if (tool_calls.length === 0) { + throw new Error("LLM returned an empty tool_calls array; at least one tool is required"); + } + + for (const tc of tool_calls) { + if (!tc || typeof tc !== "object" || typeof tc.tool_name !== "string" || !tc.arguments || typeof tc.arguments !== "object" || Array.isArray(tc.arguments)) { + throw new Error("Invalid tool call in parallel list"); + } + + const tool = tools.find(t => t.name === tc.tool_name); + if (!tool) { + throw new Error(`Unknown tool: ${tc.tool_name}`); + } + + if (!validateSchema(tool.inputSchema, tc.arguments)) { + throw new Error(`Invalid arguments for tool ${tc.tool_name}`); + } + } + + return { + step: { + type: "tool_calls", + tool_calls: tool_calls.map(tc => ({ + tool_name: tc.tool_name, + arguments: tc.arguments + })) + }, + tokens + }; } else if (parsed.type === "final_answer") { const { answer } = parsed; if (typeof answer !== "string") { diff --git a/apps/api/src/agent/loop.ts b/apps/api/src/agent/loop.ts index 5c4ed9d..5f687d0 100644 --- a/apps/api/src/agent/loop.ts +++ b/apps/api/src/agent/loop.ts @@ -85,6 +85,7 @@ export async function runAgent( if (!approval) { logger.warn("Resumed approval record not found", { conversation_id: conversationId, approval_id: activeApprovalId }); await updateTokens(); + memory.addMessage("assistant", "Execution Denied: Approval not found"); return { status: "DENY", reason: "Approval not found", @@ -105,6 +106,8 @@ export async function runAgent( if (approval.status !== ApprovalStatus.APPROVED) { logger.warn("Resumed approval record is not approved", { conversation_id: conversationId, approval_id: 
activeApprovalId, status: approval.status }); await updateTokens(); + const reasonMsg = approval.status === ApprovalStatus.REJECTED ? "Approval rejected" : "Approval not approved"; + memory.addMessage("assistant", `Execution Denied: ${reasonMsg}`); return { status: "DENY", reason: "Approval not approved", @@ -112,11 +115,18 @@ export async function runAgent( }; } - step = { - type: "tool_call", - tool_name: approval.tool_name, - arguments: approval.arguments as Record - }; + if (approval.tool_name === "multiple_tool_calls") { + step = { + type: "tool_calls", + tool_calls: (approval.arguments as any).tool_calls + }; + } else { + step = { + type: "tool_call", + tool_name: approval.tool_name, + arguments: approval.arguments as Record + }; + } } else { // Consult the LLM to get the next step const nextResult = await nextStep(memory, tools); @@ -138,21 +148,32 @@ export async function runAgent( } // Record tool call to assistant messages - memory.addMessage("assistant", `Call tool ${step.tool_name} with arguments: ${JSON.stringify(step.arguments)}`); + if (step.type === "tool_call") { + memory.addMessage("assistant", `Call tool ${step.tool_name} with arguments: ${JSON.stringify(step.arguments)}`); + } else { + memory.addMessage("assistant", `Call parallel tools: ${JSON.stringify(step.tool_calls)}`); + } // Evaluate the tool execution policy using decide() - const decisionContext = { - tool_name: step.tool_name, - arguments: step.arguments, - approvalId: activeApprovalId - }; + const decisionContext = step.type === "tool_call" + ? 
{ + tool_name: step.tool_name, + arguments: step.arguments, + approvalId: activeApprovalId + } + : { + tool_name: "multiple_tool_calls", + arguments: { tool_calls: step.tool_calls }, + approvalId: activeApprovalId + }; - logger.info("Evaluating tool execution policy", { conversation_id: conversationId, tool_name: step.tool_name }); + logger.info("Evaluating tool execution policy", { conversation_id: conversationId, tool_name: decisionContext.tool_name }); const decisionResult = await decide(decisionContext, { conversationId, token: accumulatedTokens }); - logger.info("Policy decision evaluated", { conversation_id: conversationId, tool_name: step.tool_name, decision: decisionResult.decision }); + logger.info("Policy decision evaluated", { conversation_id: conversationId, tool_name: decisionContext.tool_name, decision: decisionResult.decision }); if (decisionResult.decision === "DENY") { await updateTokens(); + memory.addMessage("assistant", `Execution Denied: ${decisionResult.reason || "Tool execution denied"}`); return { status: "DENY", reason: decisionResult.reason || "Tool execution denied", @@ -174,23 +195,85 @@ export async function runAgent( } if (decisionResult.decision === "ALLOW") { + const failures: { tool_name: string; error: string }[] = []; try { - logger.info("Executing approved tool call", { conversation_id: conversationId, tool_name: step.tool_name }); - const result = await mcpExecutor.execute(step.tool_name, step.arguments, { - conversationId, - decision: "ALLOW" - }); - + const executions = step.type === "tool_call" + ? 
[{ tool_name: step.tool_name, arguments: step.arguments }] + : step.tool_calls; + + logger.info("Executing approved tool call(s)", { conversation_id: conversationId, count: executions.length }); + + const results = await Promise.allSettled( + executions.map(async (exec) => { + const res = await mcpExecutor.execute(exec.tool_name, exec.arguments, { + conversationId, + decision: "ALLOW" + }); + return { tool_name: exec.tool_name, result: res }; + }) + ); + // Reset approval ID once execution has completed activeApprovalId = undefined; memory.clearApproval(); - - // Store results in memory history - memory.addToolResult(result); - memory.addMessage("tool", JSON.stringify(result)); + + const successResults: any[] = []; + + for (let i = 0; i < results.length; i++) { + const outcome = results[i]; + const exec = executions[i]; + if (!outcome || !exec) continue; + + if (outcome.status === "fulfilled") { + const val = (outcome as PromiseFulfilledResult<{ tool_name: string; result: unknown; }>).value; + successResults.push(val); + memory.addToolResult(val.result); + } else { + const reason = (outcome as PromiseRejectedResult).reason; + const errMsg = reason?.message || String(reason); + failures.push({ tool_name: exec.tool_name, error: errMsg }); + memory.addToolResult({ + tool_name: exec.tool_name, + isError: true, + error: errMsg + }); + } + } + + // Format results for the agent message history + if (step.type === "tool_call") { + const outcome = results[0]; + if (outcome && outcome.status === "fulfilled") { + const val = (outcome as PromiseFulfilledResult<{ tool_name: string; result: unknown; }>).value; + memory.addMessage("tool", JSON.stringify(val.result)); + } else if (outcome) { + const reason = (outcome as PromiseRejectedResult).reason; + memory.addMessage("tool", JSON.stringify({ error: reason?.message || String(reason) })); + } + } else { + // For parallel calls, return an array of results/errors in the same order + const formattedList = results.map((outcome) => { 
+ if (outcome.status === "fulfilled") { + return (outcome as PromiseFulfilledResult<{ tool_name: string; result: unknown; }>).value.result; + } else { + const reason = (outcome as PromiseRejectedResult).reason; + return { error: reason?.message || String(reason) }; + } + }); + memory.addMessage("tool", JSON.stringify(formattedList)); + } + } catch (execError: any) { throw new Error(`Tool execution failed: ${execError.message || execError}`); } + + if (failures.length > 0) { + const firstFail = failures[0]; + if (failures.length === 1 && step.type === "tool_call" && firstFail) { + throw new Error(`Tool execution failed: ${firstFail.error}`); + } + throw new Error(`Tool execution failed for: ${failures.map(f => `${f.tool_name} (${f.error})`).join(", ")}`); + } } } } catch (error: any) { diff --git a/apps/api/src/index.ts b/apps/api/src/index.ts index 4158321..bdf3d4e 100644 --- a/apps/api/src/index.ts +++ b/apps/api/src/index.ts @@ -135,6 +135,7 @@ app.post("/agent/run", async (req, res) => { }); } catch (error: any) { console.error("Agent execution failed:", error); + // Do not expose internal error details to clients res.status(500).json({ error: "Internal server error" }); } }); diff --git a/apps/api/src/policy/decision.ts b/apps/api/src/policy/decision.ts index b22b282..a893226 100644 --- a/apps/api/src/policy/decision.ts +++ b/apps/api/src/policy/decision.ts @@ -14,11 +14,210 @@ export async function decide( conversation: ConversationRequest, ): Promise { try { + // Intercept multiple parallel tool calls + if (context.tool_name === "multiple_tool_calls") { + const toolCalls = (context.arguments as any)?.tool_calls; + if (!Array.isArray(toolCalls)) { + // Audit the denial so this rejection path is never invisible + try { + await db.log.create({ + data: { + tool_name: "multiple_tool_calls", + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Invalid parallel tool_calls argument structure`, + }, + }); + } catch (logErr) { + 
console.error("Failed to write denial log for invalid parallel args:", logErr); + } + return { + decision: "DENY", + reason: "Invalid parallel tool calls arguments structure", + }; + } + + // If activeApprovalId is provided, we check the status of the approval record + if (context.approvalId) { + const approval = await db.approval.findUnique({ + where: { id: context.approvalId }, + }); + + if (!approval) { + await db.log.create({ + data: { + tool_name: "multiple_tool_calls", + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Approval not found (ID: ${context.approvalId})`, + }, + }); + return { + decision: "DENY", + reason: "Approval not found", + }; + } + + if (approval.tool_name !== "multiple_tool_calls") { + await db.log.create({ + data: { + tool_name: "multiple_tool_calls", + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Approval tool name mismatch (ID: ${context.approvalId})`, + }, + }); + return { + decision: "DENY", + reason: "Approval tool name mismatch", + }; + } + + switch (approval.status) { + case ApprovalStatus.APPROVED: + // Check if any tool in the parallel batch is now blocked by a policy change + for (const tc of toolCalls) { + const policyResult = await PolicyEngine(tc, conversation); + if (!policyResult.allowed && !policyResult.requiresApproval) { + await db.log.create({ + data: { + tool_name: tc.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Blocked on parallel resume: ${policyResult.reason || "Denied by policy configuration"}`, + }, + }); + return { + decision: "DENY", + reason: `Tool execution blocked on resume: ${tc.tool_name} - ${policyResult.reason || "Denied by policy configuration"}`, + }; + } + } + + try { + await db.approval.delete({ + where: { id: approval.id }, + }); + } catch (err: any) { + await db.log.create({ + data: { + tool_name: "multiple_tool_calls", + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | 
Could not delete approval record, aborting to prevent replay (ID: ${approval.id})`, + }, + }); + return { decision: "DENY", reason: "Approval record deletion failed" }; + } + // Log ALLOW for each individual tool call only after confirmed deletion + for (const tc of toolCalls) { + await db.log.create({ + data: { + tool_name: tc.tool_name, + decision: "ALLOW", + reason: `Conversation: ${conversation.conversationId} | Approved by user in parallel step (ID: ${approval.id})`, + }, + }); + } + return { + decision: "ALLOW", + }; + case ApprovalStatus.PENDING: + return { + decision: "PENDING", + }; + case ApprovalStatus.REJECTED: + for (const tc of toolCalls) { + await db.log.create({ + data: { + tool_name: tc.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Rejected by user in parallel step (ID: ${approval.id})`, + }, + }); + } + return { + decision: "DENY", + reason: "Approval rejected", + }; + default: + return { + decision: "DENY", + reason: "Unrecognized approval status", + }; + } + } + + // No approvalId provided, evaluate each tool call against PolicyEngine + const pendingTools: typeof toolCalls = []; + for (const tc of toolCalls) { + const policyResult = await PolicyEngine(tc, conversation); + if (!policyResult.allowed && !policyResult.requiresApproval) { + // One of the tools is explicitly denied + await db.log.create({ + data: { + tool_name: tc.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Blocked in parallel step: ${policyResult.reason || "Denied by policy configuration"}`, + }, + }); + return { + decision: "DENY", + reason: `Tool execution blocked: ${tc.tool_name} - ${policyResult.reason || "Denied by policy configuration"}`, + }; + } + if (policyResult.requiresApproval) { + pendingTools.push(tc); + } + } + + // If any tool requires approval, create a single approval record for multiple_tool_calls + if (pendingTools.length > 0) { + const created = await db.approval.create({ + 
data: { + tool_name: "multiple_tool_calls", + arguments: { tool_calls: toolCalls } as any, + status: ApprovalStatus.PENDING, + }, + }); + + await db.log.create({ + data: { + tool_name: "multiple_tool_calls", + decision: "PENDING", + reason: `Conversation: ${conversation.conversationId} | Parallel execution requires manual approval (ID: ${created.id})`, + }, + }); + + return { + decision: "PENDING", + reason: created.id, + }; + } + + // All tool calls are allowed + for (const tc of toolCalls) { + await db.log.create({ + data: { + tool_name: tc.tool_name, + decision: "ALLOW", + reason: `Conversation: ${conversation.conversationId} | Allowed by policy in parallel step`, + }, + }); + } + + return { + decision: "ALLOW", + }; + } + // Step 1: Call PolicyEngine const policy = await PolicyEngine(context, conversation); // Step 2: Policy denied and does not require approval if (!policy.allowed && !policy.requiresApproval) { + await db.log.create({ + data: { + tool_name: context.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Blocked: ${policy.reason || "Denied by policy configuration"}`, + }, + }); return { decision: "DENY", reason: policy.reason, @@ -36,6 +235,14 @@ export async function decide( }, }); + await db.log.create({ + data: { + tool_name: context.tool_name, + decision: "PENDING", + reason: `Conversation: ${conversation.conversationId} | Requires manual approval (ID: ${created.id})`, + }, + }); + return { decision: "PENDING", reason: created.id, @@ -49,6 +256,13 @@ export async function decide( }); if (!approval) { + await db.log.create({ + data: { + tool_name: context.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Approval not found (ID: ${context.approvalId})`, + }, + }); return { decision: "DENY", reason: "Approval not found", @@ -56,6 +270,13 @@ export async function decide( } if (approval.tool_name !== context.tool_name) { + await db.log.create({ + data: { + tool_name: 
context.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Approval tool name mismatch (ID: ${context.approvalId})`, + }, + }); return { decision: "DENY", reason: "Approval tool name mismatch", @@ -64,22 +285,56 @@ export async function decide( switch (approval.status) { case ApprovalStatus.APPROVED: - await db.approval.delete({ - where: { id: approval.id }, + try { + await db.approval.delete({ + where: { id: approval.id }, + }); + } catch (err: any) { + await db.log.create({ + data: { + tool_name: context.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Could not delete approval record, aborting to prevent replay (ID: ${approval.id})`, + }, + }); + return { decision: "DENY", reason: "Approval record deletion failed" }; + } + // Write ALLOW log only after deletion is confirmed + await db.log.create({ + data: { + tool_name: context.tool_name, + decision: "ALLOW", + reason: `Conversation: ${conversation.conversationId} | Approved by user (ID: ${approval.id})`, + }, }); return { decision: "ALLOW", }; case ApprovalStatus.PENDING: + // We do not write duplicate pending logs on query iterations return { decision: "PENDING", }; case ApprovalStatus.REJECTED: + await db.log.create({ + data: { + tool_name: context.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Rejected by user (ID: ${approval.id})`, + }, + }); return { decision: "DENY", reason: "Approval rejected", }; default: + await db.log.create({ + data: { + tool_name: context.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Unrecognized approval status (ID: ${approval.id})`, + }, + }); return { decision: "DENY", reason: "Unrecognized approval status", @@ -89,17 +344,43 @@ export async function decide( // Step 4: Policy allowed if (policy.allowed) { + await db.log.create({ + data: { + tool_name: context.tool_name, + decision: "ALLOW", + reason: `Conversation: 
${conversation.conversationId} | Allowed by policy`, + }, + }); return { decision: "ALLOW", }; } // Fallback/Safety return + await db.log.create({ + data: { + tool_name: context.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Unrecognized policy state`, + }, + }); return { decision: "DENY", reason: "Unrecognized policy state", }; } catch (error) { + const errMsg = error instanceof Error ? error.message : String(error); + try { + await db.log.create({ + data: { + tool_name: context.tool_name, + decision: "DENY", + reason: `Conversation: ${conversation.conversationId} | Decision engine failure: ${errMsg}`, + }, + }); + } catch (logErr) { + console.error("Failed to write failure log:", logErr); + } return { decision: "DENY", reason: "Decision engine failure", diff --git a/apps/api/src/policy/engine.ts b/apps/api/src/policy/engine.ts index 4e5f712..c7669ea 100644 --- a/apps/api/src/policy/engine.ts +++ b/apps/api/src/policy/engine.ts @@ -1,6 +1,7 @@ import needsApproval from "./rules/approval.js"; import isblocked from "./rules/block.js"; import budgetExceeded from "./rules/budget.js"; +import withinSandboxPath from "./rules/pathRule.js"; import type { ApprovalRequest, ConversationRequest } from "../../types.js"; import { db } from "@repo/db"; import { logger } from "../../mcp/logger.js"; @@ -55,7 +56,24 @@ export default async function PolicyEngine( }; } - // 2. Budget Check + // 2. Path Sandbox Check + const pathResult = await withinSandboxPath(tool_name, context.arguments, policy); + if (!pathResult.success) { + return { + allowed: false, + requiresApproval: false, + reason: pathResult.reason, + }; + } + if (pathResult.result) { + return { + allowed: false, + requiresApproval: false, + reason: pathResult.reason, + }; + } + + // 3. Budget Check const budgetResult = await budgetExceeded( conversation.conversationId, conversation.token, @@ -75,7 +93,7 @@ export default async function PolicyEngine( }; } - // 3. 
Approval Check + // 4. Approval Check const approvalResult = await needsApproval(tool_name, policy); if (!approvalResult.success) { return { diff --git a/apps/api/src/policy/policy.test.ts b/apps/api/src/policy/policy.test.ts index 6548bbf..6de87fb 100644 --- a/apps/api/src/policy/policy.test.ts +++ b/apps/api/src/policy/policy.test.ts @@ -33,6 +33,12 @@ vi.mock("@repo/db", () => { create: vi.fn(), delete: vi.fn(), updateMany: vi.fn(), + findMany: vi.fn(), + }, + log: { + create: vi.fn(), + findMany: vi.fn(), + deleteMany: vi.fn(), }, }, }; @@ -45,6 +51,7 @@ import { db, PolicyAction, ApprovalStatus } from "@repo/db"; import isblocked from "./rules/block.js"; import budgetExceeded from "./rules/budget.js"; import needsApproval from "./rules/approval.js"; +import withinSandboxPath from "./rules/pathRule.js"; import PolicyEngine from "./engine.js"; import { decide } from "./decision.js"; import policiesRouter from "./router.js"; @@ -230,6 +237,221 @@ describe("Policy Engine Rules & Orchestrator", () => { expect(res.requiresApproval).toBe(false); expect(res.reason).toBe("Failed to query policy table"); }); + + it("should deny when path argument escapes the configured sandbox_path", async () => { + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "write_file", + action: PolicyAction.ALLOW, + sandbox_path: "/tmp/sandbox", + createdAt: new Date(), + updatedAt: new Date(), + }); + vi.mocked(db.conversation.findUnique).mockResolvedValue({ + id: "conv-1", + tokens_used: 10, + budget_limit: 100, + budget_reset_at: new Date(), + createdAt: new Date(), + }); + + const res = await PolicyEngine( + { tool_name: "write_file", arguments: { path: "../../etc/passwd" } }, + { conversationId: "conv-1", token: 10 }, + ); + + expect(res.allowed).toBe(false); + expect(res.requiresApproval).toBe(false); + expect(res.reason).toMatch(/escapes the configured sandbox/); + }); + + it("should allow when path argument is within the configured sandbox_path", async () => 
{ + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "write_file", + action: PolicyAction.ALLOW, + sandbox_path: "/tmp/sandbox", + createdAt: new Date(), + updatedAt: new Date(), + }); + vi.mocked(db.conversation.findUnique).mockResolvedValue({ + id: "conv-1", + tokens_used: 10, + budget_limit: 100, + budget_reset_at: new Date(), + createdAt: new Date(), + }); + + const res = await PolicyEngine( + { tool_name: "write_file", arguments: { path: "notes/hello.txt" } }, + { conversationId: "conv-1", token: 10 }, + ); + + // Not blocked, not escaped — ends up at the approval check + // (no sandbox_path escape, budget OK → allowed by ALLOW policy) + expect(res.allowed).toBe(true); + expect(res.requiresApproval).toBe(false); + }); + }); +}); + +describe("Rule: withinSandboxPath", () => { + beforeEach(() => { + vi.clearAllMocks(); + }); + + it("should skip the check and return result:false when no sandbox_path is configured", async () => { + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "write_file", + action: PolicyAction.ALLOW, + sandbox_path: null, + createdAt: new Date(), + updatedAt: new Date(), + }); + + const res = await withinSandboxPath("write_file", { path: "../../etc/passwd" }); + expect(res.success).toBe(true); + expect(res.result).toBe(false); // rule skipped — not a violation + }); + + it("should skip the check when the tool has no string arguments", async () => { + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "list_files", + action: PolicyAction.ALLOW, + sandbox_path: "/tmp/sandbox", + createdAt: new Date(), + updatedAt: new Date(), + }); + + const res = await withinSandboxPath("list_files", {}); + expect(res.success).toBe(true); + expect(res.result).toBe(false); + }); + + it("should return result:false (allowed) for a valid path inside the sandbox", async () => { + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "write_file", + action: 
PolicyAction.ALLOW, + sandbox_path: "/tmp/sandbox", + createdAt: new Date(), + updatedAt: new Date(), + }); + + const res = await withinSandboxPath("write_file", { path: "notes/hello.txt" }); + expect(res.success).toBe(true); + expect(res.result).toBe(false); + }); + + it("should return result:true (violation) for a relative traversal path", async () => { + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "write_file", + action: PolicyAction.ALLOW, + sandbox_path: "/tmp/sandbox", + createdAt: new Date(), + updatedAt: new Date(), + }); + + const res = await withinSandboxPath("write_file", { path: "../../etc/passwd" }); + expect(res.success).toBe(true); + expect(res.result).toBe(true); + expect(res.reason).toMatch(/escapes the configured sandbox/); + }); + + it("should return result:true (violation) for an absolute path that escapes the sandbox", async () => { + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "read_file", + action: PolicyAction.ALLOW, + sandbox_path: "/tmp/sandbox", + createdAt: new Date(), + updatedAt: new Date(), + }); + + const res = await withinSandboxPath("read_file", { path: "/etc/passwd" }); + expect(res.success).toBe(true); + expect(res.result).toBe(true); + expect(res.reason).toMatch(/escapes the configured sandbox/); + }); + + it("should return result:false for a path prefixed with the sandbox basename", async () => { + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "write_file", + action: PolicyAction.ALLOW, + sandbox_path: "/tmp/sandbox", + createdAt: new Date(), + updatedAt: new Date(), + }); + + // Agent passes "sandbox/notes/file.txt" — the prefix should be stripped + const res = await withinSandboxPath("write_file", { path: "sandbox/notes/file.txt" }); + expect(res.success).toBe(true); + expect(res.result).toBe(false); + }); + + it("should return result:true (violation) for a move_file where destination escapes", async () => { + 
vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "move_file", + action: PolicyAction.ALLOW, + sandbox_path: "/tmp/sandbox", + createdAt: new Date(), + updatedAt: new Date(), + }); + + // source is fine, but destination escapes the sandbox + const res = await withinSandboxPath("move_file", { + source: "notes/file.txt", + destination: "../../outside.txt", + }); + expect(res.success).toBe(true); + expect(res.result).toBe(true); + expect(res.reason).toMatch(/escapes the configured sandbox/); + }); + + it("should return result:true (violation) for an empty path argument", async () => { + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "write_file", + action: PolicyAction.ALLOW, + sandbox_path: "/tmp/sandbox", + createdAt: new Date(), + updatedAt: new Date(), + }); + + const res = await withinSandboxPath("write_file", { path: "" }); + expect(res.success).toBe(true); + expect(res.result).toBe(true); + expect(res.reason).toMatch(/must not be empty/); + }); + + it("should fail closed (success:false) on a database error", async () => { + vi.mocked(db.policy.findUnique).mockRejectedValue(new Error("DB failure")); + + const res = await withinSandboxPath("write_file", { path: "file.txt" }); + expect(res.success).toBe(false); + expect(res.result).toBe(false); + expect(res.reason).toBe("Failed to evaluate path sandbox rule"); + }); + + it("uses pre-fetched policy and does not call db.policy.findUnique", async () => { + const preFetched = { + id: "1", + tool_name: "write_file", + action: PolicyAction.ALLOW, + sandbox_path: "/tmp/sandbox", + createdAt: new Date(), + updatedAt: new Date(), + }; + + const res = await withinSandboxPath("write_file", { path: "ok.txt" }, preFetched); + expect(db.policy.findUnique).not.toHaveBeenCalled(); + expect(res.success).toBe(true); + expect(res.result).toBe(false); }); }); @@ -373,6 +595,79 @@ describe("Decision Orchestration (decide)", () => { expect(res.decision).toBe("DENY"); 
expect(res.reason).toBe("Approval tool name mismatch"); }); + + it("should return ALLOW on parallel approved resume if all tools are allowed by current policy", async () => { + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "test_tool", + action: PolicyAction.ALLOW, + sandbox_path: null, + createdAt: new Date(), + updatedAt: new Date(), + }); + vi.mocked(db.conversation.findUnique).mockResolvedValue({ + id: "conv-1", + tokens_used: 10, + budget_limit: 100, + budget_reset_at: new Date(), + createdAt: new Date(), + }); + vi.mocked(db.approval.findUnique).mockResolvedValue({ + id: "app-id-999", + tool_name: "multiple_tool_calls", + arguments: { tool_calls: [{ tool_name: "test_tool", arguments: {} }] }, + status: ApprovalStatus.APPROVED, + createdAt: new Date(), + updatedAt: new Date(), + }); + + const res = await decide( + { tool_name: "multiple_tool_calls", arguments: { tool_calls: [{ tool_name: "test_tool", arguments: {} }] }, approvalId: "app-id-999" }, + { conversationId: "conv-1", token: 5 }, + ); + + expect(res.decision).toBe("ALLOW"); + expect(db.approval.delete).toHaveBeenCalledWith({ + where: { id: "app-id-999" }, + }); + }); + + it("should return DENY on parallel approved resume if any tool in batch is explicitly denied", async () => { + vi.mocked(db.policy.findUnique).mockResolvedValue({ + id: "1", + tool_name: "denied_tool", + action: PolicyAction.DENY, + sandbox_path: null, + createdAt: new Date(), + updatedAt: new Date(), + }); + vi.mocked(db.conversation.findUnique).mockResolvedValue({ + id: "conv-1", + tokens_used: 10, + budget_limit: 100, + budget_reset_at: new Date(), + createdAt: new Date(), + }); + vi.mocked(db.approval.findUnique).mockResolvedValue({ + id: "app-id-999", + tool_name: "multiple_tool_calls", + arguments: { tool_calls: [{ tool_name: "denied_tool", arguments: {} }] }, + status: ApprovalStatus.APPROVED, + createdAt: new Date(), + updatedAt: new Date(), + }); + + const res = await decide( + { tool_name: 
"multiple_tool_calls", arguments: { tool_calls: [{ tool_name: "denied_tool", arguments: {} }] }, approvalId: "app-id-999" }, + { conversationId: "conv-1", token: 5 }, + ); + + expect(res.decision).toBe("DENY"); + expect(res.reason).toContain("Tool execution blocked on resume"); + expect(db.approval.delete).not.toHaveBeenCalledWith({ + where: { id: "app-id-999" }, + }); + }); }); describe("Policy Engine REST Endpoints", () => { @@ -434,6 +729,7 @@ describe("Policy Engine REST Endpoints", () => { tool_name: "unknown_tool", action: "APPROVAL", implicit: true, + sandbox_path: null, }); }); }); @@ -456,6 +752,42 @@ describe("Policy Engine REST Endpoints", () => { expect(res.status).toHaveBeenCalledWith(409); expect(res.json).toHaveBeenCalledWith({ error: "Policy already exists" }); }); + + it("should preserve empty string sandbox_path on creation", async () => { + const postPolicy = getHandler("/policies", "POST"); + vi.mocked(db.policy.findUnique).mockResolvedValue(null); + vi.mocked(db.policy.create).mockResolvedValue({ + tool_name: "new_tool", + action: PolicyAction.ALLOW, + sandbox_path: "", + } as any); + + const req = { + body: { tool_name: "new_tool", action: "ALLOW", sandbox_path: "" }, + } as any as Request; + const res = mockResponse(); + + await postPolicy(req, res, () => {}); + + expect(db.policy.create).toHaveBeenCalledWith({ + data: { + tool_name: "new_tool", + action: PolicyAction.ALLOW, + sandbox_path: "", + }, + select: { + tool_name: true, + action: true, + sandbox_path: true, + }, + }); + expect(res.status).toHaveBeenCalledWith(201); + expect(res.json).toHaveBeenCalledWith({ + tool_name: "new_tool", + action: PolicyAction.ALLOW, + sandbox_path: "", + }); + }); }); describe("POST /policies/approvals/:id/approve", () => { @@ -494,7 +826,7 @@ describe("Policy Engine REST Endpoints", () => { expect(res.json).toHaveBeenCalledWith({ error: "Approval not found" }); }); - it("should return 400 if approval status is not PENDING", async () => { + it("should 
return 200 (idempotent) if approval status is already APPROVED", async () => { const approveHandler = getHandler("/policies/approvals/:id/approve", "POST"); vi.mocked(db.approval.updateMany).mockResolvedValue({ count: 0 }); vi.mocked(db.approval.findUnique).mockResolvedValue({ @@ -507,6 +839,22 @@ describe("Policy Engine REST Endpoints", () => { await approveHandler(req, res, () => {}); + expect(res.json).toHaveBeenCalledWith({ id: "app-123", status: ApprovalStatus.APPROVED }); + }); + + it("should return 400 if approval status is REJECTED", async () => { + const approveHandler = getHandler("/policies/approvals/:id/approve", "POST"); + vi.mocked(db.approval.updateMany).mockResolvedValue({ count: 0 }); + vi.mocked(db.approval.findUnique).mockResolvedValue({ + id: "app-123", + status: ApprovalStatus.REJECTED, + } as any); + + const req = { params: { id: "app-123" } } as any as Request; + const res = mockResponse(); + + await approveHandler(req, res, () => {}); + expect(res.status).toHaveBeenCalledWith(400); expect(res.json).toHaveBeenCalledWith({ error: "Approval status is not PENDING" }); }); @@ -548,7 +896,7 @@ describe("Policy Engine REST Endpoints", () => { expect(res.json).toHaveBeenCalledWith({ error: "Approval not found" }); }); - it("should return 400 if approval status is not PENDING on rejection", async () => { + it("should return 200 (idempotent) if approval status is already REJECTED on rejection", async () => { const rejectHandler = getHandler("/policies/approvals/:id/reject", "POST"); vi.mocked(db.approval.updateMany).mockResolvedValue({ count: 0 }); vi.mocked(db.approval.findUnique).mockResolvedValue({ @@ -561,8 +909,177 @@ describe("Policy Engine REST Endpoints", () => { await rejectHandler(req, res, () => {}); + expect(res.json).toHaveBeenCalledWith({ id: "app-123", status: ApprovalStatus.REJECTED }); + }); + + it("should return 400 if approval status is APPROVED on rejection", async () => { + const rejectHandler = 
getHandler("/policies/approvals/:id/reject", "POST"); + vi.mocked(db.approval.updateMany).mockResolvedValue({ count: 0 }); + vi.mocked(db.approval.findUnique).mockResolvedValue({ + id: "app-123", + status: ApprovalStatus.APPROVED, + } as any); + + const req = { params: { id: "app-123" } } as any as Request; + const res = mockResponse(); + + await rejectHandler(req, res, () => {}); + expect(res.status).toHaveBeenCalledWith(400); expect(res.json).toHaveBeenCalledWith({ error: "Approval status is not PENDING" }); }); }); + + describe("GET /approvals", () => { + it("should return a list of approvals", async () => { + const getApprovals = getHandler("/approvals", "GET"); + expect(getApprovals).toBeDefined(); + + vi.mocked(db.approval.findMany).mockResolvedValue([ + { id: "app-123", tool_name: "test_tool", status: "PENDING" } + ] as any); + + const req = {} as Request; + const res = mockResponse(); + + await getApprovals(req, res, () => {}); + + expect(db.approval.findMany).toHaveBeenCalledWith({ + orderBy: { createdAt: "desc" }, + take: 100 + }); + expect(res.json).toHaveBeenCalledWith([ + { id: "app-123", tool_name: "test_tool", status: "PENDING" } + ]); + }); + + it("should return 400 for invalid page parameter", async () => { + const getApprovals = getHandler("/approvals", "GET"); + const req = { query: { page: "-1" } } as any as Request; + const res = mockResponse(); + + await getApprovals(req, res, () => {}); + + expect(res.status).toHaveBeenCalledWith(400); + expect(res.json).toHaveBeenCalledWith({ error: "page must be a positive integer greater than or equal to 1" }); + }); + + it("should return 400 for invalid limit parameter", async () => { + const getApprovals = getHandler("/approvals", "GET"); + const req = { query: { limit: "150" } } as any as Request; + const res = mockResponse(); + + await getApprovals(req, res, () => {}); + + expect(res.status).toHaveBeenCalledWith(400); + expect(res.json).toHaveBeenCalledWith({ error: "limit must be a positive integer 
between 1 and 100" }); + }); + + it("should return 400 for suffix-malformed page parameter", async () => { + const getApprovals = getHandler("/approvals", "GET"); + const req = { query: { page: "1abc" } } as any as Request; + const res = mockResponse(); + + await getApprovals(req, res, () => {}); + + expect(res.status).toHaveBeenCalledWith(400); + expect(res.json).toHaveBeenCalledWith({ error: "page must be a positive integer greater than or equal to 1" }); + }); + + it("should return 400 for suffix-malformed limit parameter", async () => { + const getApprovals = getHandler("/approvals", "GET"); + const req = { query: { limit: "10.5" } } as any as Request; + const res = mockResponse(); + + await getApprovals(req, res, () => {}); + + expect(res.status).toHaveBeenCalledWith(400); + expect(res.json).toHaveBeenCalledWith({ error: "limit must be a positive integer between 1 and 100" }); + }); + }); + + describe("GET /logs", () => { + it("should return a list of decision logs", async () => { + const getLogs = getHandler("/logs", "GET"); + expect(getLogs).toBeDefined(); + + vi.mocked(db.log.findMany).mockResolvedValue([ + { id: "log-123", tool_name: "test_tool", decision: "ALLOW" } + ] as any); + + const req = {} as Request; + const res = mockResponse(); + + await getLogs(req, res, () => {}); + + expect(db.log.findMany).toHaveBeenCalledWith({ + orderBy: { createdAt: "desc" }, + take: 100 + }); + expect(res.json).toHaveBeenCalledWith([ + { id: "log-123", tool_name: "test_tool", decision: "ALLOW" } + ]); + }); + + it("should return 400 for invalid page parameter", async () => { + const getLogs = getHandler("/logs", "GET"); + const req = { query: { page: "-1" } } as any as Request; + const res = mockResponse(); + + await getLogs(req, res, () => {}); + + expect(res.status).toHaveBeenCalledWith(400); + expect(res.json).toHaveBeenCalledWith({ error: "page must be a positive integer greater than or equal to 1" }); + }); + + it("should return 400 for invalid limit parameter", 
async () => { + const getLogs = getHandler("/logs", "GET"); + const req = { query: { limit: "150" } } as any as Request; + const res = mockResponse(); + + await getLogs(req, res, () => {}); + + expect(res.status).toHaveBeenCalledWith(400); + expect(res.json).toHaveBeenCalledWith({ error: "limit must be a positive integer between 1 and 100" }); + }); + + it("should return 400 for suffix-malformed page parameter", async () => { + const getLogs = getHandler("/logs", "GET"); + const req = { query: { page: "1abc" } } as any as Request; + const res = mockResponse(); + + await getLogs(req, res, () => {}); + + expect(res.status).toHaveBeenCalledWith(400); + expect(res.json).toHaveBeenCalledWith({ error: "page must be a positive integer greater than or equal to 1" }); + }); + + it("should return 400 for suffix-malformed limit parameter", async () => { + const getLogs = getHandler("/logs", "GET"); + const req = { query: { limit: "10.5" } } as any as Request; + const res = mockResponse(); + + await getLogs(req, res, () => {}); + + expect(res.status).toHaveBeenCalledWith(400); + expect(res.json).toHaveBeenCalledWith({ error: "limit must be a positive integer between 1 and 100" }); + }); + }); + + describe("DELETE /logs", () => { + it("should delete all logs and return 204", async () => { + const deleteLogs = getHandler("/logs", "DELETE"); + expect(deleteLogs).toBeDefined(); + + vi.mocked(db.log.deleteMany).mockResolvedValue({ count: 5 } as any); + + const req = {} as Request; + const res = mockResponse(); + + await deleteLogs(req, res, () => {}); + + expect(db.log.deleteMany).toHaveBeenCalled(); + expect(res.status).toHaveBeenCalledWith(204); + }); + }); }); diff --git a/apps/api/src/policy/router.ts b/apps/api/src/policy/router.ts index d2a038b..35a7f3d 100644 --- a/apps/api/src/policy/router.ts +++ b/apps/api/src/policy/router.ts @@ -10,6 +10,7 @@ router.get("/policies", async (req: Request, res: Response): Promise => { select: { tool_name: true, action: true, + 
sandbox_path: true, }, }); res.json(policies); @@ -34,6 +35,7 @@ router.get( select: { tool_name: true, action: true, + sandbox_path: true, }, }); @@ -42,6 +44,7 @@ router.get( tool_name: normalizedToolName, action: "APPROVAL", implicit: true, + sandbox_path: null, }); return; } @@ -55,7 +58,7 @@ router.get( // POST /policies router.post("/policies", async (req: Request, res: Response): Promise => { - const { tool_name, action } = req.body; + const { tool_name, action, sandbox_path } = req.body; if (!tool_name || typeof tool_name !== "string" || !tool_name.trim()) { res.status(400).json({ error: "Missing or invalid tool_name" }); @@ -74,6 +77,11 @@ router.post("/policies", async (req: Request, res: Response): Promise => { return; } + if (sandbox_path !== undefined && sandbox_path !== null && typeof sandbox_path !== "string") { + res.status(400).json({ error: "sandbox_path must be a string or null" }); + return; + } + try { const existing = await db.policy.findUnique({ where: { tool_name: normalizedToolName }, @@ -88,10 +96,12 @@ router.post("/policies", async (req: Request, res: Response): Promise => { data: { tool_name: normalizedToolName, action: action as PolicyAction, + sandbox_path: sandbox_path !== undefined ? sandbox_path : null, }, select: { tool_name: true, action: true, + sandbox_path: true, }, }); @@ -114,19 +124,26 @@ router.patch( res.status(400).json({ error: "Missing or invalid toolName parameter" }); return; } - const { action } = req.body; + const { action, sandbox_path } = req.body; const normalizedToolName = toolName.trim(); - if ( - !action || - !Object.values(PolicyAction).includes(action as PolicyAction) - ) { + if (action !== undefined && !Object.values(PolicyAction).includes(action as PolicyAction)) { res.status(400).json({ error: "Invalid action. 
Accepted values are ALLOW, APPROVAL, DENY", }); return; } + if (sandbox_path !== undefined && sandbox_path !== null && typeof sandbox_path !== "string") { + res.status(400).json({ error: "sandbox_path must be a string or null" }); + return; + } + + if (action === undefined && sandbox_path === undefined) { + res.status(400).json({ error: "Either action or sandbox_path must be provided to update" }); + return; + } + try { const existing = await db.policy.findUnique({ where: { tool_name: normalizedToolName }, @@ -137,14 +154,21 @@ router.patch( return; } + const updateData: any = {}; + if (action !== undefined) { + updateData.action = action as PolicyAction; + } + if (sandbox_path !== undefined) { + updateData.sandbox_path = sandbox_path; + } + const updated = await db.policy.update({ where: { tool_name: normalizedToolName }, - data: { - action: action as PolicyAction, - }, + data: updateData, select: { tool_name: true, action: true, + sandbox_path: true, }, }); @@ -208,6 +232,10 @@ async function handleApprovalStatusUpdate( res.status(404).json({ error: "Approval not found" }); return; } + if (exists.status === targetStatus) { + res.json({ id, status: targetStatus }); + return; + } res.status(400).json({ error: "Approval status is not PENDING" }); return; } @@ -244,4 +272,128 @@ router.post( } ); +function parsePaginationParams(req: Request): { page?: number; limit?: number; error?: string } { + const pageStr = req.query?.page; + const limitStr = req.query?.limit; + + let page: number | undefined; + let limit: number | undefined; + + if (pageStr !== undefined) { + if (typeof pageStr !== "string" || !/^\d+$/.test(pageStr)) { + return { error: "page must be a positive integer greater than or equal to 1" }; + } + const parsedPage = parseInt(pageStr, 10); + if (parsedPage < 1) { + return { error: "page must be a positive integer greater than or equal to 1" }; + } + page = parsedPage; + } + + if (limitStr !== undefined) { + if (typeof limitStr !== "string" || 
!/^\d+$/.test(limitStr)) { + return { error: "limit must be a positive integer between 1 and 100" }; + } + const parsedLimit = parseInt(limitStr, 10); + if (parsedLimit < 1 || parsedLimit > 100) { + return { error: "limit must be a positive integer between 1 and 100" }; + } + limit = parsedLimit; + } + + return { page, limit }; +} + +// GET /approvals +router.get("/approvals", async (req: Request, res: Response): Promise => { + try { + const { page, limit, error } = parsePaginationParams(req); + if (error) { + res.status(400).json({ error }); + return; + } + + if (page !== undefined || limit !== undefined) { + const p = page || 1; + const l = limit || 100; + const skip = (p - 1) * l; + const total = await db.approval.count(); + const approvals = await db.approval.findMany({ + orderBy: { createdAt: "desc" }, + skip, + take: l, + }); + res.json({ + data: approvals, + pagination: { + total, + page: p, + limit: l, + pages: Math.ceil(total / l), + } + }); + return; + } + + const approvals = await db.approval.findMany({ + orderBy: { createdAt: "desc" }, + take: 100, + }); + res.json(approvals); + } catch (error) { + res.status(500).json({ error: "Internal server error" }); + } +}); + +// GET /logs +router.get("/logs", async (req: Request, res: Response): Promise => { + try { + const { page, limit, error } = parsePaginationParams(req); + if (error) { + res.status(400).json({ error }); + return; + } + + if (page !== undefined || limit !== undefined) { + const p = page || 1; + const l = limit || 100; + const skip = (p - 1) * l; + const total = await db.log.count(); + const logs = await db.log.findMany({ + orderBy: { createdAt: "desc" }, + skip, + take: l, + }); + res.json({ + data: logs, + pagination: { + total, + page: p, + limit: l, + pages: Math.ceil(total / l), + } + }); + return; + } + + const logs = await db.log.findMany({ + orderBy: { createdAt: "desc" }, + take: 100, + }); + res.json(logs); + } catch (error) { + res.status(500).json({ error: "Internal server error" 
}); + } +}); + +// DELETE /logs +router.delete("/logs", async (req: Request, res: Response): Promise => { + try { + await db.log.deleteMany(); + res.status(204).end(); + } catch (error) { + res.status(500).json({ error: "Internal server error" }); + } +}); + export default router; diff --git a/apps/api/src/policy/rules/pathRule.ts b/apps/api/src/policy/rules/pathRule.ts new file mode 100644 index 0000000..77deff0 --- /dev/null +++ b/apps/api/src/policy/rules/pathRule.ts @@ -0,0 +1,214 @@ +import * as nodePath from "path"; +import * as fs from "fs"; +import { db } from "@repo/db"; +import type { RuleResult } from "../../../types.js"; +import { logger } from "../../../mcp/logger.js"; + +/** + * Resolves the real path of the closest existing ancestor of `p`. + * + * This is the same algorithm used by the file-manager-mcp sandbox utility: + * walk up the directory tree until we find a path that exists on disk, then + * call `fs.realpathSync` on that ancestor to canonicalise symlinks. We then + * re-append the remaining suffix so the result still points at the requested + * location, even if it does not exist yet (e.g. a new file about to be written). + */ +function getRealAncestor(p: string): string { + try { + return fs.realpathSync(p); + } catch (err: any) { + if (err.code === "ENOENT") { + const parent = nodePath.dirname(p); + // Guard against infinite recursion at filesystem root + if (parent === p) { + return p; + } + return nodePath.join(getRealAncestor(parent), nodePath.basename(p)); + } + throw err; + } +} + +/** + * Resolves the real (symlink-free) absolute path for a configured sandbox root. + * + * When the sandbox directory does not yet exist (e.g. on first boot), we + * resolve the real path of the closest existing ancestor instead so the check + * still works correctly. 
+ */ +function resolveSandboxRoot(rawRoot: string): string { + if (fs.existsSync(rawRoot)) { + return fs.realpathSync(rawRoot); + } + const parent = nodePath.dirname(rawRoot); + const canonicalParent = fs.existsSync(parent) + ? fs.realpathSync(parent) + : parent; + return nodePath.resolve(canonicalParent, nodePath.basename(rawRoot)); +} + +/** + * Checks whether every path-like argument in the tool call stays within the + * `sandbox_path` configured on the policy record. + * + * Edge cases handled (mirrors file-manager-mcp/src/utils/sandbox.ts): + * + * 1. No `sandbox_path` on the policy → rule is skipped (returns allowed). + * 2. Empty string path argument → DENY (same as the MCP "Path is required" guard). + * 3. Argument prefixed with the sandbox directory name → strip the prefix + * before resolving (matches the "sandbox/" strip in validatePath). + * 4. Syntactic traversal (`../`) → caught by `path.relative` starting with "..". + * 5. Absolute path arguments that escape the root → caught by `path.isAbsolute(relative)`. + * 6. Symlink traversal → real ancestor of the resolved path is checked against + * the real sandbox root, so a symlink inside the sandbox pointing outside + * it is still caught. + * 7. Sandbox root that is itself a symlink → we resolve it with + * `resolveSandboxRoot` (matching the REAL_SANDBOX_ROOT logic in sandbox.ts). + * 8. DB errors → fail-closed: success:false so the engine denies execution. + * + * @param tool_name The tool being evaluated. + * @param args The raw arguments map from the agent step. + * @param preFetchedPolicy Optional pre-fetched policy object (avoids a second DB + * round-trip when the engine already queried it). + */ +export default async function withinSandboxPath( + tool_name: string, + args: Record, + preFetchedPolicy?: any, +): Promise> { + try { + const policy = + preFetchedPolicy !== undefined + ? 
preFetchedPolicy + : await db.policy.findUnique({ where: { tool_name } }); + + // No sandbox_path configured → this rule does not apply + if (!policy?.sandbox_path) { + return { success: true, result: false }; + } + + const rawRoot: string = policy.sandbox_path; + + // Resolve the real sandbox root (handles the root being a symlink itself) + const sandboxRoot = resolveSandboxRoot(rawRoot); + + const isPathKey = (key: string): boolean => { + // Split camelCase by inserting underscore before capital letters + const snakeCase = key.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase(); + // Split by non-alphanumeric characters to get individual words + const words = snakeCase.split(/[^a-z0-9]+/); + + const pathKeywords = ["path", "file", "dir", "folder", "src", "dest", "source", "destination", "filepath", "directory"]; + const exclusions = ["content", "text", "message", "body", "data", "code", "arguments", "args"]; + + // Check if any word matches path keywords + const hasPathKeyword = words.some(w => pathKeywords.includes(w)); + if (!hasPathKeyword) { + return false; + } + + // If the last word is an exclusion, or the key ends with an exclusion word, it is content + const lastWord = words[words.length - 1]; + if (lastWord && exclusions.includes(lastWord)) { + return false; + } + + // Check if the key ends with an exclusion suffix (handles cases like filenameContent without separators) + if (exclusions.some(exc => snakeCase.endsWith(exc) && !snakeCase.endsWith("path") && !snakeCase.endsWith("file") && !snakeCase.endsWith("dir"))) { + return false; + } + + return true; + }; + + const pathArgs = Object.entries(args) + .filter(([k, v]) => isPathKey(k) && typeof v === "string") + .map(([k, v]) => ({ key: k, value: v as string })); + + // If the tool takes no string arguments, there is nothing to check + if (pathArgs.length === 0) { + return { success: true, result: false }; + } + + const sandboxBasename = nodePath.basename(rawRoot); + + for (const { key, value } of 
pathArgs) { + // Edge-case 2: empty path argument + if (!value) { + logger.warn("Path argument is empty in pathRule", { tool_name, key }); + return { + success: true, + result: true, + reason: `Path argument '${key}' must not be empty`, + }; + } + + // Edge-case 3: strip leading "sandbox/" or "/" prefixes so + // agents using relative paths with the sandbox name still resolve correctly. + let cleanValue = value; + const prefixSlash = sandboxBasename + "/"; + const prefixBackslash = sandboxBasename + "\\"; + if (cleanValue.startsWith(prefixSlash)) { + cleanValue = cleanValue.substring(prefixSlash.length); + } else if (cleanValue.startsWith(prefixBackslash)) { + cleanValue = cleanValue.substring(prefixBackslash.length); + } + + // Resolve against the real sandbox root + const resolved = nodePath.resolve(sandboxRoot, cleanValue); + + // Edge-case 4 & 5: syntactic check — catches traversal and absolute + // paths that land outside the sandbox without touching the filesystem. + // Use exact segment boundaries to avoid false-positives on names like "..foo": + // a path escapes if any segment is exactly "..". 
+ const relative = nodePath.relative(sandboxRoot, resolved); + const escapesViaDotDot = relative.split(nodePath.sep).some(seg => seg === ".."); + if (escapesViaDotDot || nodePath.isAbsolute(relative)) { + logger.warn("Path argument escapes sandbox (syntactic check)", { + tool_name, + key, + value, + sandbox_path: rawRoot, + }); + return { + success: true, + result: true, + reason: `Path argument '${key}' escapes the configured sandbox: ${rawRoot}`, + }; + } + + // Edge-case 6: symlink traversal — resolve real ancestor and re-check + const realResolved = getRealAncestor(resolved); + const realRelative = nodePath.relative(sandboxRoot, realResolved); + const realEscapesViaDotDot = realRelative.split(nodePath.sep).some(seg => seg === ".."); + if (realEscapesViaDotDot || nodePath.isAbsolute(realRelative)) { + logger.warn("Path argument escapes sandbox via symlink (real path check)", { + tool_name, + key, + value, + realResolved, + sandbox_path: rawRoot, + }); + return { + success: true, + result: true, + reason: `Path argument '${key}' escapes the configured sandbox: ${rawRoot}`, + }; + } + } + + // All path arguments are within the sandbox + return { success: true, result: false }; + } catch (error: any) { + logger.error("Database or filesystem error in withinSandboxPath rule", { + tool_name, + error_message: error instanceof Error ? 
error.message : String(error), + }); + + return { + success: false, + result: false, + reason: "Failed to evaluate path sandbox rule", + }; + } +} diff --git a/apps/api/types.ts b/apps/api/types.ts index d3387b6..11d5a38 100644 --- a/apps/api/types.ts +++ b/apps/api/types.ts @@ -70,12 +70,20 @@ export interface ToolCall { arguments: Record; } +export interface ToolCalls { + type: "tool_calls"; + tool_calls: { + tool_name: string; + arguments: Record; + }[]; +} + export interface FinalAnswer { type: "final_answer"; answer: string; } -export type AgentStep = ToolCall | FinalAnswer; +export type AgentStep = ToolCall | ToolCalls | FinalAnswer; export interface AgentResult { status: "SUCCESS" | "PENDING" | "DENY"; diff --git a/apps/api/verify.js b/apps/api/verify.js new file mode 100644 index 0000000..9769736 --- /dev/null +++ b/apps/api/verify.js @@ -0,0 +1,226 @@ +import fs from "fs"; +import path from "path"; +import { fileURLToPath } from "url"; + +const __filename = fileURLToPath(import.meta.url); +const __dirname = path.dirname(__filename); + +const API_URL = process.env.API_URL || "http://localhost:3001"; +// Resolved relative to this script — portable across any checkout location +const sandboxDir = path.resolve(__dirname, "../file-manager-mcp/sandbox"); + +async function runVerification() { + console.log("=== STARTING GATEKEEPER END-TO-END VERIFICATION (FETCH) ==="); + + const conversationId = `verify_conv_${Math.random().toString(36).substring(2, 9)}`; + console.log(`Using Conversation ID: ${conversationId}`); + + // Track pre-existing write_file policy so we can restore it exactly. + // null = no policy existed before this run (delete it on cleanup) + // obj = policy existed (restore its original action on cleanup) + let originalWriteFilePolicy = null; + let policyWasCreatedByVerify = false; + let hasFailed = false; + + // Track created sandbox files so we can delete them in finally + const createdFiles = []; + + try { + // 1. 
Check existing policies + console.log("\n[1] Fetching active policies..."); + const policiesRes = await fetch(`${API_URL}/policies`); + const policiesData = await policiesRes.json(); + console.log("Active policies count:", policiesData.length); + + // Remember any pre-existing write_file policy so cleanup can restore it + originalWriteFilePolicy = policiesData.find(p => p.tool_name === "write_file") || null; + if (originalWriteFilePolicy) { + console.log(`Pre-existing write_file policy found: action=${originalWriteFilePolicy.action}`); + } + + // 2. Set policy to APPROVAL so step 3 reliably produces PENDING regardless + // of what was in DB before the run. + console.log("\n[2] Ensuring write_file policy is APPROVAL for verification..."); + if (originalWriteFilePolicy) { + if (originalWriteFilePolicy.action !== "APPROVAL") { + const patchRes = await fetch(`${API_URL}/policies/write_file`, { + method: "PATCH", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ action: "APPROVAL" }), + }); + if (!patchRes.ok) { + throw new Error(`Failed to ensure write_file policy is APPROVAL (PATCH status: ${patchRes.status})`); + } + } + } else { + const postRes = await fetch(`${API_URL}/policies`, { + method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ tool_name: "write_file", action: "APPROVAL" }), + }); + if (postRes.ok) { + policyWasCreatedByVerify = true; + } else { + throw new Error(`Failed to create write_file policy (POST status: ${postRes.status})`); + } + } + + // 3. 
Run agent prompt that triggers write_file (should be paused for approval) + console.log("\n[3] Submitting prompt to write file (should be paused for approval)..."); + const run1Res = await fetch(`${API_URL}/agent/run`, { + method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ + message: "Write a file named sandbox/test.txt with content 'Hello GateKeeper'", + conversationId, + }), + }); + const run1Data = await run1Res.json(); + + console.log("Agent Run 1 Status:", run1Data.status); + console.log("Agent Run 1 Approval ID:", run1Data.approvalId); + + if (run1Data.status !== "PENDING" || !run1Data.approvalId) { + throw new Error(`Expected PENDING status and approval ID, got: ${JSON.stringify(run1Data)}`); + } + + const approvalId = run1Data.approvalId; + + // 4. Verify the approval exists in the approvals list + console.log("\n[4] Fetching approvals list..."); + const approvalsRes = await fetch(`${API_URL}/approvals`); + const approvalsData = await approvalsRes.json(); + const foundApproval = approvalsData.find(app => app.id === approvalId); + if (!foundApproval) { + throw new Error(`Approval ${approvalId} not found in GET /approvals`); + } + console.log("Found approval details in GET /approvals:", JSON.stringify(foundApproval)); + + // 5. Approve the request + console.log(`\n[5] Approving request ${approvalId}...`); + const approveRes = await fetch(`${API_URL}/policies/approvals/${approvalId}/approve`, { + method: "POST" + }); + const approveData = await approveRes.json(); + console.log("Approve response:", JSON.stringify(approveData)); + + // 6. 
Resume the agent run with the approvalId + console.log("\n[6] Resuming agent execution with approval..."); + const resumeRes = await fetch(`${API_URL}/agent/run`, { + method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ + message: null, + conversationId, + approvalId, + history: run1Data.history + }), + }); + const resumeData = await resumeRes.json(); + console.log("Resume status:", resumeData.status); + console.log("Resume final answer:", resumeData.answer); + + // Verify sandbox/test.txt exists and has the correct content + const testTxtPath = path.join(sandboxDir, "test.txt"); + createdFiles.push(testTxtPath); + if (!fs.existsSync(testTxtPath)) { + throw new Error(`File was not created at ${testTxtPath}`); + } + const fileContent = fs.readFileSync(testTxtPath, "utf-8"); + console.log(`File content at ${testTxtPath}: "${fileContent}"`); + if (fileContent !== "Hello GateKeeper") { + throw new Error(`Unexpected file content: ${fileContent}`); + } + console.log("File verification: SUCCESS"); + + // 7. Check decision logs + console.log("\n[7] Fetching decision logs..."); + const logsRes = await fetch(`${API_URL}/logs`); + const logsData = await logsRes.json(); + const runLogs = logsData.filter(log => log.reason && log.reason.includes(conversationId)); + console.log(`Logs generated for conversation ${conversationId}:`); + console.log(JSON.stringify(runLogs, null, 2)); + + // 8. 
Update policy to ALLOW and run again without approval + console.log("\n[8] Updating write_file policy to ALLOW..."); + await fetch(`${API_URL}/policies/write_file`, { + method: "PATCH", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ action: "ALLOW" }), + }); + + const conversationId2 = `verify_conv_auto_${Math.random().toString(36).substring(2, 9)}`; + console.log(`\n[9] Submitting prompt to write file with ALLOW policy (Conversation: ${conversationId2})...`); + const run2Res = await fetch(`${API_URL}/agent/run`, { + method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ + message: "Write a file named sandbox/allowed.txt with content 'Auto approved content'", + conversationId: conversationId2, + }), + }); + const run2Data = await run2Res.json(); + + console.log("Agent Run 2 Status:", run2Data.status); + console.log("Agent Run 2 Final Answer:", run2Data.answer); + + const allowedTxtPath = path.join(sandboxDir, "allowed.txt"); + createdFiles.push(allowedTxtPath); + if (!fs.existsSync(allowedTxtPath)) { + throw new Error(`File was not created at ${allowedTxtPath}`); + } + const allowedContent = fs.readFileSync(allowedTxtPath, "utf-8"); + console.log(`File content at ${allowedTxtPath}: "${allowedContent}"`); + + console.log("\n=== ALL E2E API VERIFICATIONS PASSED SUCCESSFULLY ==="); + } catch (error) { + console.error("\n!!! VERIFICATION FAILED !!!"); + console.error(error.message); + hasFailed = true; + } finally { + // Always clean up sandbox files and restore policy state, even on failure. 
+ console.log("\n[cleanup] Removing created sandbox files..."); + for (const filePath of createdFiles) { + try { + if (fs.existsSync(filePath)) { + fs.unlinkSync(filePath); + console.log(` Deleted: ${filePath}`); + } + } catch (err) { + console.error(` Failed to delete ${filePath}:`, err.message); + } + } + + console.log("[cleanup] Restoring write_file policy state..."); + try { + if (policyWasCreatedByVerify) { + // We created it from scratch — delete it entirely to leave no trace + const res = await fetch(`${API_URL}/policies/write_file`, { method: "DELETE" }); + if (!res.ok) { + throw new Error(`DELETE /policies/write_file failed with status ${res.status}`); + } + console.log(" Deleted write_file policy (was created by verify)."); + } else if (originalWriteFilePolicy) { + // Restore the original action the policy had before the run + const res = await fetch(`${API_URL}/policies/write_file`, { + method: "PATCH", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ action: originalWriteFilePolicy.action }), + }); + if (!res.ok) { + throw new Error(`PATCH /policies/write_file failed with status ${res.status}`); + } + console.log(` Restored write_file policy to: ${originalWriteFilePolicy.action}`); + } + } catch (cleanupErr) { + console.error(" Policy cleanup failed:", cleanupErr.message); + hasFailed = true; + } + + if (hasFailed) { + process.exit(1); + } + } +} + +runVerification(); diff --git a/apps/file-manager-mcp/sandbox/docker_explanation.txt b/apps/file-manager-mcp/sandbox/docker_explanation.txt new file mode 100644 index 0000000..280fc8b --- /dev/null +++ b/apps/file-manager-mcp/sandbox/docker_explanation.txt @@ -0,0 +1 @@ +Docker is a containerization platform that allows developers to package, ship, and run applications in containers. Containers are lightweight and portable, providing a consistent and reliable way to deploy applications across different environments. 
\ No newline at end of file diff --git a/apps/web/README.md b/apps/web/README.md index a98bfa8..e37f7a2 100644 --- a/apps/web/README.md +++ b/apps/web/README.md @@ -1,36 +1,150 @@ -This is a [Next.js](https://nextjs.org) project bootstrapped with [`create-next-app`](https://nextjs.org/docs/app/api-reference/create-next-app). +# 🎨 Gatekeeper Admin Console: Frontend Architecture -## Getting Started +Welcome to the **Gatekeeper Admin Console**! This dashboard is the cockpit for your LLM security guardrail system. It is a sleek, off-black web application designed to let you chat with your LLM agent, manage safety policy rule-sets, review pending tool execution requests in real-time, and audit decision logs. -First, run the development server: +Built using **Next.js (App Router)** and **Tailwind CSS**, it features a fully reactive dashboard powered by **Redux Toolkit** and an automated, real-time polling synchronizer. -```bash -npm run dev -# or -yarn dev -# or -pnpm dev -# or -bun dev +--- + +## 🌟 Visual Philosophy & Premium Aesthetics + +We believe developer tools should look as premium as consumer applications. The dashboard is designed around a strict set of visual principles: + +* **The "Off-Black" Workspace**: Built with a dark, curated color scheme using deep charcoal tones (`zinc-950` backgrounds, `zinc-900` cards, and low-contrast `zinc-800` borders). This keeps it easy on the eyes during long debugging sessions. +* **Micro-Animations & Visual Cues**: When a tool is intercepted and requires human review, the UI draws attention through subtle animations (like soft pulsing amber rings) to flag execution blocks. +* **Shimmer Skeletons over Raw Loading Spinners**: Nobody likes jarring layout shifts or simple text placeholders. Every table component (Policies, Approvals, Logs) uses a CSS-gradient shimmer effect (`.shimmer` animation in [globals.css](app/globals.css)) that mimics the final grid structure while data is being loaded. 
+* **Utilitarian Typography**: The layout pairs clean, legible body text with monospaced accents (`Geist Mono`) for JSON representations, arguments, and tool names, emphasizing its nature as an operator's terminal. + +--- + +## 🏗️ Folder and Route Structure + +The Next.js workspace is organized as follows: + +```text +apps/web/ + ├── app/ # Next.js App Router Page Layouts + │ ├── layout.tsx # Base template, sets up global HTML, theme, and Redux provider wrapper + │ ├── page.tsx # Silent index page (immediately redirects to /chat) + │ ├── globals.css # Tailwind configuration and custom animation layers + │ ├── chat/ # The principal Agent Chat Workspace + │ ├── policies/ # Guardrail settings & policy rules editor + │ ├── approvals/ # Review queue for intercepted actions + │ └── logs/ # Immutably rendered audit trails + │ + ├── components/ # Reusable View Containers + │ ├── ChatWindow.tsx # Renders message streams, interactive tool blocks, and pending review banners + │ ├── PolicyTable.tsx # Curates security policies with inline creation/update actions and shimmer tables + │ ├── ApprovalTable.tsx # Intercepted tool parameters inspector with support for parallel batch runs + │ ├── LogsTable.tsx # Chronological evaluation logs display featuring "Reset Logs" capability + │ └── Navbar.tsx # High-level navigation bar with active route indicators + │ + ├── services/ # Axios API Client Wrappers + │ ├── agent.ts # Integrates with the express model executor loop (/agent/run) + │ ├── policies.ts # Performs policy mutations (GET, POST, PATCH, DELETE) + │ ├── approvals.ts # Submits POST requests to approve or reject pending blocks + │ └── logs.ts # Interacts with the decision audit records database + │ + └── store/ # Redux State Management Layer + ├── index.ts # Standard Redux store setup (TypeScript typed hooks) + ├── StoreProvider.tsx # Client-side wrapper to bridge Next.js Server Components + └── chatSlice.ts # Active conversation, input buffer, and pending state reducer ``` 
-Open [http://localhost:3000](http://localhost:3000) with your browser to see the result. +--- + +## 🧠 State Management: Session Preserves & SSR Hydration + +Initially, switching between tabs (e.g., leaving a running chat to edit a policy, then returning) would reset the chat history and inputs. To solve this, we moved the chat state into a centralized **Redux Toolkit** store. -You can start editing the page by modifying `app/page.tsx`. The page auto-updates as you edit the file. +### What is Persisted? +The store slice (`chatSlice.ts`) maintains: +1. **`conversationId`**: A unique session ID. If none is found, we automatically generate a random base-36 string. +2. **`messages`**: An array of chat bubbles (`ChatMessage[]`) representing the conversational trail. +3. **`inputValue`**: The draft text in the chat input. +4. **`loading`**: A flag indicating whether the agent is currently thinking or running tools. +5. **`pendingApprovalId`** / **`pendingToolName`** / **`pendingApprovalStatus`**: Intercepted action metadata. -This project uses [`next/font`](https://nextjs.org/docs/app/building-your-application/optimizing/fonts) to automatically optimize and load Inter, a custom Google Font. +### How Hydration is Handled Safely +Because Next.js pre-renders pages on the server (which doesn't have access to browser APIs like `window` or `localStorage`), initializing Redux state with local storage values immediately causes a hydration mismatch error. -## Learn More +To prevent this: +1. The slice starts with a safe, default initial state. +2. The chat page mounts a hook that dispatches the `hydrateChatState` action on mount. +3. This syncs `localStorage` variables back into the Redux store on the client, ensuring server-to-client rendering transitions are completely smooth. 
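The three hydration steps above can be sketched without Redux as a pure function. This is a minimal illustration, not the code in `chatSlice.ts`: the `ChatState` fields are a reduced, assumed shape, `STORAGE_KEY` is a hypothetical key name, and `StorageLike` is a stand-in for `window.localStorage` so the sketch also runs outside a browser.

```typescript
// Assumed, reduced shape; the real slice state may hold more fields.
interface ChatState {
  conversationId: string;
  messages: { role: "user" | "assistant"; content: string }[];
  inputValue: string;
  loading: boolean;
  pendingApprovalId: string | null;
}

// Step 1: defaults that render identically on the server and the client.
const initialState: ChatState = {
  conversationId: "",
  messages: [],
  inputValue: "",
  loading: false,
  pendingApprovalId: null,
};

// Stand-in for window.localStorage (only the method this sketch needs).
interface StorageLike {
  getItem(key: string): string | null;
}

// Hypothetical key name, chosen for illustration only.
const STORAGE_KEY = "gatekeeper_chat";

// Steps 2 and 3: called after mount on the client, merges saved values
// over the defaults. With no storage (server render) it is a no-op.
function hydrateChatState(
  state: ChatState,
  storage: StorageLike | undefined,
): ChatState {
  if (!storage) return state;
  const raw = storage.getItem(STORAGE_KEY);
  if (!raw) return state;
  try {
    const saved: unknown = JSON.parse(raw);
    if (typeof saved !== "object" || saved === null) return state;
    // Assumption: a persisted "loading" flag is stale, so it is reset.
    return { ...state, ...(saved as Partial<ChatState>), loading: false };
  } catch {
    // Corrupt JSON: keep the safe defaults rather than crash the page.
    return state;
  }
}
```

Because the function returns the input state unchanged when storage is missing or unreadable, the first client render matches the server render, and the stored values only appear after the mount-time dispatch.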
-To learn more about Next.js, take a look at the following resources: +--- + +## ⏱️ Real-time Automated Execution Flow + +The defining feature of the Gatekeeper Admin Console is its ability to resume execution automatically. If the agent hits a tool that requires human approval, you do not have to click "Resume" after approving it elsewhere. + +Here is the exact lifecycle of an execution resume event: + +```text + Chat Workspace (page.tsx) Admin approvals page (or separate tab) + │ │ + 1. Agent returns PENDING ────────┐ │ + (yellow card renders) │ │ + │ │ │ + 2. Starts Polling Loop ◄─────────┘ │ + (every 2 seconds) │ + │ │ + 3. Poll: getApprovals() │ + │ │ + ├───────────────── [Still PENDING] ────────────────────┤ + │ │ + │ 4. User clicks "Approve" + │ (State becomes APPROVED) + │ │ + 5. Poll: getApprovals() │ + (Detects status is APPROVED) │ + │ │ + 6. Stop polling interval │ + │ │ + 7. Call runAgentMessage() ────────────────────────────────────────► API resumes execution + (Passes approvalId to Express) +``` + +### 🔒 Preventing Double-Execution (Safe Refs Pattern) +Because React's `useEffect` and `setInterval` closures capture state at the time of creation, performing polling in React components can result in executing stale handlers (e.g. attempting to resume multiple times if the user clicks "Resume" at the exact millisecond the poller detects a state transition). + +We bypassed this by maintaining **synchronized state refs**: +```typescript +const handleApproveRef = useRef(handleApprove); +const loadingRef = useRef(loading); + +useEffect(() => { + handleApproveRef.current = handleApprove; + loadingRef.current = loading; +}, [handleApprove, loading]); +``` +The polling interval always queries `handleApproveRef.current()` and checks `loadingRef.current`. If the page is already executing or has finished, the action is gracefully ignored, protecting the Express backend from redundant execution pipelines. 
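The guard described above can be sketched independently of React. This is a hedged illustration rather than the code in the chat page: `Ref<T>` mimics the object returned by `useRef`, `createPoller` and its `resumed` flag are hypothetical names introduced here, and `fetchStatus` stands in for a call such as `getApprovals()` narrowed to one approval.

```typescript
// Mimics the { current } object that React's useRef returns.
interface Ref<T> {
  current: T;
}

type ApprovalStatus = "PENDING" | "APPROVED" | "REJECTED";

// Builds one polling "tick". It reads the latest handler and loading flag
// through refs, so a stale closure can never trigger a second resume.
function createPoller(
  fetchStatus: () => Promise<ApprovalStatus>,
  onApprovedRef: Ref<() => Promise<void>>,
  loadingRef: Ref<boolean>,
): () => Promise<boolean> {
  let resumed = false; // assumption: one resume per pending approval

  return async function tick(): Promise<boolean> {
    // Skip the network call if a resume already happened or is running.
    if (resumed || loadingRef.current) return false;

    const status = await fetchStatus();

    // Re-check after the await: another tick may have won the race.
    if (status !== "APPROVED" || resumed || loadingRef.current) return false;

    resumed = true;
    loadingRef.current = true;
    try {
      await onApprovedRef.current(); // always the newest handler
    } finally {
      loadingRef.current = false;
    }
    return true;
  };
}
```

In a component, `tick` would be called from the 2-second interval, and the interval would be cleared once `tick` resolves to `true`. The second check after the `await` is what keeps two overlapping polls from both resuming the agent.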
-- [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js features and API. -- [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial. +### 📦 Parallel Tool Batching in the UI +When the agent requests multiple tools in parallel (such as reading multiple configuration files simultaneously), the backend represents this as a single batched approval under the composite name `"multiple_tool_calls"`. +- The **Approval Page** and **Chat Workspace** catch this name and format the execution block as `"multiple parallel tools"`. +- The parameters are rendered in a clean, side-by-side array inspector so you can audit all requested parallel calls collectively. +- You can approve or reject the entire collection in a single click, triggering concurrent backend tool executions. -You can check out [the Next.js GitHub repository](https://github.com/vercel/next.js) - your feedback and contributions are welcome! +--- -## Deploy on Vercel +## 🚀 Local Development Setup -The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js. +To run the Next.js developer environment: -Check out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details. +1. **Verify Backend Status**: Make sure your local API is running (usually on port `3001` or as specified by the `.env` file at the root). +2. **Install Workspace Dependencies**: + From the repository root directory: + ```bash + npm install + ``` +3. **Launch Web Dashboard**: + ```bash + npx turbo dev --filter=web + ``` + *Alternatively, navigate directly to `apps/web` and run:* + ```bash + npm run dev + ``` +4. **Open in Browser**: Navigate to [http://localhost:3000](http://localhost:3000). The console automatically connects to the API and initializes your Redux session.
diff --git a/apps/web/app/approvals/page.tsx b/apps/web/app/approvals/page.tsx new file mode 100644 index 0000000..c69e488 --- /dev/null +++ b/apps/web/app/approvals/page.tsx @@ -0,0 +1,74 @@ +"use client"; + +import React, { useState, useEffect } from "react"; +import ApprovalTable from "../../components/ApprovalTable"; +import { getApprovals, approveRequest, rejectRequest, Approval } from "../../services/approvals"; + +export default function ApprovalsPage() { + const [approvals, setApprovals] = useState<Approval[]>([]); + const [loading, setLoading] = useState(true); + + const cancelledRef = React.useRef(false); + + useEffect(() => { + cancelledRef.current = false; + return () => { + cancelledRef.current = true; + }; + }, []); + + const fetchApprovalsData = async () => { + try { + const data = await getApprovals(); + if (!cancelledRef.current) { + setApprovals(data); + } + } catch (err) { + console.error("Failed to fetch approvals", err); + } finally { + if (!cancelledRef.current) { + setLoading(false); + } + } + }; + + useEffect(() => { + let timeoutId: NodeJS.Timeout; + + const poll = async () => { + if (cancelledRef.current) return; + await fetchApprovalsData(); + if (!cancelledRef.current) { + timeoutId = setTimeout(poll, 5000); + } + }; + + poll(); + return () => { + if (timeoutId) { + clearTimeout(timeoutId); + } + }; + }, []); + + const handleApprove = async (id: string) => { + await approveRequest(id); + await fetchApprovalsData(); // refresh list + }; + + const handleReject = async (id: string) => { + await rejectRequest(id); + await fetchApprovalsData(); + }; + + return ( +
+ +
+ ); +} diff --git a/apps/web/app/chat/page.tsx b/apps/web/app/chat/page.tsx new file mode 100644 index 0000000..fe49da3 --- /dev/null +++ b/apps/web/app/chat/page.tsx @@ -0,0 +1,349 @@ +"use client"; + +import React, { useEffect, useRef } from "react"; +import { useSelector, useDispatch } from "react-redux"; +import ChatWindow from "../../components/ChatWindow"; +import { runAgentMessage, ChatMessage } from "../../services/agent"; +import { approveRequest, rejectRequest, getApprovals } from "../../services/approvals"; +import { RootState } from "../../store"; +import { + setMessages, + setInputValue, + setLoading, + setPendingApproval, + hydrateChatState, + clearChatState, +} from "../../store/chatSlice"; + +function isValidChatMessage(msg: any): msg is ChatMessage { + return ( + msg && + typeof msg === "object" && + (msg.role === "user" || msg.role === "assistant" || msg.role === "tool") && + typeof msg.content === "string" + ); +} + +function validateChatMessages(messages: any): ChatMessage[] { + if (!Array.isArray(messages)) return []; + return messages.filter(isValidChatMessage); +} + +export default function ChatPage() { + const dispatch = useDispatch(); + + const conversationId = useSelector((state: RootState) => state.chat.conversationId); + const messages = useSelector((state: RootState) => state.chat.messages); + const inputValue = useSelector((state: RootState) => state.chat.inputValue); + const loading = useSelector((state: RootState) => state.chat.loading); + const pendingApprovalId = useSelector((state: RootState) => state.chat.pendingApprovalId); + const pendingToolName = useSelector((state: RootState) => state.chat.pendingToolName); + const pendingApprovalStatus = useSelector((state: RootState) => state.chat.pendingApprovalStatus); + const isHydrated = useSelector((state: RootState) => state.chat.isHydrated); + + // Load state on mount if not hydrated + useEffect(() => { + if (!isHydrated) { + const savedConvId = 
localStorage.getItem("gatekeeper_conversationId"); + const savedMessages = localStorage.getItem("gatekeeper_messages"); + const savedPendingApprovalId = localStorage.getItem("gatekeeper_pendingApprovalId"); + const savedPendingToolName = localStorage.getItem("gatekeeper_pendingToolName"); + const rawStatus = localStorage.getItem("gatekeeper_pendingApprovalStatus"); + + const savedPendingApprovalStatus = (rawStatus === "PENDING" || rawStatus === "APPROVED" || rawStatus === "REJECTED") + ? (rawStatus as "PENDING" | "APPROVED" | "REJECTED") + : null; + + let parsedMessages: ChatMessage[] = []; + if (savedMessages) { + try { + const parsed = JSON.parse(savedMessages); + parsedMessages = validateChatMessages(parsed); + } catch (e) { + console.error("Error parsing saved messages", e); + } + } + + const conversationId = savedConvId || `conv_${Math.random().toString(36).substring(2, 9)}`; + + dispatch(hydrateChatState({ + conversationId, + messages: parsedMessages, + pendingApprovalId: savedPendingApprovalId, + pendingToolName: savedPendingToolName, + pendingApprovalStatus: savedPendingApprovalStatus, + })); + } + }, [dispatch, isHydrated]); + + const handleNewChat = () => { + if (confirm("Are you sure you want to clear the chat history and start a new session?")) { + const newId = `conv_${Math.random().toString(36).substring(2, 9)}`; + dispatch(clearChatState(newId)); + } + }; + + const handleSend = async () => { + if (!inputValue.trim() || loading) return; + + const userPrompt = inputValue; + dispatch(setInputValue("")); + dispatch(setLoading(true)); + + // If we were waiting for approval but user chose to type message instead, clear approval state + if (pendingApprovalId) { + dispatch(setPendingApproval({ id: null, toolName: null })); + } + + // Optimistically add user message to list + const updatedMessages = [...messages, { role: "user", content: userPrompt } as ChatMessage]; + dispatch(setMessages(updatedMessages)); + + try { + // Call agent endpoint + const res = 
await runAgentMessage( + userPrompt, + conversationId, + null, + messages // pass existing history + ); + + // Update messages list based on backend return + if (res.history) { + dispatch(setMessages(res.history)); + } else if (res.answer) { + dispatch(setMessages([...updatedMessages, { role: "assistant", content: res.answer! }])); + } + + if (res.status === "PENDING" && res.approvalId) { + let toolName = "requested_tool"; + // history may be absent on this response, so guard before indexing + const lastMsg = res.history ? res.history[res.history.length - 1] : undefined; + if (lastMsg && lastMsg.content.includes("Call tool ")) { + const match = lastMsg.content.match(/Call tool (\w+)/); + toolName = (match && match[1]) || "requested_tool"; + } + dispatch(setPendingApproval({ id: res.approvalId, toolName })); + } else { + dispatch(setPendingApproval({ id: null, toolName: null })); + } + } catch (err: any) { + const errMsg = err.response?.data?.error || "An error occurred during execution."; + dispatch(setMessages([ + ...updatedMessages, + { role: "assistant", content: `Error: ${errMsg}` } as ChatMessage, + ])); + dispatch(setPendingApproval({ id: null, toolName: null })); + } finally { + dispatch(setLoading(false)); + } + }; + + const resumeAgentRun = async (approvalId: string, isApproval: boolean) => { + dispatch(setLoading(true)); + try { + const res = await runAgentMessage( + null, + conversationId, + approvalId, + messages + ); + + if (res.history) { + dispatch(setMessages(res.history)); + } else if (isApproval && res.answer) { + dispatch(setMessages([...messages, { role: "assistant", content: res.answer! 
}])); + } else if (!isApproval && res.reason) { + dispatch(setMessages([ + ...messages, + { role: "assistant", content: `Execution Rejected: ${res.reason}` }, + ])); + } + + if (isApproval && res.status === "PENDING" && res.approvalId) { + let toolName = "requested_tool"; + // history may be absent on this response, so guard before indexing + const lastMsg = res.history ? res.history[res.history.length - 1] : undefined; + if (lastMsg && lastMsg.content.includes("Call tool ")) { + const match = lastMsg.content.match(/Call tool (\w+)/); + toolName = (match && match[1]) || "requested_tool"; + } + dispatch(setPendingApproval({ id: res.approvalId, toolName })); + } else { + dispatch(setPendingApproval({ id: null, toolName: null })); + } + } catch (err: any) { + const actionStr = isApproval ? "approve" : "reject"; + const errMsg = err.response?.data?.error || `An error occurred during execution resume after ${actionStr}.`; + dispatch(setMessages([ + ...messages, + { role: "assistant", content: `Error: ${errMsg}` } as ChatMessage, + ])); + dispatch(setPendingApproval({ id: null, toolName: null })); + } finally { + dispatch(setLoading(false)); + } + }; + + const handleApprove = async (approvalId: string) => { + dispatch(setLoading(true)); + try { + await approveRequest(approvalId); + await resumeAgentRun(approvalId, true); + } catch (err: any) { + const errMsg = err.response?.data?.error || "Failed to approve tool execution."; + dispatch(setMessages([ + ...messages, + { role: "assistant", content: `Error: ${errMsg}` } as ChatMessage, + ])); + dispatch(setPendingApproval({ id: null, toolName: null })); + dispatch(setLoading(false)); + } + }; + + const handleReject = async (approvalId: string) => { + dispatch(setLoading(true)); + try { + await rejectRequest(approvalId); + await resumeAgentRun(approvalId, false); + } catch (err: any) { + const errMsg = err.response?.data?.error || "Failed to reject execution."; + dispatch(setMessages([ + ...messages, + { role: "assistant", content: `Error: ${errMsg}` } as ChatMessage, + ])); + dispatch(setPendingApproval({ id: null, 
toolName: null })); + dispatch(setLoading(false)); + } + }; + + const handleResumeAfterApproval = async (approvalId: string) => { + await resumeAgentRun(approvalId, true); + }; + + const handleResumeAfterRejection = async (approvalId: string) => { + await resumeAgentRun(approvalId, false); + }; + + const handleApproveRef = useRef(handleApprove); + const handleRejectRef = useRef(handleReject); + const handleResumeAfterApprovalRef = useRef(handleResumeAfterApproval); + const handleResumeAfterRejectionRef = useRef(handleResumeAfterRejection); + const loadingRef = useRef(loading); + + useEffect(() => { + handleApproveRef.current = handleApprove; + handleRejectRef.current = handleReject; + handleResumeAfterApprovalRef.current = handleResumeAfterApproval; + handleResumeAfterRejectionRef.current = handleResumeAfterRejection; + loadingRef.current = loading; + }, [handleApprove, handleReject, handleResumeAfterApproval, handleResumeAfterRejection, loading]); + + // Polling approval status for real-time automatic execution resume/abort. + // isRunningRef is a synchronous guard that prevents concurrent checkStatus + // calls when the interval fires before loadingRef has been updated by React's + // render cycle (loadingRef syncs inside a useEffect, not synchronously). + const isRunningRef = useRef(false); + + useEffect(() => { + let intervalId: any; + + const checkStatus = async () => { + // P1 fix: check isRunningRef synchronously before the first await so the + // interval cannot fire a second overlapping call while the first is still + // awaiting getApprovals(), even if loadingRef hasn't updated yet. 
+ if (!pendingApprovalId || loadingRef.current || isRunningRef.current) return; + isRunningRef.current = true; + try { + const list = await getApprovals(); + const match = list.find((item) => item.id === pendingApprovalId); + if (match) { + if (match.status === "APPROVED") { + clearInterval(intervalId); + handleResumeAfterApprovalRef.current(pendingApprovalId); + } else if (match.status === "REJECTED") { + clearInterval(intervalId); + handleResumeAfterRejectionRef.current(pendingApprovalId); + } else { + dispatch(setPendingApproval({ + id: pendingApprovalId, + toolName: pendingToolName, + status: match.status + })); + } + } else { + clearInterval(intervalId); + dispatch(setPendingApproval({ id: null, toolName: null, status: null })); + } + } catch (err) { + console.error("Failed to poll approval status", err); + } finally { + isRunningRef.current = false; + } + }; + + if (isHydrated && pendingApprovalId) { + checkStatus(); + intervalId = setInterval(checkStatus, 2000); + } + + return () => { + if (intervalId) clearInterval(intervalId); + }; + }, [dispatch, isHydrated, pendingApprovalId, pendingToolName]); + + if (!isHydrated) { + return ( +
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ ); + } + + return ( +
+
+
+

Agent Chat Workspace

+

Run the AI agent and review tool execution requests in real-time.

+
+ +
+ + dispatch(setInputValue(val))} + onSend={handleSend} + onApprove={handleApprove} + onReject={handleReject} + loading={loading} + pendingApprovalId={pendingApprovalId} + pendingToolName={pendingToolName} + pendingApprovalStatus={pendingApprovalStatus} + conversationId={conversationId} + /> +
+ ); +} diff --git a/apps/web/app/globals.css b/apps/web/app/globals.css index 6af7ecb..cdc79f5 100644 --- a/apps/web/app/globals.css +++ b/apps/web/app/globals.css @@ -1,50 +1,78 @@ -:root { - --background: #ffffff; - --foreground: #171717; -} +@tailwind base; +@tailwind components; +@tailwind utilities; -@media (prefers-color-scheme: dark) { +@layer base { :root { - --background: #0a0a0a; - --foreground: #ededed; + --background: 240 10% 3.9%; + --foreground: 0 0% 98%; + + --card: 240 10% 5.9%; + --card-foreground: 0 0% 98%; + + --popover: 240 10% 3.9%; + --popover-foreground: 0 0% 98%; + + --primary: 0 0% 98%; + --primary-foreground: 240 5.9% 10%; + + --secondary: 240 3.7% 15.9%; + --secondary-foreground: 0 0% 98%; + + --muted: 240 3.7% 15.9%; + --muted-foreground: 240 5% 64.9%; + + --accent: 240 3.7% 15.9%; + --accent-foreground: 0 0% 98%; + + --destructive: 0 62.8% 30.6%; + --destructive-foreground: 0 0% 98%; + + --border: 240 3.7% 15.9%; + --input: 240 3.7% 15.9%; + --ring: 240 4.9% 83.9%; + + --radius: 0.25rem; } } -html, -body { - max-width: 100vw; - overflow-x: hidden; +@layer base { + body { + background-color: rgb(9, 9, 11); + color: rgb(244, 244, 245); + font-family: var(--font-geist-sans), ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif; + } } -body { - color: var(--foreground); - background: var(--background); +/* Custom Scrollbars */ +::-webkit-scrollbar { + width: 6px; + height: 6px; } - -* { - box-sizing: border-box; - padding: 0; - margin: 0; +::-webkit-scrollbar-track { + background: transparent; } - -a { - color: inherit; - text-decoration: none; +::-webkit-scrollbar-thumb { + background: #27272a; + border-radius: 3px; } - -.imgDark { - display: none; +::-webkit-scrollbar-thumb:hover { + background: #3f3f46; } -@media (prefers-color-scheme: dark) { - html { - color-scheme: dark; +/* Shimmer Loading Animation */ +@keyframes shimmer { + 0% { + background-position: -200% 0; } - - 
.imgLight { - display: none; - } - .imgDark { - display: unset; + 100% { + background-position: 200% 0; } } + +.shimmer { + background: linear-gradient(90deg, #18181b 25%, #27272a 50%, #18181b 75%); + background-size: 200% 100%; + animation: shimmer 1.6s infinite linear; +} + diff --git a/apps/web/app/layout.tsx b/apps/web/app/layout.tsx index 8469537..27e233a 100644 --- a/apps/web/app/layout.tsx +++ b/apps/web/app/layout.tsx @@ -1,5 +1,7 @@ import type { Metadata } from "next"; import localFont from "next/font/local"; +import Navbar from "../components/Navbar"; +import StoreProvider from "../store/StoreProvider"; import "./globals.css"; const geistSans = localFont({ @@ -12,8 +14,8 @@ const geistMono = localFont({ }); export const metadata: Metadata = { - title: "Create Next App", - description: "Generated by create next app", + title: "GateKeeper Dashboard", + description: "Internal security operations dashboard for LLM agent control", }; export default function RootLayout({ @@ -22,9 +24,14 @@ export default function RootLayout({ children: React.ReactNode; }>) { return ( - - - {children} + + + + +
+ {children} +
+
); diff --git a/apps/web/app/logs/page.tsx b/apps/web/app/logs/page.tsx new file mode 100644 index 0000000..341cf9c --- /dev/null +++ b/apps/web/app/logs/page.tsx @@ -0,0 +1,70 @@ +"use client"; + +import React, { useState, useEffect } from "react"; +import LogsTable from "../../components/LogsTable"; +import { getLogs, resetLogs, Log } from "../../services/logs"; + +export default function LogsPage() { + const [logs, setLogs] = useState<Log[]>([]); + const [loading, setLoading] = useState(true); + + const cancelledRef = React.useRef(false); + + useEffect(() => { + cancelledRef.current = false; + return () => { + cancelledRef.current = true; + }; + }, []); + + const fetchLogsData = async () => { + try { + const data = await getLogs(); + if (!cancelledRef.current) { + setLogs(data); + } + } catch (err) { + console.error("Failed to fetch logs", err); + } finally { + if (!cancelledRef.current) { + setLoading(false); + } + } + }; + + useEffect(() => { + let timeoutId: NodeJS.Timeout; + + const poll = async () => { + if (cancelledRef.current) return; + await fetchLogsData(); + if (!cancelledRef.current) { + timeoutId = setTimeout(poll, 5000); + } + }; + + poll(); + return () => { + if (timeoutId) { + clearTimeout(timeoutId); + } + }; + }, []); + + const handleResetLogs = async () => { + if (confirm("Are you sure you want to clear all decision logs from the database?")) { + try { + await resetLogs(); + await fetchLogsData(); + } catch (err) { + alert("Failed to reset logs"); + } + } + }; + + return ( +
+ +
+ ); +} diff --git a/apps/web/app/page.module.css b/apps/web/app/page.module.css deleted file mode 100644 index 6108b60..0000000 --- a/apps/web/app/page.module.css +++ /dev/null @@ -1,186 +0,0 @@ -.page { - --gray-rgb: 0, 0, 0; - --gray-alpha-200: rgba(var(--gray-rgb), 0.08); - --gray-alpha-100: rgba(var(--gray-rgb), 0.05); - - --button-primary-hover: #383838; - --button-secondary-hover: #f2f2f2; - - display: grid; - grid-template-rows: 20px 1fr 20px; - align-items: center; - justify-items: center; - min-height: 100svh; - padding: 80px; - gap: 64px; - font-synthesis: none; -} - -@media (prefers-color-scheme: dark) { - .page { - --gray-rgb: 255, 255, 255; - --gray-alpha-200: rgba(var(--gray-rgb), 0.145); - --gray-alpha-100: rgba(var(--gray-rgb), 0.06); - - --button-primary-hover: #ccc; - --button-secondary-hover: #1a1a1a; - } -} - -.main { - display: flex; - flex-direction: column; - gap: 32px; - grid-row-start: 2; -} - -.main ol { - font-family: var(--font-geist-mono); - padding-left: 0; - margin: 0; - font-size: 14px; - line-height: 24px; - letter-spacing: -0.01em; - list-style-position: inside; -} - -.main li:not(:last-of-type) { - margin-bottom: 8px; -} - -.main code { - font-family: inherit; - background: var(--gray-alpha-100); - padding: 2px 4px; - border-radius: 4px; - font-weight: 600; -} - -.ctas { - display: flex; - gap: 16px; -} - -.ctas a { - appearance: none; - border-radius: 128px; - height: 48px; - padding: 0 20px; - font-family: var(--font-geist-sans); - border: 1px solid transparent; - transition: background 0.2s, color 0.2s, border-color 0.2s; - cursor: pointer; - display: flex; - align-items: center; - justify-content: center; - font-size: 16px; - line-height: 20px; - font-weight: 500; -} - -a.primary { - background: var(--foreground); - color: var(--background); - gap: 8px; -} - -a.secondary { - border-color: var(--gray-alpha-200); - min-width: 180px; -} - -button.secondary { - appearance: none; - border-radius: 128px; - height: 48px; - padding: 
0 20px; - font-family: var(--font-geist-sans); - border: 1px solid transparent; - transition: background 0.2s, color 0.2s, border-color 0.2s; - cursor: pointer; - display: flex; - align-items: center; - justify-content: center; - font-size: 16px; - line-height: 20px; - font-weight: 500; - background: transparent; - border-color: var(--gray-alpha-200); - min-width: 180px; -} - -.footer { - font-family: var(--font-geist-sans); - grid-row-start: 3; - display: flex; - gap: 24px; -} - -.footer a { - display: flex; - align-items: center; - gap: 8px; -} - -.footer img { - flex-shrink: 0; -} - -/* Enable hover only on non-touch devices */ -@media (hover: hover) and (pointer: fine) { - a.primary:hover { - background: var(--button-primary-hover); - border-color: transparent; - } - - a.secondary:hover { - background: var(--button-secondary-hover); - border-color: transparent; - } - - .footer a:hover { - text-decoration: underline; - text-underline-offset: 4px; - } -} - -@media (max-width: 600px) { - .page { - padding: 32px; - padding-bottom: 80px; - } - - .main { - align-items: center; - } - - .main ol { - text-align: center; - } - - .ctas { - flex-direction: column; - } - - .ctas a { - font-size: 14px; - height: 40px; - padding: 0 16px; - } - - a.secondary { - min-width: auto; - } - - .footer { - flex-wrap: wrap; - align-items: center; - justify-content: center; - } -} - -@media (prefers-color-scheme: dark) { - .logo { - filter: invert(); - } -} diff --git a/apps/web/app/page.tsx b/apps/web/app/page.tsx index 593833b..a05d817 100644 --- a/apps/web/app/page.tsx +++ b/apps/web/app/page.tsx @@ -1,102 +1,5 @@ -import Image, { type ImageProps } from "next/image"; -import { Button } from "@repo/ui/button"; -import styles from "./page.module.css"; - -type Props = Omit & { - srcLight: string; - srcDark: string; -}; - -const ThemeImage = (props: Props) => { - const { srcLight, srcDark, ...rest } = props; - - return ( - <> - - - - ); -}; - -export default function Home() { - return ( -
-
- -
    -
  1. - Get started by editing apps/web/app/page.tsx -
  2. -
  3. Save and see your changes instantly.
  4. -
- - - -
- -
- ); -} +import { redirect } from "next/navigation"; + + export default function Page() { + redirect("/chat"); + } diff --git a/apps/web/app/policies/page.tsx b/apps/web/app/policies/page.tsx new file mode 100644 index 0000000..084ec12 --- /dev/null +++ b/apps/web/app/policies/page.tsx @@ -0,0 +1,59 @@ +"use client"; + +import React, { useState, useEffect } from "react"; +import PolicyTable from "../../components/PolicyTable"; +import { getPolicies, createPolicy, updatePolicy, deletePolicy, Policy, PolicyAction, getMcpTools, McpTool } from "../../services/policies"; + +export default function PoliciesPage() { + const [policies, setPolicies] = useState<Policy[]>([]); + const [mcpTools, setMcpTools] = useState<McpTool[]>([]); + const [loading, setLoading] = useState(true); + + const fetchData = async () => { + setLoading(true); + try { + const [policiesData, mcpToolsData] = await Promise.all([ + getPolicies(), + getMcpTools(), + ]); + setPolicies(policiesData); + setMcpTools(mcpToolsData); + } catch (err) { + console.error("Failed to fetch policies or mcp tools", err); + } finally { + setLoading(false); + } + }; + + useEffect(() => { + fetchData(); + }, []); + + const handleAddPolicy = async (toolName: string, action: PolicyAction) => { + await createPolicy(toolName, action); + await fetchData(); // refresh list + }; + + const handleUpdatePolicy = async (toolName: string, action: PolicyAction) => { + await updatePolicy(toolName, action); + await fetchData(); + }; + + const handleDeletePolicy = async (toolName: string) => { + await deletePolicy(toolName); + await fetchData(); + }; + + return ( +
+ +
+ ); +} diff --git a/apps/web/components/ApprovalTable.tsx b/apps/web/components/ApprovalTable.tsx new file mode 100644 index 0000000..f222c15 --- /dev/null +++ b/apps/web/components/ApprovalTable.tsx @@ -0,0 +1,203 @@ +"use client"; + +import React, { useState } from "react"; +import { Check, X, ShieldAlert } from "lucide-react"; +import { Approval } from "../services/approvals"; + +interface ApprovalTableProps { + approvals: Approval[]; + onApprove: (id: string) => Promise; + onReject: (id: string) => Promise; + loading: boolean; +} + +export default function ApprovalTable({ + approvals, + onApprove, + onReject, + loading, +}: ApprovalTableProps) { + const [actioningId, setActioningId] = useState(null); + const [error, setError] = useState(null); + const [expandedIds, setExpandedIds] = useState>({}); + + const toggleExpand = (id: string) => { + setExpandedIds(prev => ({ ...prev, [id]: !prev[id] })); + }; + + const handleAction = async (id: string, actionFn: (id: string) => Promise) => { + setActioningId(id); + setError(null); + try { + await actionFn(id); + } catch (err: any) { + setError(err.response?.data?.error || err.message || "Failed to complete action"); + } finally { + setActioningId(null); + } + }; + + const formatDateStr = (dateStr: string) => { + try { + const date = new Date(dateStr); + return date.toISOString().replace("T", " ").substring(0, 19); + } catch (e) { + return dateStr; + } + }; + + return ( +
+
+

Approvals

+

Track and respond to manual verification requests from your agent. Click a tool name to inspect parameters.

+
+ + {error && ( +
+
+ + {error} +
+ +
+ )} + +
+
+ + + + + + + + + + + + {loading ? ( + Array.from({ length: 3 }).map((_, i) => ( + + + + + + + + )) + ) : approvals.length === 0 ? ( + + + + ) : ( + approvals.map((app) => { + const isPending = app.status === "PENDING"; + const isExpanded = !!expandedIds[app.id]; + return ( + + + + + + + + + {isExpanded && ( + + + + )} + + ); + }) + )} + +
Approval IDTool NameStatusCreated AtActions
+
+
+
+
+
+
+
+
+
+
+
+
+
+ No approval requests found. +
{app.id} + + + + {app.status} + + {formatDateStr(app.createdAt)} + {isPending ? ( +
+ + +
+ ) : ( + Completed + )} +
+
+
Arguments Inspector
+ {app.tool_name === "multiple_tool_calls" ? ( +
+ {((app.arguments as any)?.tool_calls || []).map((tc: any, idx: number) => ( +
+
Tool: {tc.tool_name}
+
+                                        {JSON.stringify(tc.arguments, null, 2)}
+                                      
+
+ ))} +
+ ) : ( +
+                                  {JSON.stringify(app.arguments, null, 2)}
+                                
+ )} +
+
+
+
+
+ ); +} diff --git a/apps/web/components/ChatWindow.tsx b/apps/web/components/ChatWindow.tsx new file mode 100644 index 0000000..fcf1a84 --- /dev/null +++ b/apps/web/components/ChatWindow.tsx @@ -0,0 +1,285 @@ +"use client"; + +import React, { useRef, useEffect } from "react"; +import { Send, AlertTriangle, Play, XOctagon, CheckCircle } from "lucide-react"; +import { ChatMessage } from "../services/agent"; + +interface ChatWindowProps { + messages: ChatMessage[]; + inputValue: string; + setInputValue: (val: string) => void; + onSend: () => void; + onApprove: (approvalId: string) => void; + onReject: (approvalId: string) => void; + loading: boolean; + pendingApprovalId?: string | null; + pendingToolName?: string | null; + pendingApprovalStatus?: "PENDING" | "APPROVED" | "REJECTED" | null; + conversationId: string; +} + +export default function ChatWindow({ + messages, + inputValue, + setInputValue, + onSend, + onApprove, + onReject, + loading, + pendingApprovalId, + pendingToolName, + pendingApprovalStatus, + conversationId, +}: ChatWindowProps) { + const scrollRef = useRef(null); + + useEffect(() => { + if (scrollRef.current) { + scrollRef.current.scrollTop = scrollRef.current.scrollHeight; + } + }, [messages, pendingApprovalId]); + + const handleKeyDown = (e: React.KeyboardEvent) => { + if (e.key === "Enter" && !e.shiftKey) { + e.preventDefault(); + if (!loading && inputValue.trim()) { + onSend(); + } + } + }; + + // Helper to determine if a message is a tool trace + const parseToolTrace = (content: string) => { + if (content.startsWith("Call tool ") || content.startsWith("Result: ") || content.startsWith("Call parallel tools:")) { + return true; + } + try { + // Check if it's a JSON response from a tool execution + const parsed = JSON.parse(content); + if (parsed && typeof parsed === "object" && ("result" in parsed || "isError" in parsed)) { + return true; + } + } catch (e) {} + return false; + }; + + return ( +
+ {/* Left side: Conversation history */} +
+ {/* Header */} +
+ CONVERSATION: {conversationId} + {pendingApprovalId && ( + + PAUSED / AWAITING APPROVAL + + )} +
+ + {/* Messages */} +
+ {messages.length === 0 ? ( +
+ No messages yet. Send a prompt to start the agent. +
+ ) : ( + messages.map((msg, index) => { + const isUser = msg.role === "user"; + const isTool = msg.role === "tool" || (!isUser && parseToolTrace(msg.content)); + + if (isTool) { + return ( +
+
+ TOOL TRACE + {msg.role.toUpperCase()} +
+
{msg.content}
+
+ ); + } + + return ( +
+ + {isUser ? "User" : "Assistant"} + +
+ {msg.content} +
+
+ ); + }) + )} + + {/* Pending Approval Card */} + {pendingApprovalId && ( +
+
+ {pendingApprovalStatus === "APPROVED" ? ( + + ) : pendingApprovalStatus === "REJECTED" ? ( + + ) : ( + + )} + +
+

+ {pendingApprovalStatus === "APPROVED" + ? "Action Approved" + : pendingApprovalStatus === "REJECTED" + ? "Action Rejected" + : "Approval Required"} +

+

+ {pendingApprovalStatus === "APPROVED" + ? "The tool call for " + : pendingApprovalStatus === "REJECTED" + ? "The tool call for " + : "The agent requested to run tool "} + + {pendingToolName === "multiple_tool_calls" ? "multiple parallel tools" : (pendingToolName || "unknown_tool")} + + {pendingApprovalStatus === "APPROVED" + ? " has been approved. Execution can be resumed." + : pendingApprovalStatus === "REJECTED" + ? " has been rejected. Resuming will abort the execution." + : ". Execution is paused."} +

+

APPROVAL ID: {pendingApprovalId}

+
+
+ +
+ {pendingApprovalStatus === "APPROVED" ? ( + + ) : pendingApprovalStatus === "REJECTED" ? ( + + ) : ( + <> + + + + )} +
+
+ )} + + {loading && !pendingApprovalId && ( +
+ Assistant +
+
+
+
+
+
+
+
+ + Agent is executing... +
+
+
+ )} +
+
+ + {/* Right side: Input and settings */} +
+
+

Command Panel

+ +
+ +