feat(agent) : agent built🎉 #5

Merged
yb175 merged 7 commits into main from agent
Jun 25, 2026
Conversation

@yb175 yb175 commented Jun 25, 2026

Owner

Summary by cubic

Adds a policy-gated Gemini agent that plans and runs MCP tools with optional human approval. Ships /agent/run with strict validation, a Gemini client with request timeouts, and guardrails like a 30-iteration cap and 3-minute budget auto-reset.

  • New Features

    • Agent loop: JSON tool_call/final_answer, MCP discovery/exec, schema checks, token tracking, 30-iteration cap; resume enforces approvals (APPROVED runs, PENDING returns PENDING, others DENY).
    • LLM client: request timeout via AbortSignal.timeout with GEMINI_TIMEOUT_MS override; tests cover abort-on-timeout and invalid-timeout fallback.
    • Policy flow: ALLOW/PENDING/DENY with atomic approve/reject endpoints; clear 404/400 on invalid state.
    • HTTP POST /agent/run: validates inputs, requires message or approvalId, role allowlist for history (cap 100); returns updated history.
    • CLI: npm run cli for chat and approve/reject; per-session UUID conversationId; tool execution trace.
    • Tests: 13 agent scenarios plus approval endpoint behavior.
  • Migration

    • Add GEMINI_API_KEY to .env (auto-loaded via apps/api/src/utils/env); optionally set GEMINI_TIMEOUT_MS (default 30000).
    • Run the DB migration adding budget_reset_at to Conversation. Configure MCP tools.

Written for commit e2c1ab7. Summary will update on new commits.
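
The timeout behaviour described in this summary (a `GEMINI_TIMEOUT_MS` override, a 30000 ms default, and a fallback when the value is invalid) could look roughly like the sketch below. It is not the code in `llm.ts`: `resolveTimeoutMs` and `timeoutSignal` are hypothetical names, and treating empty, non-numeric, and non-positive values as invalid is an assumption.

```typescript
// Hypothetical helpers; not taken from apps/api/src/agent/llm.ts.
const DEFAULT_TIMEOUT_MS = 30_000; // default stated in the PR summary

// Parse a raw environment value. Fall back to the default when it is
// missing, empty, non-numeric, or not a positive finite number
// (this definition of "invalid" is an assumption).
function resolveTimeoutMs(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") return DEFAULT_TIMEOUT_MS;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TIMEOUT_MS;
}

// Build a signal that aborts after the resolved timeout. Passing it as
// `signal` to fetch makes the request reject once the timeout elapses.
function timeoutSignal(raw: string | undefined): AbortSignal {
  return AbortSignal.timeout(resolveTimeoutMs(raw));
}
```

A caller would pass something like `timeoutSignal(process.env.GEMINI_TIMEOUT_MS)` in the `signal` option of the Gemini request, so a hung upstream call fails instead of stalling the loop.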

Greptile Summary

This PR introduces a policy-gated Gemini agent that plans and executes MCP tools with optional human approval, wiring it up via a new POST /agent/run endpoint and supporting approve/reject endpoints on the policy router.

  • Agent loop (loop.ts): 30-iteration cap, 3-minute budget auto-reset via budget_reset_at, approval resume path that checks DB status before reconstructing the tool call step, and accumulated-token tracking flushed to the DB on every exit path.
  • LLM client (llm.ts): Gemini gemini-2.5-flash call with configurable AbortSignal.timeout, JSON schema validation of tool arguments, and clear error types for malformed responses.
  • HTTP layer (index.ts, router.ts): /agent/run validates message, conversationId, approvalId (type + non-empty), and history (role allowlist, 100-item cap); approve/reject endpoints use atomic updateMany with status: PENDING condition to prevent double-transitions.
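
The conditional-update pattern mentioned for the approve/reject endpoints can be illustrated with an in-memory stand-in. This is a sketch under assumptions: the `approvals` map replaces the database, the local `updateMany` only mimics the shape of a conditional bulk update that returns a row count, and `transition` is a hypothetical name rather than anything in `router.ts`.

```typescript
type ApprovalStatus = "PENDING" | "APPROVED" | "REJECTED";
interface Approval { id: string; status: ApprovalStatus }

// In-memory stand-in for the approvals table (assumption for this sketch).
const approvals = new Map<string, Approval>();

// Mimics a conditional bulk update: a row changes only when BOTH the id and
// the expected status match; the result is the number of rows changed.
function updateMany(
  where: { id: string; status: ApprovalStatus },
  data: { status: ApprovalStatus },
): { count: number } {
  const row = approvals.get(where.id);
  if (!row || row.status !== where.status) return { count: 0 };
  row.status = data.status;
  return { count: 1 };
}

// Returns an HTTP-style status code: 200 on success, 404 when the approval
// does not exist, 400 when it exists but is no longer PENDING.
function transition(id: string, next: "APPROVED" | "REJECTED"): 200 | 400 | 404 {
  const { count } = updateMany({ id, status: "PENDING" }, { status: next });
  if (count === 1) return 200;
  return approvals.has(id) ? 400 : 404;
}
```

Because the status check and the write happen in one conditional update, a second approve or reject on the same record changes zero rows and is reported as an invalid state instead of silently overwriting the first decision.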

Confidence Score: 3/5

The core agent loop is not safe to merge without fixing the approval record lifecycle: an executor failure after approval leaves the user with no recovery path.

The approval resume path in loop.ts delegates the delete of the approval record to decide() (decision.ts), which removes it atomically before mcpExecutor.execute() is called. A transient MCP failure after the record is deleted leaves the client with a 500 response, no approvalId in the body, and no way to retry — the user must restart the full request and go through human approval again. This is a real, exercisable failure mode on the critical happy path of the new feature.

apps/api/src/agent/loop.ts — the ALLOW branch (lines 176–193) needs attention around executor failure handling after the approval record is deleted by decide().
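
One possible way to address this failure mode is to consume the approval only after the tool call succeeds, so a transient executor error leaves the record in place for a retry. The sketch below is hypothetical and uses an in-memory map in place of the database; `resumeWithApproval`, the `RETRYABLE_ERROR` status, and `store` are illustrative names, not identifiers from `loop.ts` or `decision.ts`.

```typescript
type Status = "PENDING" | "APPROVED" | "REJECTED";
interface ApprovalRecord { id: string; status: Status; tool: string }

type ResumeResult =
  | { status: "SUCCESS"; result: string }
  | { status: "RETRYABLE_ERROR"; approvalId: string; error: string }
  | { status: "PENDING"; approvalId: string }
  | { status: "DENY" };

// In-memory stand-in for the approvals table (assumption for this sketch).
const store = new Map<string, ApprovalRecord>();

async function resumeWithApproval(
  approvalId: string,
  execute: (tool: string) => Promise<string>,
): Promise<ResumeResult> {
  const record = store.get(approvalId);
  if (!record) return { status: "DENY" };
  if (record.status === "PENDING") return { status: "PENDING", approvalId };
  if (record.status !== "APPROVED") return { status: "DENY" };

  try {
    const result = await execute(record.tool);
    // Delete the approval only once the tool has actually run.
    store.delete(approvalId);
    return { status: "SUCCESS", result };
  } catch (err) {
    // The record is still present, so the client can retry with the same id.
    const error = err instanceof Error ? err.message : String(err);
    return { status: "RETRYABLE_ERROR", approvalId, error };
  }
}
```

This sketch does not guard against two concurrent resumes both executing the tool; a real implementation would likely need an atomic claim step before execution.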

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/api/src/agent/loop.ts | New agent loop with iteration cap, budget auto-reset, and approval resume path. Has a P1 where the approval record is deleted inside decide() before tool execution succeeds, making MCP executor failures non-retryable. |
| apps/api/src/agent/llm.ts | New Gemini LLM client with AbortSignal timeout, configurable via GEMINI_TIMEOUT_MS, with schema validation and clear error messages. Straightforward and well-structured. |
| apps/api/src/agent/memory.ts | New in-memory conversation state container with typed message roles and approval tracking. Clean, minimal, no issues. |
| apps/api/src/index.ts | Adds /agent/run endpoint with role allowlist, 100-item history cap, and approvalId type guard. Input validation is thorough; routes through to loop.ts, which has the P1. |
| apps/api/src/policy/router.ts | Adds approve/reject endpoints using atomic updateMany with correct 404/400 differentiation. Implementation is correct, but the endpoints lack auth middleware. |
| apps/api/src/agent/cli.ts | Interactive CLI client with random UUID session ID, tool trace view, and inline approve/reject flow. No issues. |
| apps/api/src/utils/env.ts | Upward directory traversal to locate and load .env; stops at the turbo.json monorepo root. Clean replacement for inline dotenv.config(). |
| packages/db/prisma/schema.prisma | Adds budget_reset_at DateTime with default now() to the Conversation model. Straightforward schema addition with corresponding migration. |
| apps/api/src/agent/agent.test.ts | 13 scenarios covering tool call, final answer, PENDING/DENY/ALLOW decisions, iteration limit, budget reset, and approval resume flows. Comprehensive coverage. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant API as POST /agent/run
    participant AgentLoop as runAgent
    participant LLM as Gemini LLM
    participant Policy as decide
    participant DB as Database
    participant MCP as mcpExecutor

    Client->>API: message + conversationId + history
    API->>AgentLoop: runAgent(message, conversationId)
    AgentLoop->>DB: upsert conversation (budget check/reset)
    AgentLoop->>MCP: discoverTools
    AgentLoop->>LLM: nextStep(memory, tools)
    LLM-->>AgentLoop: tool_call response
    AgentLoop->>Policy: decide(tool_name, args, conversationId)
    alt PENDING
        Policy->>DB: approval.create PENDING
        Policy-->>AgentLoop: PENDING + approvalId
        AgentLoop-->>Client: status PENDING + approvalId
        Client->>API: POST /policies/approvals/:id/approve
        API->>DB: updateMany PENDING to APPROVED
        Client->>API: resume with approvalId
        API->>AgentLoop: runAgent(null, conversationId, approvalId)
        AgentLoop->>DB: approval.findUnique check APPROVED
        AgentLoop->>Policy: decide with approvalId
        Policy->>DB: approval.delete
        Policy-->>AgentLoop: ALLOW
        AgentLoop->>MCP: execute tool
        MCP-->>AgentLoop: result
        AgentLoop->>LLM: nextStep with tool result
        LLM-->>AgentLoop: final_answer
        AgentLoop-->>Client: status SUCCESS + answer
    else ALLOW
        Policy-->>AgentLoop: ALLOW
        AgentLoop->>MCP: execute tool
        MCP-->>AgentLoop: result
        AgentLoop->>LLM: nextStep continues
    else DENY
        Policy-->>AgentLoop: DENY
        AgentLoop-->>Client: status DENY + reason
    end

Comments Outside Diff (4)

  1. apps/api/src/policy/router.ts, line 1-2

    P2 The import { ApprovalStatus } declaration appears mid-file (after all route handler code) rather than at the top. While JS/TS hoisting means this is syntactically valid, it is non-standard and can confuse static analysis tools and readers alike.

  2. packages/db/prisma/dev.db, line 1

    P2 Development SQLite database committed to the repository

    packages/db/prisma/dev.db is a binary SQLite file that should not be tracked in git. Committing it bloats the repository history on every schema change and could expose local test data to anyone cloning the repo. Add *.db (or specifically prisma/dev.db) to .gitignore and run git rm --cached packages/db/prisma/dev.db to stop tracking it.

  3. apps/api/src/agent/cli.ts, line 374

    P2 Hardcoded conversationId shares state across all CLI sessions

    conversationId is always "cli-conversation-session". Every CLI invocation upserts or reuses the same conversation row in the database, meaning separate test runs share token budgets and conversation history. If two CLI instances run simultaneously they will also interleave their messages into the same conversation. Consider generating a random UUID per session (or accepting it as a CLI argument) to give each session an independent context.

  4. apps/api/src/policy/router.ts, line 1352-1375

    P1 security Unauthenticated approve/reject endpoints

    POST /policies/approvals/:id/approve and POST /policies/approvals/:id/reject have no authentication or authorization middleware. Any unauthenticated caller that can reach the API can approve an arbitrary pending tool execution, entirely bypassing the human-in-the-loop policy gate that is the core purpose of this feature. An attacker or misconfigured client just needs to know (or guess) a valid approvalId — which is a UUID, but IDs are often observable from prior API responses.
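
As an illustration of the kind of check item 4 asks for, the sketch below validates a bearer token before an approval is allowed to change state. The PR does not say which auth scheme the project intends to use, so the shared-token approach, the `isAuthorizedApprover` name, and the header format are all assumptions; the function is framework-agnostic and would be called from whatever middleware the router uses.

```typescript
import { Buffer } from "node:buffer";
import { timingSafeEqual } from "node:crypto";

// Hypothetical check: returns true only when the Authorization header
// carries the expected bearer token. A shared token is an assumption here;
// a session or role-based check would follow the same call pattern.
function isAuthorizedApprover(
  authorizationHeader: string | undefined,
  expectedToken: string,
): boolean {
  if (!authorizationHeader || expectedToken === "") return false;
  const [scheme, token] = authorizationHeader.split(" ");
  if (scheme !== "Bearer" || !token) return false;

  const given = Buffer.from(token);
  const expected = Buffer.from(expectedToken);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

An approve or reject handler would return 401 when this check fails and only then proceed to the status transition.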

Reviews (5): Last reviewed commit: "fix(gemimi-timeout) : fixed timout issue" | Re-trigger Greptile

Comment thread apps/api/src/agent/loop.ts
Comment thread apps/api/src/index.ts
Comment thread apps/api/src/agent/loop.ts
Comment thread apps/api/src/agent/loop.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

19 issues found across 13 files

Comment thread apps/api/src/agent/loop.ts Outdated
Comment thread apps/api/src/agent/loop.ts
Comment thread .env.sample
Comment thread apps/api/types.ts Outdated
Comment thread apps/api/src/index.ts Outdated
Comment thread apps/api/src/agent/llm.ts Outdated
Comment thread apps/api/src/agent/llm.ts Outdated
Comment thread apps/api/src/agent/loop.ts Outdated
Comment thread apps/api/types.ts Outdated
Comment thread apps/api/src/policy/router.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

5 issues found across 12 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/db/prisma/schema.prisma">

<violation number="1" location="packages/db/prisma/schema.prisma:77">
P1: Schema change is missing a matching Prisma migration for the new `Conversation.budget_reset_at` column. Deployments using migrations will keep an older table shape and break `Conversation` reads/writes.</violation>
</file>

Comment thread packages/db/prisma/schema.prisma
Comment thread apps/api/src/index.ts Outdated
Comment thread apps/api/src/policy/policy.test.ts
Comment thread apps/api/src/index.ts
Comment thread apps/api/src/policy/router.ts
Comment thread apps/api/src/index.ts
Comment thread apps/api/src/agent/loop.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Comment thread apps/api/src/agent/loop.ts
@yb175

yb175 commented Jun 25, 2026

Owner Author

@greptile-apps Is everything fine, and is the PR safe to merge?

@greptile-apps

greptile-apps Bot commented Jun 25, 2026

No, this PR is not safe to merge yet. There are several issues that need to be resolved first, ranging from a critical security vulnerability to correctness bugs:


🔴 Blocker — Must fix before merge

1. Unauthenticated approve/reject endpoints (apps/api/src/policy/router.ts)

POST /policies/approvals/:id/approve and /reject have zero authentication middleware. This completely defeats the purpose of the human-in-the-loop gate — any unauthenticated caller who knows or guesses an approvalId can approve arbitrary tool executions. The entire security model of this feature hinges on these endpoints being protected.

2. No mutual presence check for message + approvalId (apps/api/src/index.ts)

A request with neither message nor approvalId passes validation and triggers an LLM call, consuming tokens for an empty-task run. At minimum, require at least one of the two.

3. No LLM fetch timeout (apps/api/src/agent/llm.ts)

There is no timeout on the Gemini API call. A slow or hung upstream response will stall the agent loop indefinitely with no way to recover.


🟡 Should fix — Correctness issues

4. approvalId lost on PENDING resume (apps/api/src/agent/loop.ts)

When a tool hits PENDING, the approvalId for that tool call may not be returned or surfaced correctly for the client to use when resuming.

5. No status double-check on resume (apps/api/src/agent/loop.ts)

When resuming with an approvalId, the code should re-verify the approval status is still APPROVED at execution time (not just at the start of the resume path) to guard against race conditions.


🟢 Lower priority — Clean up before or after merge

6. dev.db committed to git (packages/db/prisma/dev.db) — add *.db to .gitignore and run git rm --cached packages/db/prisma/dev.db.

7. Hardcoded conversationId in CLI (apps/api/src/agent/cli.ts:374) — generate a UUID per session instead of always using "cli-conversation-session".

8. Out-of-order import (apps/api/src/policy/router.ts) — move import { ApprovalStatus } to the top of the file.


Summary: Items 1–3 are hard blockers. Item 1 in particular is a fundamental security flaw — the human-approval gate is the core value proposition of this feature, and it can be bypassed trivially without auth on the approve/reject endpoints.
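
The input check requested in item 2, combined with the history constraints mentioned elsewhere in this PR (role allowlist, 100-item cap), could be sketched as follows. This is a minimal sketch, not the code in `index.ts`: `validateRunBody` is a hypothetical name, the specific roles in `ALLOWED_ROLES` are assumed, and `conversationId` validation is left out for brevity.

```typescript
const ALLOWED_ROLES = new Set(["user", "assistant", "tool"]); // assumed allowlist
const MAX_HISTORY = 100; // cap stated in the PR summary

type Validation = { ok: true } | { ok: false; error: string };

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.trim() !== "";
}

function validateRunBody(body: unknown): Validation {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be an object" };
  }
  const { message, approvalId, history } = body as Record<string, unknown>;

  // approvalId is optional, but when present it must be a non-empty string.
  if (approvalId !== undefined && !isNonEmptyString(approvalId)) {
    return { ok: false, error: "approvalId must be a non-empty string" };
  }
  // Reject requests that carry neither a task nor an approval to resume.
  if (!isNonEmptyString(message) && !isNonEmptyString(approvalId)) {
    return { ok: false, error: "message or approvalId is required" };
  }
  if (history !== undefined) {
    if (!Array.isArray(history) || history.length > MAX_HISTORY) {
      return { ok: false, error: `history must be an array of at most ${MAX_HISTORY} items` };
    }
    for (const item of history) {
      const role = (item as { role?: unknown } | null)?.role;
      if (typeof role !== "string" || !ALLOWED_ROLES.has(role)) {
        return { ok: false, error: "history contains a disallowed role" };
      }
    }
  }
  return { ok: true };
}
```

Running this before the agent loop means an empty request is rejected with a 400 instead of triggering an LLM call.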

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 3 files (changes from recent commits).

Comment thread apps/api/src/agent/llm.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Comment thread apps/api/src/agent/agent.test.ts
@yb175 yb175 merged commit fcc33a0 into main Jun 25, 2026
3 checks passed