feat(agent) : agent built🎉 by yb175 · Pull Request #5 · yb175/gate-keeper

yb175 · 2026-06-25T01:07:28Z

Summary by cubic

Adds a policy-gated Gemini agent that plans and runs MCP tools with optional human approval. Ships /agent/run with strict validation, a Gemini client with request timeouts, and guardrails like a 30-iteration cap and 3-minute budget auto-reset.

New Features
- Agent loop: JSON tool_call/final_answer, MCP discovery/exec, schema checks, token tracking, 30-iteration cap; resume enforces approvals (APPROVED runs, PENDING returns PENDING, others DENY).
- LLM client: request timeout via AbortSignal.timeout with GEMINI_TIMEOUT_MS override; tests cover abort-on-timeout and invalid-timeout fallback.
- Policy flow: ALLOW/PENDING/DENY with atomic approve/reject endpoints; clear 404/400 on invalid state.
- HTTP POST /agent/run: validates inputs, requires message or approvalId, role allowlist for history (cap 100); returns updated history.
- CLI: npm run cli for chat and approve/reject; per-session UUID conversationId; tool execution trace.
- Tests: 13 agent scenarios plus approval endpoint behavior.
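A minimal sketch of the loop contract in the first bullet above: the model returns one JSON step per turn, either a `tool_call` or a `final_answer`, and the loop stops after 30 iterations. The names `Step`, `parseStep`, and `runLoop` are hypothetical. The callbacks are synchronous for brevity, although the real Gemini and MCP calls are asynchronous. Policy decisions and token tracking are left out.

```typescript
// Hypothetical step shape: the LLM replies with one JSON object per turn.
type Step =
  | { type: "tool_call"; tool_name: string; args: Record<string, unknown> }
  | { type: "final_answer"; answer: string };

const MAX_ITERATIONS = 30; // cap taken from the PR description

// Parse and minimally validate a raw LLM reply; throws on malformed output.
function parseStep(raw: string): Step {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("Step must be a JSON object");
  const obj = data as Record<string, unknown>;
  if (obj.type === "final_answer" && typeof obj.answer === "string") {
    return { type: "final_answer", answer: obj.answer };
  }
  if (
    obj.type === "tool_call" &&
    typeof obj.tool_name === "string" &&
    typeof obj.args === "object" &&
    obj.args !== null &&
    !Array.isArray(obj.args)
  ) {
    return { type: "tool_call", tool_name: obj.tool_name, args: obj.args as Record<string, unknown> };
  }
  throw new Error("Unrecognized step shape");
}

// Drive the loop until the model gives a final answer or the cap is reached.
function runLoop(
  nextStep: (history: string[]) => string,
  executeTool: (name: string, args: Record<string, unknown>) => string,
): { status: "SUCCESS"; answer: string } | { status: "ITERATION_LIMIT" } {
  const history: string[] = [];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const step = parseStep(nextStep(history));
    if (step.type === "final_answer") return { status: "SUCCESS", answer: step.answer };
    // Feed the tool result back so the next LLM turn can see it.
    history.push(`${step.tool_name} -> ${executeTool(step.tool_name, step.args)}`);
  }
  return { status: "ITERATION_LIMIT" };
}
```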
Migration
- Add GEMINI_API_KEY to .env (auto-loaded via apps/api/src/utils/env); optionally set GEMINI_TIMEOUT_MS (default 30000).
- Run the DB migration adding budget_reset_at to Conversation. Configure MCP tools.
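The `budget_reset_at` column backs the 3-minute budget auto-reset. A minimal sketch of that check as a pure function follows. It assumes a token counter named `tokens_used` (hypothetical; the real column name may differ), and `applyBudgetWindow` is likewise a hypothetical helper. Per the sequence diagram, the PR performs the equivalent check while upserting the conversation.

```typescript
const BUDGET_WINDOW_MS = 3 * 60 * 1000; // 3-minute window from the PR description

// Hypothetical subset of the Conversation row.
interface ConversationBudget {
  tokens_used: number;   // assumed field name
  budget_reset_at: Date; // column added by this PR's migration
}

// Return the budget to use for this run: reset it if the window has elapsed,
// otherwise return it unchanged. Persisting the result is left to the caller.
function applyBudgetWindow(conv: ConversationBudget, now: Date): ConversationBudget {
  const elapsed = now.getTime() - conv.budget_reset_at.getTime();
  if (elapsed >= BUDGET_WINDOW_MS) {
    return { tokens_used: 0, budget_reset_at: now };
  }
  return conv;
}
```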

Written for commit e2c1ab7. Summary will update on new commits.

Greptile Summary

This PR introduces a policy-gated Gemini agent that plans and executes MCP tools with optional human approval, wiring it up via a new POST /agent/run endpoint and supporting approve/reject endpoints on the policy router.

Agent loop (loop.ts): 30-iteration cap, 3-minute budget auto-reset via budget_reset_at, approval resume path that checks DB status before reconstructing the tool call step, and accumulated-token tracking flushed to the DB on every exit path.
LLM client (llm.ts): Gemini gemini-2.5-flash call with configurable AbortSignal.timeout, JSON schema validation of tool arguments, and clear error types for malformed responses.
HTTP layer (index.ts, router.ts): /agent/run validates message, conversationId, approvalId (type + non-empty), and history (role allowlist, 100-item cap); approve/reject endpoints use atomic updateMany with status: PENDING condition to prevent double-transitions.
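The approve/reject transition described above can be sketched in memory. A conditional update matches only rows that are still PENDING. When that update returns a zero count, a follow-up lookup decides between 404 (no such approval) and 400 (already decided). The `Map` stands in for the database, and `conditionalUpdate`/`transition` are hypothetical helpers. With a real database, atomicity comes from the single conditional `UPDATE`, not from JavaScript's single thread.

```typescript
type ApprovalStatus = "PENDING" | "APPROVED" | "REJECTED";

// Stand-in for the approvals table (a Prisma model in the PR).
const approvals = new Map<string, ApprovalStatus>();

// Mimics updateMany({ where: { id, status: "PENDING" }, data: { status: next } }):
// returns the number of rows that matched the conditional filter (0 or 1).
function conditionalUpdate(id: string, next: ApprovalStatus): number {
  if (approvals.get(id) !== "PENDING") return 0;
  approvals.set(id, next);
  return 1;
}

// Hypothetical handler body that maps the outcome to an HTTP status code.
function transition(id: string, next: "APPROVED" | "REJECTED"): 200 | 400 | 404 {
  if (conditionalUpdate(id, next) === 1) return 200;
  // Zero rows updated: the record is missing, or it is no longer PENDING.
  return approvals.has(id) ? 400 : 404;
}
```

The existence lookup is not atomic with the update. However, it only chooses the error code, so it cannot cause a second transition.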

Confidence Score: 3/5

The core agent loop is not safe to merge without fixing the approval record lifecycle: an executor failure after approval leaves the user with no recovery path.

The approval resume path in loop.ts delegates deletion of the approval record to decide() (decision.ts), which deletes it atomically before mcpExecutor.execute() is called. If the MCP call then fails transiently, the client receives a 500 response with no approvalId in the body and no way to retry. The user must restart the full request and go through human approval again. This is a real, exercisable failure mode on the critical happy path of the new feature.

apps/api/src/agent/loop.ts — the ALLOW branch (lines 176–193) needs attention around executor failure handling after the approval record is deleted by decide().
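One possible fix, sketched under assumptions: do not delete the approval inside `decide()` before `mcpExecutor.execute()` runs. Instead, claim it with a conditional transition from APPROVED to a hypothetical EXECUTING status, run the tool, and delete the record only after success. On failure, the claim is released and the `approvalId` is returned so the client can retry. `ApprovalStore`, `executeApproved`, and the EXECUTING status are illustrative and are not part of the PR's schema.

```typescript
type Status = "PENDING" | "APPROVED" | "EXECUTING";

// Hypothetical persistence interface. A Prisma-backed version could implement
// compareAndSet with updateMany({ where: { id, status: from }, data: { status: to } })
// and report whether exactly one row was updated.
interface ApprovalStore {
  compareAndSet(id: string, from: Status, to: Status): Promise<boolean>;
  delete(id: string): Promise<void>;
}

type ResumeResult =
  | { status: "SUCCESS"; output: string }
  | { status: "ERROR"; approvalId: string; error: string } // client may retry with approvalId
  | { status: "DENY"; reason: string };

async function executeApproved(
  store: ApprovalStore,
  approvalId: string,
  execute: () => Promise<string>,
): Promise<ResumeResult> {
  // Claim atomically at execution time. This fails if the approval is missing,
  // still PENDING, or already claimed by a concurrent request.
  const claimed = await store.compareAndSet(approvalId, "APPROVED", "EXECUTING");
  if (!claimed) return { status: "DENY", reason: "Approval is not in APPROVED state" };

  let output: string;
  try {
    output = await execute();
  } catch (err) {
    // The tool failed: release the claim so the same approval can be retried.
    await store.compareAndSet(approvalId, "EXECUTING", "APPROVED");
    return { status: "ERROR", approvalId, error: String(err) };
  }
  // Consume the approval only after success. If this delete throws, the record
  // stays EXECUTING, so a side-effecting tool cannot be re-run by mistake.
  await store.delete(approvalId);
  return { status: "SUCCESS", output };
}
```

The same conditional claim also re-checks the APPROVED status at the moment of execution, which is what item 5 in the review thread asks for.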

Important Files Changed

Filename	Overview
apps/api/src/agent/loop.ts	New agent loop with iteration cap, budget auto-reset, and approval resume path — has a P1 where the approval record is deleted inside decide() before tool execution succeeds, making MCP executor failures non-retryable
apps/api/src/agent/llm.ts	New Gemini LLM client with AbortSignal timeout, configurable via GEMINI_TIMEOUT_MS, with schema validation and clear error messages — straightforward and well-structured
apps/api/src/agent/memory.ts	New in-memory conversation state container with typed message roles and approval tracking — clean, minimal, no issues
apps/api/src/index.ts	Adds /agent/run endpoint with role allowlist, 100-item history cap, and approvalId type guard — input validation is thorough; routes through to loop.ts which has the P1
apps/api/src/policy/router.ts	Adds approve/reject endpoints using atomic updateMany with correct 404/400 differentiation — implementation is correct but endpoints lack auth middleware
apps/api/src/agent/cli.ts	Interactive CLI client with random UUID session ID, tool trace view, and inline approve/reject flow — no issues
apps/api/src/utils/env.ts	Upward directory traversal to locate and load .env, stops at turbo.json monorepo root — clean replacement for inline dotenv.config()
packages/db/prisma/schema.prisma	Adds budget_reset_at DateTime with default now() to Conversation model — straightforward schema addition with corresponding migration
apps/api/src/agent/agent.test.ts	13 scenarios covering tool call, final answer, PENDING/DENY/ALLOW decisions, iteration limit, budget reset, and approval resume flows — comprehensive coverage
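For reference, the upward `.env` lookup attributed to `env.ts` above can be sketched with Node's `fs` and `path` modules. The sketch checks each directory for `.env` and stops at the directory containing `turbo.json` or at the filesystem root. `findEnvFile` is a hypothetical name, and loading the file (for example with dotenv) is omitted.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Walk upward from startDir looking for a .env file. Stop after checking the
// directory that contains turbo.json (the monorepo root) or the filesystem root.
function findEnvFile(startDir: string): string | null {
  let dir = path.resolve(startDir);
  while (true) {
    const candidate = path.join(dir, ".env");
    if (fs.existsSync(candidate)) return candidate;
    if (fs.existsSync(path.join(dir, "turbo.json"))) return null; // reached the repo root
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}
```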

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant API as POST /agent/run
    participant AgentLoop as runAgent
    participant LLM as Gemini LLM
    participant Policy as decide
    participant DB as Database
    participant MCP as mcpExecutor

    Client->>API: message + conversationId + history
    API->>AgentLoop: runAgent(message, conversationId)
    AgentLoop->>DB: upsert conversation (budget check/reset)
    AgentLoop->>MCP: discoverTools
    AgentLoop->>LLM: nextStep(memory, tools)
    LLM-->>AgentLoop: tool_call response
    AgentLoop->>Policy: decide(tool_name, args, conversationId)
    alt PENDING
        Policy->>DB: approval.create PENDING
        Policy-->>AgentLoop: PENDING + approvalId
        AgentLoop-->>Client: status PENDING + approvalId
        Client->>API: POST /policies/approvals/:id/approve
        API->>DB: updateMany PENDING to APPROVED
        Client->>API: resume with approvalId
        API->>AgentLoop: runAgent(null, conversationId, approvalId)
        AgentLoop->>DB: approval.findUnique check APPROVED
        AgentLoop->>Policy: decide with approvalId
        Policy->>DB: approval.delete
        Policy-->>AgentLoop: ALLOW
        AgentLoop->>MCP: execute tool
        MCP-->>AgentLoop: result
        AgentLoop->>LLM: nextStep with tool result
        LLM-->>AgentLoop: final_answer
        AgentLoop-->>Client: status SUCCESS + answer
    else ALLOW
        Policy-->>AgentLoop: ALLOW
        AgentLoop->>MCP: execute tool
        MCP-->>AgentLoop: result
        AgentLoop->>LLM: nextStep continues
    else DENY
        Policy-->>AgentLoop: DENY
        AgentLoop-->>Client: status DENY + reason
    end

Comments Outside Diff (4)

apps/api/src/policy/router.ts, lines 1-2

The import { ApprovalStatus } declaration appears mid-file (after all route handler code) rather than at the top. While JS/TS hoisting means this is syntactically valid, it is non-standard and can confuse static analysis tools and readers alike.

packages/db/prisma/dev.db, line 1

Development SQLite database committed to the repository

packages/db/prisma/dev.db is a binary SQLite file that should not be tracked in git. Committing it bloats the repository history on every schema change and could expose local test data to anyone cloning the repo. Add *.db (or specifically prisma/dev.db) to .gitignore and run git rm --cached packages/db/prisma/dev.db to stop tracking it.
apps/api/src/agent/cli.ts, line 374

Hardcoded conversationId shares state across all CLI sessions

conversationId is always "cli-conversation-session". Every CLI invocation upserts or reuses the same conversation row in the database, meaning separate test runs share token budgets and conversation history. If two CLI instances run simultaneously they will also interleave their messages into the same conversation. Consider generating a random UUID per session (or accepting it as a CLI argument) to give each session an independent context.
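A minimal sketch of the suggested fix. It assumes an optional `--conversation <id>` flag (hypothetical; the CLI may not accept one) and otherwise falls back to a fresh `randomUUID()` from Node's built-in `node:crypto` module. `resolveConversationId` is likewise a hypothetical name.

```typescript
import { randomUUID } from "node:crypto";

// Pick the conversation ID for this CLI session: use an explicit
// `--conversation <id>` argument if one is given (hypothetical flag),
// otherwise generate a fresh UUID so sessions never share a row.
function resolveConversationId(argv: string[]): string {
  const i = argv.indexOf("--conversation");
  const explicit: string | undefined = i >= 0 ? argv[i + 1] : undefined;
  return explicit && explicit.trim() !== "" ? explicit : randomUUID();
}

// Usage: const conversationId = resolveConversationId(process.argv.slice(2));
```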
apps/api/src/policy/router.ts, lines 1352-1375

Unauthenticated approve/reject endpoints

POST /policies/approvals/:id/approve and POST /policies/approvals/:id/reject have no authentication or authorization middleware. Any unauthenticated caller that can reach the API can approve an arbitrary pending tool execution, entirely bypassing the human-in-the-loop policy gate that is the core purpose of this feature. An attacker or misconfigured client only needs to know or guess a valid approvalId. Approval IDs are UUIDs and hard to guess, but they are often observable in earlier API responses.
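A minimal, framework-agnostic sketch of an authorization check that could run before the approve/reject handlers touch the database. It assumes approvers present a shared bearer token configured out of band, for example in a hypothetical `APPROVER_TOKEN` environment variable. `isAuthorizedApprover` is a hypothetical helper. A production setup would more likely authenticate individual users and check an approver role, so that each approval is attributable to a person.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Compare secrets in constant time. Hashing both sides first gives buffers of
// equal length, which timingSafeEqual requires.
function safeEqual(a: string, b: string): boolean {
  const ha = createHash("sha256").update(a).digest();
  const hb = createHash("sha256").update(b).digest();
  return timingSafeEqual(ha, hb);
}

// True only if the Authorization header carries the expected bearer token.
// expectedToken would come from configuration (e.g. a hypothetical APPROVER_TOKEN).
function isAuthorizedApprover(authorization: string | undefined, expectedToken: string | undefined): boolean {
  if (!expectedToken) return false; // fail closed when no token is configured
  if (authorization === undefined || !authorization.startsWith("Bearer ")) return false;
  return safeEqual(authorization.slice("Bearer ".length), expectedToken);
}
```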

Reviews (5): Last reviewed commit: "fix(gemimi-timeout) : fixed timout issue"

cubic-dev-ai

19 issues found across 13 files

cubic-dev-ai

5 issues found across 12 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/db/prisma/schema.prisma">

<violation number="1" location="packages/db/prisma/schema.prisma:77">
P1: Schema change is missing a matching Prisma migration for the new `Conversation.budget_reset_at` column. Deployments using migrations will keep an older table shape and break `Conversation` reads/writes.</violation>
</file>

cubic-dev-ai

1 issue found across 2 files (changes from recent commits).

yb175 · 2026-06-25T02:06:46Z

@greptile-apps Is everything fine, and is the PR safe to merge?

greptile-apps · 2026-06-25T02:07:38Z

No, this PR is not safe to merge yet. There are several issues that need to be resolved first, ranging from a critical security vulnerability to correctness bugs:

🔴 Blocker — Must fix before merge

1. Unauthenticated approve/reject endpoints (apps/api/src/policy/router.ts)

POST /policies/approvals/:id/approve and /reject have zero authentication middleware. This completely defeats the purpose of the human-in-the-loop gate — any unauthenticated caller who knows or guesses an approvalId can approve arbitrary tool executions. The entire security model of this feature hinges on these endpoints being protected.

2. No presence check for message or approvalId (apps/api/src/index.ts)

A request with neither message nor approvalId passes validation and triggers an LLM call, consuming tokens for an empty-task run. At minimum, require at least one of the two.
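The "at least one of the two" rule can be expressed as a small validator. This sketch assumes the JSON body fields named in the PR (`message`, `approvalId`, `conversationId`). `validateRunBody`, `optionalNonEmpty`, and the error strings are hypothetical, and the history checks (role allowlist, 100-item cap) are omitted.

```typescript
interface RunRequest {
  conversationId: string;
  message: string | undefined;
  approvalId: string | undefined;
}

// undefined = field absent; null = field present but not a non-empty string.
function optionalNonEmpty(v: unknown): string | undefined | null {
  if (v === undefined) return undefined;
  return typeof v === "string" && v.trim() !== "" ? v : null;
}

// Returns the validated request, or an error string suitable for a 400 response.
function validateRunBody(body: unknown): { ok: true; value: RunRequest } | { ok: false; error: string } {
  if (typeof body !== "object" || body === null) return { ok: false, error: "Body must be a JSON object" };
  const b = body as Record<string, unknown>;
  const conversationId = optionalNonEmpty(b.conversationId);
  const message = optionalNonEmpty(b.message);
  const approvalId = optionalNonEmpty(b.approvalId);

  if (typeof conversationId !== "string") return { ok: false, error: "conversationId is required" };
  if (message === null) return { ok: false, error: "message must be a non-empty string" };
  if (approvalId === null) return { ok: false, error: "approvalId must be a non-empty string" };
  // The check the review asks for: an empty-task request never reaches the LLM.
  if (message === undefined && approvalId === undefined) {
    return { ok: false, error: "Either message or approvalId is required" };
  }
  return { ok: true, value: { conversationId, message, approvalId } };
}
```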

3. No LLM fetch timeout (apps/api/src/agent/llm.ts)

There is no timeout on the Gemini API call. A slow or hung upstream response will stall the agent loop indefinitely with no way to recover.
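According to the PR summary, this was addressed with `AbortSignal.timeout`, a `GEMINI_TIMEOUT_MS` override defaulting to 30000 ms, and a fallback for invalid values. A sketch of that pattern follows, assuming Node 18+ (global `fetch` and `AbortSignal.timeout`). `resolveTimeoutMs` and `postWithTimeout` are hypothetical helpers, and no Gemini-specific URL or payload is implied.

```typescript
const DEFAULT_TIMEOUT_MS = 30000; // default documented in the PR's migration notes

// Parse GEMINI_TIMEOUT_MS. Fall back to the default for missing, non-numeric,
// non-integer, or non-positive values.
function resolveTimeoutMs(raw: string | undefined): number {
  const n = Number(raw);
  return raw !== undefined && Number.isInteger(n) && n > 0 ? n : DEFAULT_TIMEOUT_MS;
}

// POST a JSON body and abort if no response arrives within timeoutMs.
// On timeout, fetch rejects with the signal's abort reason (a "TimeoutError").
async function postWithTimeout(url: string, body: unknown, timeoutMs: number): Promise<unknown> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!res.ok) throw new Error(`LLM request failed with HTTP ${res.status}`);
  return res.json();
}

// Usage (hypothetical): postWithTimeout(endpoint, payload, resolveTimeoutMs(process.env.GEMINI_TIMEOUT_MS))
```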

🟡 Should fix — Correctness issues

4. approvalId lost on PENDING resume (apps/api/src/agent/loop.ts)

When a tool hits PENDING, the approvalId for that tool call may not be returned or surfaced correctly for the client to use when resuming.
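One way to turn this class of bug into a compile-time error is to model the response as a discriminated union in which the PENDING variant requires `approvalId`. A sketch follows. `AgentRunResponse` and `describeForClient` are hypothetical, the status names come from the sequence diagram, and `toolName` is an assumed field.

```typescript
// Each status carries exactly the fields a client needs. A PENDING value
// cannot be constructed without an approvalId.
type AgentRunResponse =
  | { status: "SUCCESS"; answer: string }
  | { status: "PENDING"; approvalId: string; toolName: string }
  | { status: "DENY"; reason: string };

// Exhaustive switch: adding a new status without handling it fails to compile.
function describeForClient(res: AgentRunResponse): string {
  switch (res.status) {
    case "SUCCESS":
      return `Answer: ${res.answer}`;
    case "PENDING":
      return `Approval ${res.approvalId} required for ${res.toolName}`;
    case "DENY":
      return `Denied: ${res.reason}`;
    default: {
      const unreachable: never = res;
      return unreachable;
    }
  }
}
```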

5. No status double-check on resume (apps/api/src/agent/loop.ts)

When resuming with an approvalId, the code should re-verify the approval status is still APPROVED at execution time (not just at the start of the resume path) to guard against race conditions.

🟢 Lower priority — Clean up before or after merge

6. dev.db committed to git (packages/db/prisma/dev.db) — add *.db to .gitignore and run git rm --cached packages/db/prisma/dev.db.

7. Hardcoded conversationId in CLI (apps/api/src/agent/cli.ts:374) — generate a UUID per session instead of always using "cli-conversation-session".

8. Out-of-order import (apps/api/src/policy/router.ts) — move import { ApprovalStatus } to the top of the file.

Summary: Items 1–3 are hard blockers. Item 1 in particular is a fundamental security flaw — the human-approval gate is the core value proposition of this feature, and it can be bypassed trivially without auth on the approve/reject endpoints.

cubic-dev-ai

1 issue found across 3 files (changes from recent commits).

cubic-dev-ai

1 issue found across 2 files (changes from recent commits).

feat(agent) : agent built🎉

a6b4a9d

greptile-apps Bot reviewed Jun 25, 2026

Comment thread apps/api/src/agent/loop.ts

Comment thread apps/api/src/index.ts

Comment thread apps/api/src/agent/loop.ts

Comment thread apps/api/src/agent/loop.ts

cubic-dev-ai Bot reviewed Jun 25, 2026

fix(review) : fixed all p1 and p2 reviews

c61b58f

cubic-dev-ai Bot reviewed Jun 25, 2026

Comment thread packages/db/prisma/schema.prisma

Comment thread apps/api/src/index.ts Outdated

Comment thread apps/api/src/policy/policy.test.ts

Comment thread apps/api/src/index.ts

Comment thread apps/api/src/policy/router.ts

greptile-apps Bot reviewed Jun 25, 2026

Comment thread apps/api/src/index.ts

fix(review) : fixed reviews

a1290cd

greptile-apps Bot reviewed Jun 25, 2026

Comment thread apps/api/src/agent/loop.ts

security : fixed approvalvulnerability

6b19d16

cubic-dev-ai Bot reviewed Jun 25, 2026

Comment thread apps/api/src/agent/loop.ts

fix(review) : fixed review

15fa766

fix(llm) : added llm timeout

d886d6c

cubic-dev-ai Bot reviewed Jun 25, 2026

Comment thread apps/api/src/agent/llm.ts Outdated

fix(gemimi-timeout) : fixed timout issue

e2c1ab7

cubic-dev-ai Bot reviewed Jun 25, 2026

Comment thread apps/api/src/agent/agent.test.ts

yb175 merged commit fcc33a0 into main Jun 25, 2026
3 checks passed
