luannamorim/mcp-forge

MCPForge

MCP server for Brazilian personal finance.

A Brazilian personal-finance MCP server with OFX/CSV parsing, PT-BR categorization, and CPF/CNPJ validation — no bank APIs required.


Demo

Two chained tool calls in an MCP client:

User: Importei o OFX do Itaú de fevereiro. Quanto gastei com alimentação
      versus transporte este mês?

Claude: [parse_ofx("/tmp/itau_fev2025.ofx")]
        → 143 transactions, bank_hint: "itau", 0 errors

        [categorize_transaction × 143]
        → rules tier: 98 txs  |  llm tier: 45 txs

        Alimentação + Restaurantes: R$ 1.247,30 (34 transações)
          IFOOD*PEDIDO, PADARIA NOSSA SRA, MCDONALD'S SP...

        Transporte: R$ 389,10 (18 transações)
          UBER *TRIP, 99APP, METRÔ SP BILHETE...

        Total do mês: R$ 8.643,22 em 143 transações.

Every categorize_transaction call returns which tier classified it:

{
  "category": "transporte",
  "confidence": 1.0,
  "path": "rules"
}

path is one of rules, embeddings, or llm, so the classification source is always explicit. A rules hit at confidence 1.0 means a regex matched; an LLM hit at 0.7 means the model was less certain. The caller decides what to do with that signal.


Why this exists

Brazilian bank exports are not standardized. Itaú splits PIX transactions across three OFX lines that must be merged by FITID prefix. Inter exports in cp1252 but labels the file with ENCODING:1252 — a value that ofxparse rejects with an UnboundLocalError. Nubank embeds proprietary metadata in the description field. BB omits payee CNPJ for certain transaction types. Existing tools either skip Brazilian formats entirely or paper over the quirks silently, giving you wrong transaction counts without saying so.

The Model Context Protocol (released by Anthropic in late 2024) has become the standard for connecting LLM clients — Claude Desktop, Cursor, IDE agents — to external tools and data. As of mid-2025, there are effectively zero published MCP servers targeting Brazilian financial workflows. Every BR developer building a personal-finance assistant re-implements OFX parsing from scratch. This project solves the problem once and publishes the solution as a reusable MCP server.


Key features

  • Bank-specific parsers with documented quirks: cp1252 encoding fix for Itaú/Inter, PIX-split detection for Itaú, header-fingerprint dialect detection for all 5 banks in CSV mode
  • Transparent categorization: path: rules | llm on every response; no black box; clients see which tier assigned each category
  • PII masking before any LLM call: utils/pii.py:mask_br_documents strips CPF/CNPJ from description text unconditionally before it leaves the process (LGPD alignment)
  • Decimal everywhere: monetary amounts are never float, so binary floating-point rounding errors cannot creep into downstream aggregations
  • Offline-safe core: validate_cpf, validate_cnpj, parse_ofx, parse_csv make zero network calls; only categorize_transaction and lookup_cnae reach the network
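
The PII-masking feature above can be illustrated with a minimal sketch. This is not the code in utils/pii.py; the function name mask_documents_sketch, the regular expressions, and the [CPF]/[CNPJ] placeholders are assumptions chosen for illustration.

```python
import re

# Assumed patterns: CPF has 11 digits, CNPJ has 14, each with optional punctuation.
CNPJ_RE = re.compile(r"\b\d{2}\.?\d{3}\.?\d{3}/?\d{4}-?\d{2}\b")
CPF_RE = re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b")


def mask_documents_sketch(text: str) -> str:
    """Replace anything shaped like a CNPJ or CPF with a placeholder."""
    # CNPJ goes first so its longer digit run is never partially read as a CPF.
    text = CNPJ_RE.sub("[CNPJ]", text)
    return CPF_RE.sub("[CPF]", text)


print(mask_documents_sketch("PIX RECEBIDO JOAO 123.456.789-09"))
print(mask_documents_sketch("PAG BOLETO 12.345.678/0001-95 LTDA"))
```

Masking by shape rather than by checksum errs on the side of over-masking, which seems like the safer default when the text is about to leave the process.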

Quickstart

git clone <this repo> && cd mcp-forge
uv sync
export ANTHROPIC_API_KEY=sk-...   # only needed for categorize_transaction
uv run mcpforge                    # start MCP server on stdio
uv run pytest                      # 198 tests

Offline mode (no Anthropic key)

categorize_transaction can run fully offline against a local Ollama server with Phi-3.5. Selected automatically when ANTHROPIC_API_KEY is unset, or explicitly via MCPFORGE_LLM_BACKEND=ollama.

ollama pull phi3.5
uv sync --extra ollama
unset ANTHROPIC_API_KEY
uv run mcpforge

Override the Ollama host and model with MCPFORGE_OLLAMA_HOST and MCPFORGE_OLLAMA_MODEL. Timeouts are set with MCPFORGE_OLLAMA_HEALTH_TIMEOUT_S for the startup health check (default 2.0) and MCPFORGE_OLLAMA_CALL_TIMEOUT_S for each call (default 30.0).
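
The selection rule described in this section could be sketched roughly as follows. The environment variable names and timeout defaults come from the text above; the LLMSettings class, the load_llm_settings function, and the "anthropic" backend label are hypothetical and may not match the project's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:  # hypothetical container, not the project's own type
    backend: str
    health_timeout_s: float
    call_timeout_s: float


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Pick a backend: explicit override first, then presence of the API key."""
    env = os.environ if env is None else env
    explicit = env.get("MCPFORGE_LLM_BACKEND", "").strip().lower()
    if explicit:
        backend = explicit
    elif env.get("ANTHROPIC_API_KEY"):
        backend = "anthropic"  # assumed label for the default backend
    else:
        backend = "ollama"  # selected automatically when the key is unset
    return LLMSettings(
        backend=backend,
        health_timeout_s=float(env.get("MCPFORGE_OLLAMA_HEALTH_TIMEOUT_S", "2.0")),
        call_timeout_s=float(env.get("MCPFORGE_OLLAMA_CALL_TIMEOUT_S", "30.0")),
    )


print(load_llm_settings({}))
```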

OpenTelemetry metrics (optional)

Two instruments are emitted alongside the JSONL trace log when an OTel endpoint is configured:

  • mcpforge_tool_calls_total{tool, success} — counter per tool invocation
  • mcpforge_tool_latency_ms{tool} — histogram of tool latency

uv sync --extra otel
export OTEL_EXPORTER_OTLP_ENDPOINT=https://your-collector
export OTEL_SERVICE_NAME=mcpforge   # default: mcpforge
uv run mcpforge

Without the env var or extra, metric emission is a silent no-op; JSONL traces under logs/traces.jsonl remain the default observability surface.

Client config

Add to your MCP client config. Example for Claude Desktop (claude_desktop_config.json) — Cursor, Zed, and Windsurf follow the same command + args + cwd pattern:

{
  "mcpServers": {
    "mcpforge": {
      "command": "uv",
      "args": ["run", "mcpforge"],
      "cwd": "/path/to/mcp-forge"
    }
  }
}

Available tools

| Tool | What it does | Network? |
| --- | --- | --- |
| validate_cpf(cpf) | Receita Federal mod-11 checksum | No |
| validate_cnpj(cnpj) | Receita Federal mod-11 checksum | No |
| parse_ofx(file_path) | Parse BR bank OFX; returns normalized transactions + bank_hint + per-tx error list | No |
| parse_csv(file_path, bank_hint?) | Parse BR bank CSV with auto dialect detection | No |
| lookup_cnae(cnpj) | Primary CNAE classification via BrasilAPI; cached 24 h | Yes (BrasilAPI) |
| categorize_transaction(description, amount?) | Rules → LLM cascade; returns {category, confidence, path} | Conditionally (Anthropic) |

All inputs validated with Pydantic v2. Errors return MCP-compliant error objects, not Python tracebacks.
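
For readers unfamiliar with the mod-11 checksum that validate_cpf relies on, here is a minimal standalone sketch of the standard CPF algorithm. The function names are illustrative and are not taken from validators/cpf.py.

```python
def _cpf_check_digit(digits: list) -> int:
    """Compute one CPF check digit from the digits that precede it."""
    # Weights run from len(digits) + 1 down to 2 (10..2, then 11..2).
    weights = range(len(digits) + 1, 1, -1)
    total = sum(d * w for d, w in zip(digits, weights))
    remainder = (total * 10) % 11
    return 0 if remainder == 10 else remainder


def is_valid_cpf(cpf: str) -> bool:
    """Validate a CPF given with or without punctuation."""
    digits = [int(c) for c in cpf if c in "0123456789"]
    # Repeated-digit sequences pass the arithmetic but are not valid CPFs.
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    return (
        _cpf_check_digit(digits[:9]) == digits[9]
        and _cpf_check_digit(digits[:10]) == digits[10]
    )


print(is_valid_cpf("529.982.247-25"))
```

CNPJ validation follows the same mod-11 idea with a different weight sequence over 12 and 13 digits.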


Architecture

flowchart LR
    A["MCP Client\nClaude Desktop · Cursor · Zed · Windsurf"]
    A <-->|stdio| B["FastMCP Server"]

    B --> P["parsers/\nofx · csv"]
    B --> V["validators/\ncpf · cnpj · cnae"]
    B --> C["classifiers/\ncascade"]

    P --> Banks["bank adapters\nItaú · Bradesco · Nubank · Inter · BB"]
    V --> BrasilAPI["BrasilAPI\ncached 24 h"]
    C --> Rules["rules tier\nzero API cost"]
    C -.->|"roadmap"| Emb["embeddings tier\nBGE-m3"]
    C --> PII["pii.mask_br_documents"]
    PII --> Haiku["Anthropic Haiku 4.5"]

Single-process Python server on stdio transport — compatible with any standard MCP client (Claude Desktop, Cursor, Zed, Windsurf, and others), no proxy needed. Parsers are stateless. CNAE lookup carries an in-memory 24h TTL cache. Every tool call appends a structured trace to logs/traces.jsonl with latency, token counts, and cost in USD.
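
An in-memory TTL cache of the kind described for CNAE lookup can be sketched in a few lines. The TTLCache class and its get_or_fetch method are hypothetical names, and the injectable clock is an assumption added to make expiry testable.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[Hashable], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the network call
        value = fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


cache = TTLCache(ttl_seconds=24 * 3600)
# The lambda stands in for the real BrasilAPI request.
print(cache.get_or_fetch("00000000000191", lambda cnpj: {"cnpj": cnpj}))
```

Because the cache lives in process memory, it is emptied whenever the server restarts, which fits the single-process design.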


Differentiation

1. Brazilian vertical focus, not a translation layer

Bank-specific OFX/CSV quirks are first-class, not an afterthought:

  • parsers/ofx.py:_fix_encoding_header — remaps the non-standard ENCODING:1252 declaration (produced by Itaú, Inter, and others) to USASCII before ofxparse sees the file, then decodes the content as cp1252. Without this fix, these files throw UnboundLocalError mid-parse — the kind of failure you only discover when a user reports missing transactions.
  • parsers/itau.py — PIX transactions are detected via FITID prefix pattern; merge logic stub is in place for when real multi-line OFX samples are available (contributions welcome).
  • validators/cpf.py, validators/cnpj.py — Receita Federal mod-11 checksum, pure Python, sub-millisecond, no external service.
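
The encoding-header fix in the first bullet can be approximated as below. This is a sketch of the idea, not the body of _fix_encoding_header: the regular expression, the function name, and the choice of errors="replace" are assumptions.

```python
import re

# Matches the header line ENCODING:1252 without consuming the line ending.
_ENCODING_1252 = re.compile(rb"^(ENCODING:)[ \t]*1252(?=[ \t\r]*$)", re.MULTILINE)


def fix_encoding_header_sketch(raw: bytes) -> str:
    """Rewrite ENCODING:1252 to ENCODING:USASCII, then decode the file as cp1252."""
    patched = _ENCODING_1252.sub(rb"\g<1>USASCII", raw, count=1)
    # A few byte values are undefined in cp1252; replace them instead of raising.
    return patched.decode("cp1252", errors="replace")


sample = b"OFXHEADER:100\r\nENCODING:1252\r\nCHARSET:1252\r\n\r\n<MEMO>PADARIA S\xc3O JO\xc3O"
print(fix_encoding_header_sketch(sample))
```

Only the header declaration changes; the transaction text is still decoded as cp1252, so accented descriptions survive intact.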

Full 25-category PT-BR taxonomy

alimentacao · restaurantes · transporte · combustivel · moradia · utilities · telecom · saude · farmacia · educacao · lazer · streaming · vestuario · viagens · beleza · pets · impostos · tarifas_bancarias · salario · investimentos · transferencias · saques · doacoes · presentes · outros · uncategorized

Defined in classifiers/taxonomy.py. Configurable taxonomy is a v2 feature — fixed taxonomy in v1 so the golden dataset can be labeled before the cascade is tuned.

2. Categorization that shows its work

Most transaction categorizers return a label — you trust it or you don't. MCPForge returns path on every call, so the caller has enough information to decide how to handle uncertainty:

path: "rules"      → regex matched; confidence is 1.0; zero API cost
path: "llm"        → rules missed; Haiku was called; confidence reflects model certainty
path: "embeddings" → (roadmap) nearest-neighbor on labeled set; no LLM cost

An agent can surface an llm/0.6 hit to the user for confirmation while silently accepting a rules/1.0 hit. The classification source is transparent because the distinction matters: a wrong category from a regex miss is a different kind of bug than a wrong category from an LLM that had no good match.
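
On the caller side, that policy might look like the sketch below. The needs_confirmation helper and the 0.8 threshold are illustrative assumptions; the tool itself only returns category, confidence, and path.

```python
def needs_confirmation(result: dict, llm_threshold: float = 0.8) -> bool:
    """Decide whether to ask the user before accepting a category."""
    if result["path"] == "rules":
        return False  # a regex matched; accept silently
    # Any non-rules path is accepted only above the chosen threshold.
    return result["confidence"] < llm_threshold


results = [
    {"category": "transporte", "confidence": 1.0, "path": "rules"},
    {"category": "lazer", "confidence": 0.6, "path": "llm"},
]
to_review = [r for r in results if needs_confirmation(r)]
print(to_review)
```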

Current state: the cascade is two-tier (rules → LLM). The embeddings tier is stubbed at cascade.py:42 — it slots in without changing the tool signature or path semantics.
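
A two-tier cascade with this shape can be sketched as follows. The RULES table, the categorize function, and the injected llm callable are hypothetical; the real rule set and Haiku call live in the project's classifiers/ package and will differ.

```python
import re
from typing import Callable, Tuple

# Illustrative rules only; the project's actual patterns are not shown in this README.
RULES = [
    (re.compile(r"\bUBER\b|\b99APP\b", re.IGNORECASE), "transporte"),
    (re.compile(r"\b(DARF|IPTU|IPVA)\b", re.IGNORECASE), "impostos"),
]


def categorize(description: str, llm: Callable[[str], Tuple[str, float]]) -> dict:
    """Try the regex rules first and fall back to the LLM tier on a miss."""
    for pattern, category in RULES:
        if pattern.search(description):
            return {"category": category, "confidence": 1.0, "path": "rules"}
    # An embeddings tier would slot in here without changing the return shape.
    # PII masking would also run at this point, before the text leaves the process.
    category, confidence = llm(description)
    return {"category": category, "confidence": confidence, "path": "llm"}


def fake_llm(text: str) -> Tuple[str, float]:
    """Stand-in for the model call so the sketch runs offline."""
    return ("lazer", 0.7)


print(categorize("UBER *TRIP", fake_llm))
print(categorize("CINEMARK SP", fake_llm))
```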

3. Honest evaluation, not headline numbers

Per-category accuracy matters more than overall accuracy. A system with 87% top-1 can have 42% F1 on impostos — common in BR workflows with DARF, IPTU, and IPVA payments — and the aggregate hides it entirely.

The plan: Inspect AI running in CI on a 500-transaction labeled golden set (balanced across all 5 banks and all 25 categories), with per-category F1 published alongside overall accuracy, and a PR gate that blocks on a >2pp regression from the accepted baseline. The golden set will be committed to the repo, not stored separately.
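
To make the planned gate concrete, here is a small sketch of per-category F1 and a regression check. It does not use Inspect AI, and it assumes the 2pp threshold is applied per category, which the plan above does not specify; both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_category_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 per label, computed from true-positive, false-positive, and false-negative counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    scores = {}
    for cat in set(y_true) | set(y_pred):
        denom = 2 * tp[cat] + fp[cat] + fn[cat]
        scores[cat] = 2 * tp[cat] / denom if denom else 0.0
    return scores


def regressions(baseline: Dict[str, float], current: Dict[str, float],
                max_drop_pp: float = 2.0) -> Dict[str, Tuple[float, float]]:
    """Return categories whose F1 fell by more than max_drop_pp percentage points."""
    return {
        cat: (base, current.get(cat, 0.0))
        for cat, base in baseline.items()
        # Rounding guards against float noise right at the threshold.
        if round((base - current.get(cat, 0.0)) * 100, 6) > max_drop_pp
    }


truth = ["impostos", "impostos", "transporte", "transporte"]
preds = ["impostos", "transporte", "transporte", "transporte"]
scores = per_category_f1(truth, preds)
print(regressions({"impostos": 0.70, "transporte": 0.80}, scores))
```

A CI job could fail the build whenever regressions returns a non-empty dict.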

Current state: the eval harness is not yet built. The numbers in Benchmarks are SPEC targets, not measurements. They will be replaced with measured results — including any categories that miss the target — when the harness lands.


Benchmarks / Evaluation

Targets (from SPEC.md — not yet measured)

| Metric | Target |
| --- | --- |
| Top-1 categorization accuracy | ≥ 85% |
| parse_ofx p95 on a 1 MB file | < 1.5 s |
| categorize_transaction p95 | < 400 ms |
| LLM cost per transaction | < $0.0005 |
| OFX coverage | 5/5 banks, ≥ 95% transactions parsed |

These are calibrated estimates, not measured values. The eval harness is on the roadmap.

Status today

| Subsystem | State |
| --- | --- |
| Unit tests | 181 passing (uv run pytest) |
| CSV parsing | 5 banks, header-fingerprint dialect detection; fixtures in tests/parsers/fixtures/ |
| OFX parsing | Generic ofxparse + cp1252 quirk fix; Itaú PIX-split stub; bank hint detection |
| Categorizer eval | Not yet: Inspect AI + 500-tx golden set on roadmap |
| Latency / cost benchmarks | Not yet: per-call traces land in logs/traces.jsonl; no aggregated report |

Targets will be replaced with measured numbers — including misses — when the eval harness lands.


Tech stack

| Choice | Why |
| --- | --- |
| Python 3.11+ | mcp SDK is Python-first; StrEnum for the taxonomy |
| uv | Reproducible, fast installs; all dev commands are uv run … |
| FastMCP (mcp ≥ 1.9) | Official SDK; stdio transport = compatible with any standard MCP client |
| Pydantic v2 | Every tool input/output is a model; MCP spec compliance and input validation in one layer |
| ofxparse + encoding fix | Cheaper than rolling a custom SGML parser; the cp1252 quirk is a 10-line patch |
| chardet | OFX file extensions lie about encoding; content is sniffed |
| Anthropic Haiku 4.5 | Cheap and fast for short PT-BR text classification; ~$0.0005/tx target |
| Decimal (stdlib) | Money is never float |
| ruff | Lint + format in one tool |
| pytest + pytest-asyncio | 181 tests today |
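
The Decimal choice is easy to demonstrate with the standard library alone. The parse_brl helper below is a hypothetical illustration of reading BR-formatted amounts like those in the demo; it is not a function from this repository.

```python
from decimal import Decimal


def parse_brl(text: str) -> Decimal:
    """Parse an amount such as 'R$ 1.247,30' into an exact Decimal."""
    # BR format uses '.' for thousands and ',' for the decimal separator.
    cleaned = text.replace("R$", "").strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)
print(decimal_total == Decimal("0.30"))
print(parse_brl("R$ 1.247,30") + parse_brl("R$ 389,10"))
```

Decimal keeps sums of parsed amounts exact; operations such as division still round, but under a context the caller controls.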

Roadmap

Done in v1:

  • Inspect AI eval harness (500-tx golden set, CI regression gate, mocked-LLM run)
  • Three-tier cascade — rules → embeddings (BGE-m3 k-NN) → LLM
  • Ollama offline backend (Phi-3.5) with bounded timeouts and startup health check
  • Stateless per-file duplicate annotation (Transaction.duplicate_of)
  • Optional OpenTelemetry metrics (mcpforge_tool_calls_total, mcpforge_tool_latency_ms)
  • docs/ARCHITECTURE.md with the full module map and design rationale

Open for v1.x:

  • Real-LLM eval baseline — current 100% in evals/baseline_run.json is against the deterministic mock in evals/_mock_llm.py. Running once against live Haiku 4.5 (cost ~$0.20–0.30 for the full 500-tx set) is what validates the SPEC ≥85% target. Procedure documented in evals/README.md.
  • Per-bank postprocess modules for Bradesco, Nubank, Inter, BB (blocked on samples — see Contributing) — today the generic OFX/CSV parser handles all five banks via header-fingerprint detection, but bank-specific quirks (PIX splits, embedded metadata, encoding edge cases) need real exports to extract and test against.
  • Itaú PIX-split merge (blocked on samples) — stub at parsers/itau.py, passthrough pending a real multi-line OFX sample.

Out of scope for v1 (v2 candidates):

  • Configurable taxonomy
  • Cross-import duplicate detection (stateful)
  • Investment account types (CDB, FIIs, Tesouro Direto)
  • .xlsx / PDF imports
  • CNAE LLM-generated summaries on top of BrasilAPI

Known limitations

  • File imports only. No Open Finance / direct bank API; Itaú/Bradesco API access requires institutional registration and Open Finance compliance — out of scope.
  • No PDF or .xlsx. OFX and CSV only.
  • Per-bank quirk fixes are Itaú-only. Bradesco/Nubank/Inter/BB are parsed by the generic OFX/CSV pipeline. Bank-hint detection identifies them correctly, but no bank-specific postprocess runs — pending real anonymised statement samples (see Contributing). The Itaú PIX-split merge is also a passthrough stub for the same reason.
  • Eval accuracy is mock-LLM only. The 100% baseline in evals/baseline_run.json is against a deterministic keyword mock, not a live Anthropic call. A real-LLM run is needed before claiming the SPEC ≥85% target.
  • Single-process, single-user, local only. stdio transport; no multi-tenant or hosted deployment.
  • BrasilAPI has no paid fallback. If it's unavailable, lookup_cnae returns cnae_available: false and does not crash — but there is no alternative data source.

Contributing

Open an issue or PR. Before submitting, uv run pytest && uv run ruff check && uv run ruff format --check must be clean.

Highest-value contribution: anonymised OFX/CSV statement samples from any of the five supported banks. Specifically needed:

| Bank | What it unlocks |
| --- | --- |
| Itaú | Real OFX with PIX → implement the multi-line merge in parsers/itau.py |
| Bradesco | Any OFX/CSV → create parsers/bradesco.py if quirks justify it |
| Nubank | CSV with proprietary description metadata → parsers/nubank.py |
| Inter | OFX with cp1252/ISO-8859 edge cases → confirm encoding fix coverage |
| BB | OFX missing payee CNPJ on specific transaction types → parsers/bb.py |

Anonymise by masking CPFs/CNPJs (the repo has mask_br_documents if useful), replacing amounts with synthetic values, and clearing personal names/addresses. Structure and field shapes are what matter for fixture-based tests.


License

MIT — see LICENSE.
