DWS: Add document-extraction-api skill for /extraction/parse #13
Merged
Teach the DWS skill how to call the now-GA `/extraction/parse` endpoint:
- `scripts/parse.py`: single primitive that accepts a local file plus `mode` and `output_format`, calls `client.parse()`, and writes the result. Modes: `text` (1 cr/pg), `structure` (1.5 cr/pg, default), `understand` (9 cr/pg), `agentic` (18 cr/pg). Output shapes: spatial elements or whole-document Markdown. Billed against extraction credits (separate from processor API credits). Prints a usage summary after each run.
- `references/parse-output-filtering.md`: new reference doc showing downstream consumption patterns after a single `/parse` call: reading-order plain text, table-to-grid projection, key-value dict, formula LaTeX, picture alt descriptions. Includes Python snippets and jq one-liners for each pattern.
- `references/script-catalog.md`: adds a `parse.py` entry under a new "Data Extraction" section with mode, cost, and output-shape summary.
- `SKILL.md`: adds a Data Extraction section covering what `/parse` is (a document-understanding primitive, not per-element-type calls), a mode-selection table keyed to user intent, the default of structure+spatial for ambiguous requests, invocation examples, a downstream-consumption quick-ref, and a pointer to `parse-output-filtering.md`. Also updates the skill description and task-scripts list.

Python client dependency: path-install of the local branch that adds `client.parse()` support (`file://` URL in the uv inline script header).
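A minimal sketch of the first two consumption patterns (reading-order plain text and table-to-grid projection), using only the stdlib. The response shape is an assumption made for illustration: the `elements`, `type`, `reading_order`, `text`, and `cells` (with `row`/`col`) field names are hypothetical placeholders, not the documented `/parse` schema, so check the upstream schema pages before reusing it.

```python
from typing import Any

# ASSUMED shape, for illustration only (not the documented /parse schema):
# {"elements": [{"type": "paragraph", "reading_order": 0, "text": "..."},
#               {"type": "table", "reading_order": 1,
#                "cells": [{"row": 0, "col": 0, "text": "..."}]}]}


def reading_order_text(doc: dict[str, Any]) -> str:
    """Concatenate element text in reading order, skipping elements without text."""
    elements = sorted(doc.get("elements", []), key=lambda e: e.get("reading_order", 0))
    return "\n".join(e["text"] for e in elements if e.get("text"))


def table_to_grid(table: dict[str, Any]) -> list[list[str]]:
    """Project a table element's flat cell list onto a row-major grid of strings."""
    cells = table.get("cells", [])
    if not cells:
        return []
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["col"]] = c.get("text", "")  # unfilled cells stay ""
    return grid


doc = {
    "elements": [
        {"type": "paragraph", "reading_order": 1, "text": "World"},
        {"type": "heading", "reading_order": 0, "text": "Hello"},
        {"type": "table", "reading_order": 2,
         "cells": [{"row": 0, "col": 0, "text": "a"}, {"row": 0, "col": 1, "text": "b"},
                   {"row": 1, "col": 1, "text": "d"}]},
    ]
}
print(reading_order_text(doc))
print(table_to_grid(doc["elements"][2]))  # [['a', 'b'], ['', 'd']]
```

The table element carries no `text` of its own here, so it drops out of the plain-text view and only shows up through the grid projection.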
…on-api skill
DWS Extract is a separate product from DWS Processor — different API key,
different credit pool, different billing. Splitting the parse primitive
into its own skill removes the conflation and lets agents pick the right
product upfront.
- New skill: plugins/nutrient-dws/skills/document-extraction-api
- parse.py + references/parse-output-filtering.md moved over via git mv
- SKILL.md focused on the Data Extraction product, mode/output table,
downstream consumption patterns, and the separate NUTRIENT_EXTRACT_API_KEY
- Local lib/common.py with create_client() that reads
NUTRIENT_EXTRACT_API_KEY (falls back to NUTRIENT_API_KEY for tenants on
global keys) and constructs NutrientClient(api_key=..., extract_api_key=...)
- Pinned to nutrient-dws>=3.1.0 in the script's PEP 723 metadata
- document-processor-api: removed the Data Extraction section, the parse.py
entry, and the parse-output-filtering reference map row. Cross-link to the
sibling skill in the frontmatter description and "When to use" section.
- AGENTS.md: advertise the new skill alongside the existing two.
- Fix latent bug in parse.py: was reading usage.dataExtractionCredits
(camelCase) but the API returns data_extraction_credits (snake_case), so
the credit-usage summary was silently skipped on every call. Confirmed
end-to-end via live smoke (6-page PDF, structure/spatial mode, 9 credits,
~46KB JSON, usage summary now prints correctly).
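The snake_case fix comes down to which key the usage summary reads. A minimal sketch, assuming the parsed response is a dict with a top-level `usage` object; the helper names and summary wording are hypothetical, and only the `data_extraction_credits` key comes from the commit above.

```python
from typing import Any, Optional


def extraction_credits(response: dict[str, Any]) -> Optional[float]:
    # snake_case, matching what the API returns. Looking up
    # "dataExtractionCredits" here yields None on every call, which is how
    # the summary was being skipped without any error.
    usage = response.get("usage") or {}
    return usage.get("data_extraction_credits")


def usage_summary(response: dict[str, Any]) -> str:
    credits = extraction_credits(response)
    if credits is None:
        # Say so explicitly instead of silently printing nothing.
        return "usage: data_extraction_credits missing from response"
    return f"usage: {credits} extraction credits"


print(usage_summary({"usage": {"data_extraction_credits": 9}}))
```

Emitting an explicit "missing" line rather than nothing is a design choice in this sketch: a future key rename then shows up in the output instead of disappearing quietly.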
The split into document-extraction-api is purely additive: the processor skill doesn't need cross-links or trimming, so it is left untouched.
- `references/parse-output-filtering.md`: snake_case `data_extraction_credits` to match the actual response shape. It was camelCase in three places (the schema diagram, the Python snippet, and the prose note), so anyone following the reference's Python snippet would silently get nothing back. Verified against the live API.
- `scripts/lib/common.py`: use `is None` instead of truthiness for the env var checks, so `export NUTRIENT_EXTRACT_API_KEY=` (explicit empty) is treated as a misconfiguration to surface, not as "fall back to the Processor key". Also drop helpers carried over from the sibling skill's `common.py` that this skill never uses (`write_json_output`, `parse_csv`, `read_json_file`, `fix_negative_args`).
- `scripts/parse.py`: call `assert_local_file()` on `--input` so URL inputs produce a clear error message instead of leaking through to a misleading `FileNotFoundError`.
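A sketch of the two guards described above. `resolve_extract_api_key` is a hypothetical name for the env-var logic inside `create_client()` (its result would feed `NutrientClient(api_key=..., extract_api_key=...)`), and this `assert_local_file` body is an illustrative guess, not the helper's actual implementation. The point is the `is None` checks: an unset variable falls back, an explicitly empty one fails.

```python
import os
from pathlib import Path
from typing import Mapping, Optional
from urllib.parse import urlparse


def resolve_extract_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical core of create_client(): choose the key for extract_api_key=..."""
    env = os.environ if env is None else env
    key = env.get("NUTRIENT_EXTRACT_API_KEY")
    if key is None:
        # Unset (not empty): tenants on global keys fall back to the Processor key.
        key = env.get("NUTRIENT_API_KEY")
    if key is None:
        raise RuntimeError("Set NUTRIENT_EXTRACT_API_KEY (or NUTRIENT_API_KEY).")
    if key == "":
        # Set but empty (e.g. `export NUTRIENT_EXTRACT_API_KEY=`): surface it,
        # never treat it as "fall back to the Processor key".
        raise RuntimeError("API key environment variable is set but empty.")
    return key


def assert_local_file(value: str) -> Path:
    """Illustrative --input guard: reject URLs before they become FileNotFoundError."""
    if urlparse(value).scheme in ("http", "https"):
        raise SystemExit(f"--input must be a local file path, not a URL: {value}")
    path = Path(value)
    if not path.is_file():
        raise SystemExit(f"--input file not found: {value}")
    return path
```

Raising `SystemExit` with a message suits a CLI entry point like `parse.py`: the message goes to stderr and the process exits non-zero, without a traceback.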
Add an inference principle that walks the request, filename, and intent to pick the cheapest mode that satisfies every floor, explicitly without asking the user clarifying questions. Replace the vague "ask before large documents" prose with a concrete 200-credit confirmation threshold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
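The inference principle and the confirmation threshold can be sketched together. This assumes (it is not stated in the commit) that each pricier mode can do everything a cheaper one can, so "cheapest mode satisfying every floor" reduces to "highest floor". How request, filename, and intent cues map to floors is left out: the floors are passed in directly, and all function names are hypothetical. Per-page costs are the ones listed for `parse.py`.

```python
# Per-page extraction-credit costs as listed for parse.py.
MODE_CREDITS_PER_PAGE = {"text": 1.0, "structure": 1.5, "understand": 9.0, "agentic": 18.0}
MODES_BY_COST = sorted(MODE_CREDITS_PER_PAGE, key=MODE_CREDITS_PER_PAGE.__getitem__)
CONFIRM_THRESHOLD_CREDITS = 200  # above this, surface an estimate before invoking


def pick_mode(floors: list[str]) -> str:
    """Cheapest mode meeting every floor, ASSUMING capabilities nest by cost."""
    if not floors:
        return "structure"  # documented default for ambiguous requests
    # Under the nesting assumption, the cheapest mode that clears every
    # floor is simply the most expensive floor.
    return max(floors, key=MODES_BY_COST.index)


def estimate_credits(mode: str, pages: int) -> float:
    return MODE_CREDITS_PER_PAGE[mode] * pages


def needs_confirmation(mode: str, pages: int) -> bool:
    return estimate_credits(mode, pages) > CONFIRM_THRESHOLD_CREDITS
```

For example, a 23-page scan that needs `understand` estimates to 207 credits and crosses the threshold, while `structure` on 100 pages (150 credits) does not. The 6-page smoke run in `structure` mode estimates to the 9 credits reported above.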
The eval workspace it covered is local-only and doesn't need to be masked from a repo-level rule. Anything regenerated by future skill-creator runs lands untracked, the same as any other ephemeral local artefact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the duplicated schema walkthrough from the references doc and link to the canonical pages on nutrient.io instead. The reference now lists which tools we suggest for reshaping a `/parse` response (jq, json, pandas, a LaTeX renderer, a markdown parser) rather than re-stating field shapes that are already authoritative upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
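The tool list includes a Markdown parser for chunking `output.markdown`. As a dependency-free stand-in, here is a rough heading-based chunker for RAG indexing. It only understands ATX (`#`) headings and ignores headings inside fenced code, so a real CommonMark parser remains the safer choice. The `response["output"]["markdown"]` path is an assumption based on the `output.markdown` name used in this PR, and the function names are hypothetical.

```python
import re
from typing import Any

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")  # ATX headings only


def chunk_markdown(markdown: str, max_level: int = 2) -> list[dict[str, str]]:
    """Split on headings up to max_level; each chunk keeps its heading as title."""
    chunks: list[dict[str, str]] = []
    title, lines = "", []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code are not headings
        match = None if in_fence else HEADING.match(line)
        if match and len(match.group(1)) <= max_level:
            if title or any(ln.strip() for ln in lines):
                chunks.append({"title": title, "body": "\n".join(lines).strip()})
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)  # deeper headings stay inside the current chunk
    if title or any(ln.strip() for ln in lines):
        chunks.append({"title": title, "body": "\n".join(lines).strip()})
    return chunks


def chunks_from_response(response: dict[str, Any]) -> list[dict[str, str]]:
    # ASSUMED path, mirroring the `output.markdown` name used in this PR.
    return chunk_markdown(response["output"]["markdown"])
```

Keeping `###` and deeper headings inside their parent chunk trades chunk count for context; lower `max_level` to 1 for coarser chunks.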
The rule was overly prescriptive: there's no architectural reason the skill must stay single-script forever, and the sibling-skill boundary between data extraction and /build workflows is already implicit in the skill's purpose. Other rules in the section remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HungKNguyen approved these changes on May 29, 2026.
Why
The Data Extraction API (`/extraction/parse`) went GA. It's a single document-understanding primitive that returns either the full structural document model (typed elements with bounding boxes and reading order) or a whole-document Markdown string: the natural primitive for RAG indexing, form/invoice extraction, and layout-aware understanding.

DWS Extract is a separate product from DWS Processor, with its own API key and credit pool. This PR adds a dedicated skill, `document-extraction-api`, alongside the existing `document-processor-api`, rather than conflating both products under one skill. The processor skill is left untouched.

Summary
New skill: `plugins/nutrient-dws/skills/document-extraction-api/`

- `SKILL.md`: explains the product split and the dual credit pool. Defines a mode-selection principle: the agent decides from the request alone (no clarifying questions), walks request, filename, and intent cues in order, and picks the cheapest mode that satisfies every floor. Sets a 200-credit cost-confirmation threshold above which the agent surfaces an estimate to the operator before invoking.
- `scripts/parse.py`: single primitive accepting a local file plus `mode` and `output_format`. Calls `client.parse()`, writes the result, and prints extraction-credit usage. Modes: `text` (1 cr/pg), `structure` (1.5 cr/pg, default), `understand` (9 cr/pg), `agentic` (18 cr/pg). Output shapes: `spatial` elements or whole-document `markdown`. Pinned to `nutrient-dws>=3.1.0`.
- `scripts/lib/common.py`: `create_client()` factory reading `NUTRIENT_EXTRACT_API_KEY` (falls back to `NUTRIENT_API_KEY` for tenants on global keys); constructs `NutrientClient(api_key=..., extract_api_key=...)`.
- `references/parse-output-filtering.md`: points at the canonical upstream docs for the `/parse` response schema (extract-document-elements, extract-markdown, processing-modes, coordinate-spaces) and lists the tools we suggest reaching for to reshape a response: `jq` for filtering / projection, the stdlib `json` module for programmatic walks, `pandas` for table-to-dataframe projection, any LaTeX renderer for `formula` elements, and a standard Markdown parser for chunking `output.markdown`.

Top-level `AGENTS.md` advertises the new skill alongside the existing two.

Mode-decision capability sweep
To validate that the skill makes the right `(mode, output_format)` call across the documented decision surface, I built a 9-eval matrix where each eval pairs an intent prompt with the `(mode, output_format)` the docs prescribe for that intent:

| Eval | Prescribed `(mode, output_format)` |
|---|---|
| 1 | `text + markdown` |
| 2 | `understand + markdown`¹ |
| 3 | `structure + spatial` (default) |
| 4 | `understand + spatial` |
| 5 | `understand + spatial` |
| 6 | `agentic + spatial` |
| 7 | `agentic + (spatial \| markdown)` |
| 8 | `text + markdown` |
| 9 | `structure + spatial` |

¹ The docs prescribe `structure + markdown` (1.5 cr/pg) for this case, but the server currently returns HTTP 500 on image-only PDFs for any `structure`-mode call regardless of output format. Tracked separately. `understand + markdown` (9 cr/pg) is the cheapest working combination for the scanned-RAG intent today.

Each eval was run twice: once with the skill present, once with a baseline subagent that has the same API access but no skill guidance. Sample input: the 6-page born-digital `sample.pdf` from `nutrient-dws-client-python/tests/data/`, plus a rasterized-at-150-DPI variant with no text layer for the scanned-input eval.

Round-1 results (18 parallel subagent runs):
- … `(mode, output_format)` on every probe.
- … `structure + markdown` (1.5 cr/pg, overshoot), skill correctly walked down to `text + markdown` (1 cr/pg). 33% lower per-page cost on that workflow.

The skill's measurable value-add is avoiding the "go look up the API, then enumerate the modes, then decide what's cheapest" detour that the baseline walks through on every invocation.
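To make the scoring concrete, here is a sketch of how a run could be checked against the 9-eval matrix, where an eval may accept more than one output format (eval 7's `spatial | markdown`). The data structure and function names are hypothetical; the prescribed pairs, per-page costs, and the 1.5 vs 1 cr/pg comparison come from this PR.

```python
# Prescribed (mode, accepted output formats) per eval, in matrix order.
PRESCRIBED: list[tuple[str, set[str]]] = [
    ("text", {"markdown"}),
    ("understand", {"markdown"}),  # scanned RAG: structure 500s on image-only PDFs today
    ("structure", {"spatial"}),
    ("understand", {"spatial"}),
    ("understand", {"spatial"}),
    ("agentic", {"spatial"}),
    ("agentic", {"spatial", "markdown"}),
    ("text", {"markdown"}),
    ("structure", {"spatial"}),
]
CREDITS_PER_PAGE = {"text": 1.0, "structure": 1.5, "understand": 9.0, "agentic": 18.0}


def match_rate(picks: list[tuple[str, str]]) -> float:
    """Fraction of evals where the picked (mode, output_format) is a prescribed one."""
    hits = sum(
        mode == want_mode and fmt in want_fmts
        for (mode, fmt), (want_mode, want_fmts) in zip(picks, PRESCRIBED)
    )
    return hits / len(PRESCRIBED)


def per_page_saving(chosen: str, instead_of: str) -> float:
    """Relative per-page cost reduction of `chosen` versus `instead_of`."""
    return 1 - CREDITS_PER_PAGE[chosen] / CREDITS_PER_PAGE[instead_of]


print(round(per_page_saving("text", "structure"), 2))  # 0.33
```

`per_page_saving("text", "structure")` is 1 - 1/1.5, which is where the 33% figure comes from.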
Verification
- `claude plugin validate .` passes.
- `uv run scripts/parse.py --help` renders correctly.
- … `/extraction/parse` endpoint: 18 parallel subagent runs across 9 mode-decision evals × 2 configurations (with / without the skill). All runs returned well-formed outputs with the `(mode, output_format)` the docs prescribe; the credit-usage summary prints correctly from the Python client.