Skip to content

Add Groq batch processing and structured output support#927

Open
xmarquez wants to merge 3 commits intotidyverse:mainfrom
xmarquez:feature/groq-batch
Open

Add Groq batch processing and structured output support#927
xmarquez wants to merge 3 commits intotidyverse:mainfrom
xmarquez:feature/groq-batch

Conversation

@xmarquez
Copy link
Copy Markdown

@xmarquez xmarquez commented Feb 14, 2026

Summary

  • Adds batch processing support for chat_groq() via the Groq Batch API (50% cost discount, 24h completion window)
  • Enables structured data extraction for Groq by adding additionalProperties: false recursively to JSON schemas (required by Groq's strict mode)
  • Fixes as_json(ProviderGroq, Turn) to handle ContentJson in assistant turns (previously assumed ContentText only, causing serialization failure on structured output follow-up turns)
  • Implements 6 batch methods on ProviderGroq, 3 helper functions, and add_additional_properties_false() schema helper

Closes #914 (Groq portion — Gemini batch is PR #926)

Key implementation details

  • Batch format mirrors OpenAI (JSONL with custom_id, method, url, body) but uses /v1/chat/completions endpoint
  • Groq requires .jsonl file extension on uploads (API infers purpose from extension)
  • batch_retrieve() handles empty output_file_id gracefully (Groq may put all results in error file)
  • Structured output only works with models that support json_schema (e.g., openai/gpt-oss-20b)
  • as_json(ProviderGroq, Turn) now checks content type: extracts @string or serializes @data for ContentJson, falls back to @text for ContentText

Test plan

  • 34 tests pass: unit tests for helpers, schema generation, Turn serialization (ContentJson + ContentText), batch status parsing, fixture-based test, and live integration tests
  • Structured data extraction test enabled (test_data_extraction with openai/gpt-oss-20b)
  • devtools::test(filter = "batch-chat") — no regressions
  • devtools::check() — 779 PASS, 102 SKIP, 1 pre-existing FAIL (flaky Groq tool calling test)
  • Live end-to-end test: batch_chat_structured() with 5 song lyric prompts completed in ~7 seconds
  • Live fixture generated from Groq API (4 state capital prompts, correct answers)

🤖 Generated with Claude Code

xmarquez and others added 3 commits February 15, 2026 08:33
- Add 6 batch methods on ProviderGroq (has_batch_support, batch_submit,
  batch_poll, batch_status, batch_retrieve, batch_result_turn)
- Add groq_upload_file, groq_download_file, groq_json_fallback helpers
- Update as_json(TypeObject) to add additionalProperties: false recursively
- Add as_json(TypeArray) with recursive additionalProperties: false
- Add add_additional_properties_false() helper for schema recursion
- Handle empty output_file_id in batch_retrieve (Groq quirk)
- Require .jsonl file extension for Groq batch uploads
- Remove "no structured data extraction" limitation from docs
- Add pre-recorded fixture file for deterministic testing
- 30 tests: unit, schema, batch status, fixture-based, integration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
as_json(ProviderGroq, Turn) assumed non-tool assistant content was
always ContentText, accessing @text directly. Structured output creates
ContentJson which lacks @text, causing serialization failures on
follow-up turns. Now checks content type and extracts string
appropriately.

Also enables the structured data extraction test that was blocked by
this bug, and adds unit tests for ContentJson/ContentText serialization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: add Google Gemini support batch_chat()

1 participant