ReqLLM

Join the community! Come chat about building AI tools with Elixir and coding Elixir with LLMs in The Swarm: Elixir AI Collective Discord server.

A Req- and Finch-backed package to call LLM APIs that standardizes requests and responses across providers.

Why Req LLM?

LLM APIs are inconsistent. ReqLLM provides a unified, idiomatic Elixir interface with standardized requests and responses across providers.

Unified architecture:

High-level API – Vercel AI SDK-inspired functions (generate_text/3, stream_text/3, generate_object/4 and more) that normalize requests and responses across providers.
Provider transports – Req powers request/response calls; Finch powers streaming. Provider callbacks translate model metadata, options, bodies, and responses behind the same public API.

Model Support Snapshot

ReqLLM currently exposes 1,205 models across 21 implemented provider integrations from LLMDB, the model catalog maintained through llm_db. Including the google_vertex_anthropic namespace, which is cataloged but not implemented as a separate provider, the registry contains 1,218 models across 22 provider namespaces.

That breadth extends well beyond chat: ReqLLM tracks 92 non-text operation models across embedding, image generation, text-to-speech, transcription, rerank, and OCR APIs. The fixture suite currently contains 619 unique recorded model specs, giving ReqLLM a compatibility ledger for text and multi-modal provider behavior.

Provider	ID	Catalog models	Operations (non-text model counts)	Recorded fixture specs	Guide
Alibaba Cloud Bailian	`alibaba`	50	text, OCR 1, transcription 1	0	—
Alibaba Cloud Bailian (China)	`alibaba_cn`	82	text, OCR 1, transcription 1	0	—
Amazon Bedrock	`amazon_bedrock`	92	text, embedding 3	7	Guide
Anthropic	`anthropic`	11	text	11	Guide
Azure OpenAI	`azure`	103	text, embedding 6	26	Guide
Cerebras	`cerebras`	5	text	2	Guide
Cohere	`cohere`	17	text, rerank 5	5	—
ElevenLabs	`elevenlabs`	4	speech 4	4	—
Fireworks AI	`fireworks_ai`	12	text	12	Guide
Google Gemini	`google`	50	text, embedding 2, image 8	24	Guide
Google Vertex AI	`google_vertex`	40	text	11	Guide
Groq	`groq`	18	text, speech 2, transcription 2	11	Guide
MiniMax	`minimax`	6	text	6	—
OpenAI	`openai`	86	text, embedding 3, image 5, speech 6, transcription 7	64	Guide
OpenRouter	`openrouter`	364	text, embedding 25, image 5	234	Guide
Venice	`venice`	67	text	67	—
xAI	`xai`	26	text, image 3	21	Guide
Z.AI	`zai`	13	text	2	Guide
Z.AI Coder	`zai_coder`	5	text	1	Guide
Z.AI Coding Plan	`zai_coding_plan`	5	text	4	—
Zenmux	`zenmux`	149	text, image 2	107	Guide

Note: Streaming uses Finch directly because of known Req limitations with SSE responses.

Installation

Igniter Installation (Recommended)

The fastest way to get started is with Igniter:

mix igniter.install req_llm

Manual Installation

Add req_llm to your list of dependencies in mix.exs:

def deps do
  [
    {:req_llm, "~> 1.6"}
  ]
end

Then run:

mix deps.get

Quick Start

# Keys are picked up from .env files or environment variables - see `ReqLLM.Keys`
model = "anthropic:claude-haiku-4-5"

ReqLLM.generate_text!(model, "Hello world")
#=> "Hello! How can I assist you today?"

schema = [name: [type: :string, required: true], age: [type: :pos_integer]]
person = ReqLLM.generate_object!(model, "Generate a person", schema)
#=> %{name: "John Doe", age: 30}

{:ok, image_response} = ReqLLM.generate_image("openai:gpt-image-1.5", "A simple red square")
image_bytes = ReqLLM.Response.image_data(image_response)
File.write!("red_square.png", image_bytes)

Note: Google image models gemini-2.5-flash-image and gemini-3-pro-image-preview reject :n; specify the image count in the prompt.

{:ok, response} = ReqLLM.generate_text(
  model,
  ReqLLM.Context.new([
    ReqLLM.Context.system("You are a helpful coding assistant"),
    ReqLLM.Context.user("Explain recursion in Elixir")
  ]),
  temperature: 0.7,
  max_tokens: 200
)


{:ok, response} = ReqLLM.generate_text(
  model,
  "What's the weather in Paris?",
  tools: [
    ReqLLM.tool(
      name: "get_weather",
      description: "Get current weather for a location",
      parameter_schema: [
        location: [type: :string, required: true, doc: "City name"]
      ],
      callback: {Weather, :fetch_weather, [:extra, :args]}
    )
  ]
)

# Streaming text generation
{:ok, response} = ReqLLM.stream_text(model, "Write a short story")
ReqLLM.StreamResponse.tokens(response)
|> Stream.each(&IO.write/1)
|> Stream.run()

# Access usage metadata after streaming
usage = ReqLLM.StreamResponse.usage(response)

Features

Provider-agnostic model registry
- 21 implemented providers / 1,205 models sourced from LLMDB via the llm_db dependency
- Text, embedding, image generation, speech, transcription, rerank and OCR operation metadata
- Cost, context length, modality, capability and deprecation metadata included
Canonical data model
- Typed Context, Message, ContentPart, Tool, StreamChunk, Response, Usage
- Multi-modal content parts (text, image URL, tool call, binary)
- All structs implement Jason.Encoder for simple persistence / inspection
Unified client surface
- High-level Vercel-AI style helpers (generate_text/3, stream_text/3, generate_object/4, bang variants)
- Req-backed request/response calls and Finch-backed streaming behind the same provider abstraction
- Advanced Req request customization available for non-streaming use cases
Structured object generation
- generate_object/4 renders JSON-compatible Elixir maps validated by a NimbleOptions-compiled schema
- Zero-copy mapping to provider JSON-schema / function-calling endpoints
- OpenAI native structured outputs with three modes (:auto (default), :json_schema, :tool_strict)
Provider-specific capabilities
- Anthropic web search for real-time content access (via provider_options: [web_search: %{max_uses: 5}])
- Extended thinking/reasoning for supported models
- Prompt caching for cost optimization
- All provider-specific options documented in provider guides
Embedding generation
- Single or batch embeddings via Embedding.generate/3 (not all providers support embeddings)
- Automatic dimension / encoding validation and usage accounting
Production-grade streaming
- stream_text/3 returns a StreamResponse with both real-time tokens and async metadata
- Finch-based streaming with automatic connection pooling (HTTP/1 pools by default; HTTP/2 multiplexing is opt-in)
- OpenAI Responses models can opt into WebSocket mode with provider_options: [openai_stream_transport: :websocket]
- Concurrent metadata collection (usage, finish_reason) without blocking token flow
- Works uniformly across providers with internal SSE / chunked-response adaptation
Experimental OpenAI realtime sessions
- ReqLLM.OpenAI.Realtime exposes a low-level WebSocket session API for Realtime models
- Designed for explicit event-driven workflows that do not map cleanly to stream_text/3
Usage & cost tracking
- response.usage exposes normalized usage and best-effort USD cost from model metadata and provider response data
Schema-driven option validation
- All public APIs validate options with NimbleOptions; errors are raised as ReqLLM.Error.Invalid.* (Splode)
Automatic parameter translation & codecs
- Provider DSL translates canonical options (e.g. max_tokens -> max_completion_tokens for o1 & o3) to provider-specific names
- Built-in OpenAI-style encoding/decoding with provider callback overrides for custom formats
Flexible model specification
- Accepts "provider:model", tuples, %LLMDB.Model{} structs, and plain-map model specs
- ReqLLM.model!/1 is the recommended way to validate and normalize full model specs
Secure, layered key management (ReqLLM.Keys)
- Per-request override → in-memory storage → application config → env vars → .env files
OAuth bearer auth for supported providers
- Direct access_token support for OpenAI and Anthropic
- OpenAI can load and refresh openai-codex credentials from oauth.json / auth.json
- openai_codex:* targets the ChatGPT Codex backend with OAuth-only auth and automatic account-id extraction
Extensive reliability tooling
- Fixture-backed test matrix (LiveFixture) supports cached, live, or provider-filtered runs
- Dialyzer, Credo strict rules, and no-comment enforcement keep code quality high
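
The "provider:model" string form can be pictured as a split on the first colon. The sketch below is a hypothetical illustration, not ReqLLM's parser (ReqLLM.model/1 and ReqLLM.model!/1 do the real work). Splitting only once is an assumption chosen so that model ids containing their own colons, such as Bedrock-style ids ending in ":0", stay intact.

```elixir
defmodule ModelSpecParseSketch do
  @moduledoc """
  Hypothetical illustration of splitting a "provider:model" string spec.
  Not ReqLLM's implementation; only the first ":" separates the provider
  from the model id so that ids containing colons are preserved.
  """

  @spec parse(String.t()) :: {:ok, {String.t(), String.t()}} | {:error, String.t()}
  def parse(spec) when is_binary(spec) do
    # parts: 2 stops after the first ":" and keeps the remainder whole
    case String.split(spec, ":", parts: 2) do
      [provider, model_id] when provider != "" and model_id != "" ->
        {:ok, {provider, model_id}}

      _ ->
        {:error, "expected \"provider:model\", got: #{inspect(spec)}"}
    end
  end
end
```

For example, ModelSpecParseSketch.parse("anthropic:claude-haiku-4-5") returns {:ok, {"anthropic", "claude-haiku-4-5"}}.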

API Key Management

ReqLLM aims to make key management easy and flexible; it should just work.

If your key management use case is not covered, please submit a PR.

Keys are pulled from multiple sources with clear precedence: per-request override → in-memory storage → application config → environment variables → .env files.

# Store keys in memory (recommended)
ReqLLM.put_key(:openai_api_key, "sk-...")
ReqLLM.put_key(:anthropic_api_key, "sk-ant-...")

# Retrieve keys with source info
{:ok, key, source} = ReqLLM.get_key(:openai)
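
The precedence order can be sketched as a first-match-wins lookup. KeyLookupSketch and its arguments are hypothetical stand-ins for ReqLLM.Keys, not its API. The .env layer is omitted, assuming .env values have already been loaded into the process environment.

```elixir
defmodule KeyLookupSketch do
  @moduledoc """
  Hypothetical first-match-wins key resolution mirroring the documented order:
  per-request override -> in-memory store -> application config -> environment.
  Not ReqLLM.Keys; .env loading is assumed to have happened already.
  """

  @type source :: :option | :memory | :app_config | :env

  @spec resolve(keyword(), map(), keyword(), String.t()) ::
          {:ok, String.t(), source()} | :error
  def resolve(request_opts, memory_store, app_config, env_var) do
    # Candidates are listed from highest to lowest precedence
    candidates = [
      {:option, Keyword.get(request_opts, :api_key)},
      {:memory, Map.get(memory_store, :api_key)},
      {:app_config, Keyword.get(app_config, :api_key)},
      {:env, System.get_env(env_var)}
    ]

    # Return the first non-empty key together with the layer it came from
    Enum.find_value(candidates, :error, fn
      {source, key} when is_binary(key) and key != "" -> {:ok, key, source}
      _ -> nil
    end)
  end
end
```

Returning the source alongside the key mirrors the {:ok, key, source} shape shown above, which makes it easy to see which layer won.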

All functions accept an api_key parameter to override the stored key:

ReqLLM.generate_text("anthropic:claude-haiku-4-5", "Hello", api_key: "sk-ant-...")
{:ok, response} = ReqLLM.stream_text("anthropic:claude-haiku-4-5", "Story", api_key: "sk-ant-...")

By default, ReqLLM loads .env files from the current working directory at startup. To disable this behavior (e.g., if you manage environment variables yourself):

config :req_llm, load_dotenv: false

Model Specs

ReqLLM can call models that are not in LLMDB yet. Passing an explicit model spec is the recommended advanced workflow for local development, debugging new releases, and custom provider setups.

See the Model Specs guide for the full explanation of string specs, exact dated releases, %LLMDB.Model{} structs, and the full explicit model specification path.

For backwards compatibility, you can pass a plain map directly to the major APIs. The clearer path is to normalize it first with ReqLLM.model!/1, which returns an enriched %LLMDB.Model{}.

model =
  ReqLLM.model!(%{
    provider: :openai,
    id: "gpt-6-mini",
    base_url: "http://localhost:8000/v1"
  })

ReqLLM.generate_text!(model, "Hello world")

You can still pass the plain-map model spec directly:

ReqLLM.generate_text!(
  %{provider: :openai, id: "gpt-6-mini", base_url: "http://localhost:8000/v1"},
  "Hello world"
)

Use additional metadata only when the provider needs it:

model =
  ReqLLM.model!(%{
    provider: :google_vertex,
    id: "zai-org/glm-4.7-maas",
    extra: %{family: "glm"}
  })

ReqLLM hard-fails early when the model spec is missing required routing data, with errors aimed at advanced users:

Inline models always need provider and id (or model)
Azure still needs a base_url
Google Vertex MaaS models may need extra.family when the model family cannot be inferred
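
A pre-flight check that mirrors the first two rules might look like the hypothetical sketch below. ReqLLM.model!/1 remains the real validator; the Google Vertex extra.family rule is left out because this README does not describe how the family is inferred.

```elixir
defmodule ModelSpecCheckSketch do
  @moduledoc """
  Hypothetical pre-flight check for plain-map model specs, mirroring the
  documented rules only. ReqLLM.model!/1 is the real validator.
  """

  @spec check(map()) :: :ok | {:error, String.t()}
  def check(spec) when is_map(spec) do
    provider = Map.get(spec, :provider)
    # :model is accepted as an alias for :id
    id = Map.get(spec, :id) || Map.get(spec, :model)

    cond do
      is_nil(provider) ->
        {:error, "inline model specs need :provider"}

      is_nil(id) ->
        {:error, "inline model specs need :id (or :model)"}

      provider == :azure and is_nil(Map.get(spec, :base_url)) ->
        {:error, "Azure model specs need :base_url"}

      true ->
        :ok
    end
  end
end
```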

Usage Cost Tracking

Every response includes detailed usage and best-effort cost information calculated from normalized provider usage data plus model pricing metadata:

{:ok, response} = ReqLLM.generate_text("anthropic:claude-haiku-4-5", "Hello")

# Values below are illustrative; real costs depend on the model's pricing metadata
response.usage
#=> %{
#     input_tokens: 8,
#     output_tokens: 12,
#     total_tokens: 20,
#     input_cost: 0.00024,
#     output_cost: 0.00036,
#     total_cost: 0.0006
#   }

ReqLLM treats pricing as an observability and estimation feature, not an invoice guarantee. When provider billing accuracy matters, compare these values against your own provider-side reporting. See the Pricing Policy guide for the full contract and known limitations.
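
The token portion of the estimate is simple arithmetic: for each direction, tokens * price_per_million / 1_000_000, then the two are summed. The sketch below assumes prices expressed in USD per million tokens and takes them as arguments; it is a hypothetical illustration, and the prices in the example are placeholders, not values read from LLMDB.

```elixir
defmodule CostEstimateSketch do
  @moduledoc """
  Hypothetical per-million-token cost estimate. Prices are passed in as
  USD per million tokens; ReqLLM reads real pricing from model metadata.
  """

  @per_million 1_000_000

  @spec estimate(map(), map()) :: map()
  def estimate(%{input_tokens: input, output_tokens: output}, %{input: input_price, output: output_price}) do
    # Scale each token count by its per-million price
    input_cost = input * input_price / @per_million
    output_cost = output * output_price / @per_million

    %{input_cost: input_cost, output_cost: output_cost, total_cost: input_cost + output_cost}
  end
end
```

With placeholder prices of 1.0 (input) and 5.0 (output) USD per million tokens, 8 input and 12 output tokens come to roughly 0.000008 + 0.00006 = 0.000068 USD.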

Tool & Image Usage

When using web search or generating images, additional usage metadata is available:

# Web search usage (Anthropic, OpenAI, xAI, Google)
{:ok, response} = ReqLLM.generate_text(model, prompt,
  provider_options: [web_search: %{max_uses: 5}])

response.usage.tool_usage
#=> %{web_search: %{count: 2, unit: "call"}}

response.usage.cost
#=> %{tokens: 0.001, tools: 0.02, images: 0.0, total: 0.021}

# Image generation usage
{:ok, response} = ReqLLM.generate_image("openai:gpt-image-1.5", prompt)

response.usage.image_usage
#=> %{generated: %{count: 1, size_class: "1024x1024"}}

ReqLLM publishes native telemetry events for every request, including streaming requests:

[:req_llm, :request, :start | :stop | :exception] for lifecycle timing, summaries, and usage
[:req_llm, :reasoning, :start | :update | :stop] for standardized thinking and reasoning milestones
[:req_llm, :token_usage] for backwards-compatible token and cost measurements

All events share a request_id so you can correlate request lifecycle, reasoning lifecycle, and billing data across providers.
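
A minimal logging handler for these events could look like the sketch below. The module name and handler id are hypothetical, and reading :request_id from event metadata is an assumption based on the sentence above. :telemetry.attach_many/4 is the standard telemetry API, and :telemetry is available as a transitive dependency of Finch.

```elixir
defmodule ReqLLMTelemetrySketch do
  @moduledoc """
  Hypothetical logging handler for ReqLLM telemetry events. Event names come
  from the README; the :request_id metadata key is an assumption.
  """
  require Logger

  @events [
    [:req_llm, :request, :start],
    [:req_llm, :request, :stop],
    [:req_llm, :request, :exception],
    [:req_llm, :token_usage]
  ]

  # Call once at application start; requires the :telemetry library
  def attach do
    :telemetry.attach_many("req-llm-log-sketch", @events, &__MODULE__.handle_event/4, nil)
  end

  def handle_event(event, _measurements, metadata, _config) do
    Logger.info(describe(event, metadata))
  end

  # Pure helper: "req_llm.request.stop request_id=\"abc\"" style summary line
  @spec describe([atom()], map()) :: String.t()
  def describe(event, metadata) do
    name = Enum.map_join(event, ".", &Atom.to_string/1)
    "#{name} request_id=#{inspect(Map.get(metadata, :request_id))}"
  end
end
```

Keeping the formatting in a pure describe/2 function makes the handler easy to test without attaching it.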

For OpenTelemetry, attach ReqLLM.OpenTelemetry once to emit GenAI client spans, optional GenAI metrics, cost attributes, and Langfuse-friendly message capture.

ReqLLM.OpenTelemetry.attach("req-llm-otel", content: :attributes, langfuse: true)

See examples/scripts/usage_cost_search_image.exs and run it from examples/ with mix run scripts/usage_cost_search_image.exs for a multi-provider smoke test that validates search tool and image generation cost metadata. For comprehensive documentation, see the Telemetry Guide and Usage & Billing Guide.

Streaming Configuration

ReqLLM uses Finch for streaming connections with automatic connection pooling. By default, we use HTTP/1-only pools to work around a known Finch HTTP/2 bug with large request bodies:

# Default configuration (automatic)
config :req_llm,
  finch: [
    name: ReqLLM.Finch,
    pools: %{
      :default => [protocols: [:http1], size: 1, count: 8]
    }
  ]

HTTP/2 Configuration (Advanced)

Important: Due to Finch issue #265, HTTP/2 pools may fail when sending request bodies larger than 64KB (large prompts, extensive context windows). This is a bug in Finch's HTTP/2 flow control implementation, not a limitation of HTTP/2 itself.

If you want to use HTTP/2 pools (e.g., for performance testing or if you know your prompts are small), you can configure it:

# HTTP/2 configuration (use with caution)
config :req_llm,
  finch: [
    name: ReqLLM.Finch,
    pools: %{
      :default => [protocols: [:http2, :http1], size: 1, count: 8]
    }
  ]

ReqLLM will error with a helpful message if you try to send a large request body with HTTP/2 pools. The error will reference this section for configuration guidance.
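
If you opt into HTTP/2 pools, you can also guard large bodies yourself before sending. The sketch below is hypothetical: the module name is invented, and the 64 * 1024 byte threshold is taken from the 64KB figure quoted for Finch issue #265 rather than read from Finch or ReqLLM.

```elixir
defmodule Http2BodyGuardSketch do
  @moduledoc """
  Hypothetical pre-flight size check before routing a request through an
  HTTP/2 Finch pool. The threshold mirrors the README's 64KB figure.
  """

  @limit_bytes 64 * 1024

  # Accepts an already-encoded body (binary or iodata)
  @spec http2_safe?(iodata()) :: boolean()
  def http2_safe?(encoded_body), do: IO.iodata_length(encoded_body) <= @limit_bytes
end
```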

For high-scale deployments with small prompts, you can increase the connection count:

# High-scale configuration
config :req_llm,
  finch: [
    name: ReqLLM.Finch,
    pools: %{
      :default => [protocols: [:http1], size: 1, count: 32]  # More connections
    }
  ]

Advanced users can specify custom Finch instances per request:

{:ok, response} = ReqLLM.stream_text(model, messages, finch_name: MyApp.CustomFinch)
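
The custom instance has to be started before it can be referenced. A minimal sketch, assuming an application you control: return a standard {Finch, opts} child spec and add it to your supervision tree. The module name and the count: 16 pool size are illustrative choices; the pool options follow the shape of the configuration examples above.

```elixir
defmodule MyApp.FinchSketch do
  @moduledoc """
  Hypothetical supervision-tree entry for the MyApp.CustomFinch instance
  referenced by the finch_name: option.
  """

  @spec child_spec_tuple() :: {module(), keyword()}
  def child_spec_tuple do
    {Finch,
     name: MyApp.CustomFinch,
     pools: %{
       # HTTP/1-only pool, matching ReqLLM's default protocol choice
       default: [protocols: [:http1], size: 1, count: 16]
     }}
  end
end
```

For example, use children = [MyApp.FinchSketch.child_spec_tuple()] followed by Supervisor.start_link(children, strategy: :one_for_one) in your Application.start/2.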

StreamResponse Usage Patterns

StreamResponse supports several access patterns:

# Real-time streaming for UI
{:ok, response} = ReqLLM.stream_text(model, "Tell me a story")

ReqLLM.StreamResponse.tokens(response)
|> Stream.each(&broadcast_to_liveview/1)
|> Stream.run()

# Concurrent metadata collection (non-blocking)
Task.start(fn ->
  usage = ReqLLM.StreamResponse.usage(response)
  log_usage(usage)
end)

# Simple text collection
text = ReqLLM.StreamResponse.text(response)

# Backward compatibility with legacy Response
{:ok, legacy_response} = ReqLLM.StreamResponse.to_response(response)

Adding a Provider

ReqLLM uses OpenAI Chat Completions as the baseline API standard. Providers that support this format (like Groq, OpenRouter, xAI) require minimal overrides using the ReqLLM.Provider.DSL. Model metadata is automatically synced from LLMDB.

Providers implement the ReqLLM.Provider behavior with functions like encode_body/1, decode_response/1, and optional parameter translation via translate_options/3.

See the Adding a Provider Guide for detailed implementation instructions.
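
To make option translation concrete, the sketch below renames max_tokens to max_completion_tokens for o1/o3 model ids, the example given in the Features list. It is a hypothetical standalone function, not the translate_options/3 callback (see the Adding a Provider Guide for the real callback), and the prefix match on model ids is a simplification.

```elixir
defmodule OptionTranslationSketch do
  @moduledoc """
  Hypothetical canonical-to-provider option renaming. Not the
  ReqLLM.Provider translate_options/3 callback.
  """

  @reasoning_prefixes ["o1", "o3"]

  @spec translate(String.t(), keyword()) :: keyword()
  def translate(model_id, opts) do
    if String.starts_with?(model_id, @reasoning_prefixes) and Keyword.has_key?(opts, :max_tokens) do
      # Move the value under the provider-specific key
      {max_tokens, rest} = Keyword.pop(opts, :max_tokens)
      Keyword.put(rest, :max_completion_tokens, max_tokens)
    else
      opts
    end
  end
end
```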

Advanced Req Plugin API

For advanced non-streaming use cases, you can use ReqLLM providers directly as Req plugins. This is the canonical implementation used by ReqLLM.generate_text/3:

# The canonical pattern from ReqLLM.Generation.generate_text/3:
# parse the model spec, resolve the provider module, build the Req request,
# then execute the HTTP request.
with {:ok, model} <- ReqLLM.model("anthropic:claude-haiku-4-5"),
     {:ok, provider_module} <- ReqLLM.provider(model.provider),
     {:ok, request} <- provider_module.prepare_request(:chat, model, "Hello!", temperature: 0.7),
     {:ok, %Req.Response{body: response}} <- Req.request(request) do
  {:ok, response}
end

# Customize the Req pipeline with additional headers or middleware
{:ok, model} = ReqLLM.model("anthropic:claude-haiku-4-5")
{:ok, provider_module} = ReqLLM.provider(model.provider)
{:ok, request} = provider_module.prepare_request(:chat, model, "Hello!", temperature: 0.7)

# Add custom headers or middleware before sending
custom_request =
  request
  |> Req.Request.put_header("x-request-id", "my-custom-id")
  |> Req.Request.put_header("x-source", "my-app")

{:ok, response} = Req.request(custom_request)

This approach gives you full control over the Req pipeline, allowing you to add custom middleware, modify requests, or integrate with existing Req-based applications. Streaming uses Finch through stream_text/3.

Documentation

Getting Started – first call and basic concepts
Configuration – timeouts, connection pools, and global settings
Telemetry – request lifecycle, reasoning lifecycle, payload capture
Core Concepts – architecture & data model
Data Structures – detailed type information
Pricing Policy – cost-calculation scope, guarantees, and known gaps
Usage & Billing – token costs, tool usage, image costs
Image Generation – generating images with OpenAI and Google
Mix Tasks – model sync, compatibility testing, code generation
Fixture Testing – model validation and supported models
Adding a Provider – extend with new providers
Provider Guides: Anthropic, OpenAI, Google, Google Vertex, xAI, Groq, OpenRouter, Amazon Bedrock, Azure, Cerebras, Fireworks AI, Z.AI, Z.AI Coder, Zenmux

Roadmap & Status

ReqLLM is past its v1.0.0 release, and the core API is stable and ready for production use. We're continuing to refine the library and would love community feedback as we plan the next set of improvements. If you run into anything or have suggestions, please open an issue or PR.

Test Coverage & Quality Commitment

ReqLLM uses fixture-backed compatibility tests as a practical map of provider behavior. The current suite includes 159 passing model-compat entries across 12 providers and 619 unique recorded fixture model specs across text, streaming, tool calling, structured output, embeddings, image generation, speech, transcription, rerank, and OCR.

Catalog support and fixture-verified coverage are tracked separately on purpose: provider catalogs move quickly, account access varies, and some modalities need specialized tests. ReqLLM makes that state visible through mix mc "*:*" and lets you narrow checks by provider or operation type when you need to validate the exact models your application uses.

We welcome bug reports and feedback! If you encounter issues with any supported model, please open a GitHub issue with details. The more feedback we receive, the stronger the code will be!

Development

# Install dependencies
mix deps.get

# Run tests with cached fixtures
mix test

# Run quality checks
mix quality  # format, compile, credo --strict, dialyzer

# Generate documentation
mix docs

Testing with Fixtures

Tests use cached JSON fixtures by default. To regenerate fixtures against live APIs (optional):

# Regenerate all fixtures
LIVE=true mix test

# Regenerate specific provider fixtures using test tags
LIVE=true mix test --only "provider:anthropic"

Contributing

We welcome contributions! ReqLLM uses a fixture-based testing approach to ensure reliability across all providers.

Please read CONTRIBUTING.md for detailed guidelines on:

Core library contributions
Adding new providers
Extending provider features
Testing requirements and fixture generation
Code quality standards

Quick start:

Fork the repository
Create a feature branch
Add tests with fixtures for your changes
Run mix test and mix quality to ensure standards
Verify mix mc "*:*" passes for affected providers
Submit a pull request

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
