feat: cli OpenAI-compatible API response_format support #884

markstur wants to merge 3 commits into generative-computing:main

Conversation
- Added `JsonSchemaFormat` model to represent JSON schema definitions
- Extended `ResponseFormat` to support `json_schema` type (in addition to existing `text` and `json_object`)
- Used field alias to avoid conflict with Pydantic's `schema` method
- Added `_json_schema_to_pydantic()` utility function to dynamically convert JSON schemas to Pydantic models
- Updated `_build_model_options()` to exclude `response_format` from model options (handled separately)
- Modified `make_chat_endpoint()` to:
  - Parse `response_format` from requests
  - Convert `json_schema` type to Pydantic models using the utility function
  - Detect if the serve function accepts a `format` parameter using `inspect.signature()`
  - Pass the generated Pydantic model as the `format=` parameter to serve functions that support it
  - Handle backward compatibility with serve functions that don't accept `format`
- Added proper error handling for invalid schemas
- Test json_schema format is converted to Pydantic model and passed to serve
- Test json_object format doesn't pass a schema
- Test text format doesn't pass a schema
- Test error handling for missing json_schema field
- Test error handling for invalid JSON schemas
- Test backward compatibility with serve functions without format parameter
- Test optional fields in JSON schemas
When a client sends a request with `response_format.type = "json_schema"`, the server:
1. Extracts the JSON schema from `response_format.json_schema.schema`
2. Dynamically creates a Pydantic model from the schema
3. Passes it as the `format=` parameter to the serve function
4. The serve function can then use this for constrained decoding via Mellea's `instruct()` method
This maps OpenAI's `response_format` API to Mellea's native `format=` parameter for structured output.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
```diff
@@ -186,6 +289,7 @@ async def endpoint(request: ChatCompletionRequest):
     created=created_timestamp,
     stream_options=request.stream_options,
     system_fingerprint=system_fingerprint,
```
The non-streaming path (the `return ChatCompletion` just below, line 297) returns `output.value` without validating against `format_model`. Suggest adding before that return (also needs `import json` at the top and `ValidationError` added to the pydantic import):

```python
if format_model is not None and output.value is not None:
    try:
        format_model.model_validate(json.loads(output.value))
    except (json.JSONDecodeError, ValidationError) as e:
        return create_openai_error_response(
            status_code=400,
            message=f"Output does not match required schema: {e!s}",
            error_type="invalid_response_error",
        )
```
I believe OpenAI responses can return output that is not valid for a given schema if things like token limits are hit. Do we want to match that behavior? Or should we always error on our side if the format isn't met?
```python
        )
        yield f"data: {chunk.model_dump_json()}\n\n"
```

```python
# Validate format if format_model is provided
```
Validation runs after all content chunks are already sent (lines 68–106), so the error arrives after the client has consumed the data. A few options:

- Buffer when `format_model` is set, validate, then stream or error before emitting anything.
- Return a 400 upfront when `stream=True` + `json_schema`; simplest for now.
- Keep post-hoc validation but document it; callers can pass `format=` to the backend for constrained decoding instead.
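The buffering option could look roughly like this (assumed names and event shapes; not the PR's streaming code). Chunks are collected, validated as a whole, and only then re-emitted, or replaced by a single error event.

```python
import asyncio
import json

async def buffered_stream(chunks, validate):
    """Buffer all chunks, validate the joined text, then emit or error."""
    buffered = [c async for c in chunks]
    try:
        validate("".join(buffered))
    except Exception as e:
        # Nothing has been sent yet, so the client sees only the error.
        yield f'data: {{"error": "schema validation failed: {e}"}}\n\n'
        return
    for c in buffered:
        yield f"data: {c}\n\n"

async def demo():
    async def produce():
        for part in ['{"name": ', '"Ada"}']:
            yield part
    # json.loads stands in for format_model.model_validate here.
    return [ev async for ev in buffered_stream(produce(), json.loads)]

events = asyncio.run(demo())
print(events)  # two "data: ..." events, emitted only after validation passed
```

The trade-off is latency: the client gets no tokens until the full response is available, which is why the 400-upfront option may be the simpler interim choice.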
```python
def _json_schema_to_pydantic(
```
This handles `type`, but will not handle `enum`, `additionalProperties`, nested types, `array`, `$ref`, `allOf`, or `anyOf`. Suggest clarifying the caveats in comments, or figuring out whether any more validation is viable.
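As one example of closing part of that gap, `enum` could be mapped to `typing.Literal` so out-of-set values are rejected. A hedged sketch (the function name and mapping are hypothetical; `array`, `$ref`, `allOf`, `anyOf`, and `additionalProperties` remain unhandled):

```python
from typing import Any, Literal
from pydantic import ValidationError, create_model

def prop_to_type(prop: dict[str, Any]):
    """Map one JSON-schema property to a Python type (partial coverage)."""
    if "enum" in prop:
        # Literal[tuple(...)] is equivalent to Literal[v1, v2, ...]
        return Literal[tuple(prop["enum"])]
    return {"string": str, "integer": int, "number": float,
            "boolean": bool}.get(prop.get("type"), Any)

Status = create_model(
    "Status", state=(prop_to_type({"enum": ["open", "closed"]}), ...)
)
print(Status(state="open").state)  # open
try:
    Status(state="bogus")
except ValidationError:
    print("rejected")  # values outside the enum fail validation
```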
```python
# Check if serve function accepts format parameter
serve_sig = inspect.signature(module.serve)
accepts_format = "format" in serve_sig.parameters
```
Cacheable / could be done up front? Here it is done on every request, but the result won't change.
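A sketch of hoisting the inspection to endpoint-creation time (assumed structure; not the actual `make_chat_endpoint` body):

```python
import asyncio
import inspect

def make_chat_endpoint(serve):
    # Inspected once when the endpoint is built, not on every request.
    accepts_format = "format" in inspect.signature(serve).parameters

    async def endpoint(messages, format_model=None):
        if accepts_format:
            return await serve(input=messages, format=format_model)
        return await serve(input=messages)

    return endpoint

async def serve_with_format(input, format=None):
    return (input, format)

async def serve_plain(input):
    return (input, "no-format")

with_fmt = make_chat_endpoint(serve_with_format)
plain = make_chat_endpoint(serve_plain)
print(asyncio.run(with_fmt(["hi"], "Model")))  # (['hi'], 'Model')
print(asyncio.run(plain(["hi"], "Model")))     # (['hi'], 'no-format')
```

The closure captures `accepts_format`, so the per-request cost of `inspect.signature()` disappears.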
```python
schema_: dict[str, Any] = Field(alias="schema")
"""JSON Schema definition."""

strict: bool | None = None
```
`strict` is not used? See the related comment: more is needed to really be strict, or at least the behaviour should be clarified.
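One possible interpretation of `strict=True`, mirroring OpenAI's strict structured outputs, is to forbid extra keys on the generated model. A sketch under that assumption (not the PR's current behaviour):

```python
from pydantic import ConfigDict, ValidationError, create_model

def make_model(fields, strict=False):
    """Build a model; strict=True rejects keys not in the schema."""
    config = ConfigDict(extra="forbid" if strict else "ignore")
    return create_model("ResponseModel", __config__=config, **fields)

Loose = make_model({"name": (str, ...)})
Strict = make_model({"name": (str, ...)}, strict=True)

print(Loose.model_validate({"name": "a", "x": 1}).model_dump())  # {'name': 'a'}
try:
    Strict.model_validate({"name": "a", "x": 1})
except ValidationError:
    print("extra key rejected")
```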
```python
accepts_format = "format" in serve_sig.parameters

# Detect if serve is async or sync and handle accordingly
if inspect.iscoroutinefunction(module.serve):
```
Similar (not identical) code is repeated multiple times; a possible opportunity for factoring out a common helper. Minor.
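A possible shared helper for the sync/async dispatch (assumed names; `asyncio.to_thread` is one way to keep a sync `serve` from blocking the event loop):

```python
import asyncio
import inspect

async def call_serve(serve, **kwargs):
    """Await serve whether it is a coroutine function or a plain function."""
    if inspect.iscoroutinefunction(serve):
        return await serve(**kwargs)
    # Run a sync serve in a worker thread so the event loop is not blocked.
    return await asyncio.to_thread(serve, **kwargs)

def sync_serve(input):
    return input.upper()

async def async_serve(input):
    return input.lower()

print(asyncio.run(call_serve(sync_serve, input="Hi")))   # HI
print(asyncio.run(call_serve(async_serve, input="Hi")))  # hi
```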
jakelorocco left a comment
I have a broader question that is touched on in my comments below: If we trust our backend provider to properly handle our structured output requests, why do we do any validation on our side? (Because module.serve might do something funky?)
```python
if accepts_format:
    output = await module.serve(
        input=request.messages,
        requirements=request.requirements,
        model_options=model_options,
        format=format_model,
    )
else:
    output = await module.serve(
        input=request.messages,
        requirements=request.requirements,
        model_options=model_options,
    )
```
Can these calls be combined? If `format` defaults to `None`, is the expectation that `module.serve` handles that differently? Does `module.serve` default to a different `format` value?
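One way to combine the two branches is to build a kwargs dict and add `format` only when the serve function accepts it (illustrative helper, not the PR's code):

```python
def build_serve_kwargs(messages, requirements, model_options,
                       accepts_format, format_model):
    """Collect serve() arguments; include format only when supported."""
    kwargs = {
        "input": messages,
        "requirements": requirements,
        "model_options": model_options,
    }
    if accepts_format:
        kwargs["format"] = format_model
    return kwargs

kw = build_serve_kwargs(["hi"], None, {}, accepts_format=True, format_model="M")
print(sorted(kw))  # ['format', 'input', 'model_options', 'requirements']
```

A single `output = await module.serve(**kwargs)` then replaces the if/else, though it still deliberately omits `format` for serve functions that do not declare it.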