feat: cli OpenAI-compatible API response_format support #884

markstur wants to merge 3 commits into generative-computing:main

Conversation
- Added `JsonSchemaFormat` model to represent JSON schema definitions
- Extended `ResponseFormat` to support `json_schema` type (in addition to existing `text` and `json_object`)
- Used field alias to avoid conflict with Pydantic's `schema` method
- Added `_json_schema_to_pydantic()` utility function to dynamically convert JSON schemas to Pydantic models
- Updated `_build_model_options()` to exclude `response_format` from model options (handled separately)
- Modified `make_chat_endpoint()` to:
  - Parse `response_format` from requests
  - Convert `json_schema` type to Pydantic models using the utility function
  - Detect if the serve function accepts a `format` parameter using `inspect.signature()`
  - Pass the generated Pydantic model as the `format=` parameter to serve functions that support it
  - Handle backward compatibility with serve functions that don't accept `format`
- Added proper error handling for invalid schemas
- Test json_schema format is converted to Pydantic model and passed to serve
- Test json_object format doesn't pass a schema
- Test text format doesn't pass a schema
- Test error handling for missing json_schema field
- Test error handling for invalid JSON schemas
- Test backward compatibility with serve functions without format parameter
- Test optional fields in JSON schemas
When a client sends a request with `response_format.type = "json_schema"`, the server:
1. Extracts the JSON schema from `response_format.json_schema.schema`
2. Dynamically creates a Pydantic model from the schema
3. Passes it as the `format=` parameter to the serve function
4. The serve function can then use this for constrained decoding via Mellea's `instruct()` method
This maps OpenAI's `response_format` API to Mellea's native `format=` parameter for structured output.
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
```diff
@@ -186,6 +289,7 @@ async def endpoint(request: ChatCompletionRequest):
     created=created_timestamp,
     stream_options=request.stream_options,
     system_fingerprint=system_fingerprint,
```
The non-streaming path (the `return ChatCompletion` just below, line 297) returns `output.value` without validating against `format_model`. Suggest adding before that return (also needs `import json` at the top and `ValidationError` added to the pydantic import):

```python
if format_model is not None and output.value is not None:
    try:
        format_model.model_validate(json.loads(output.value))
    except (json.JSONDecodeError, ValidationError) as e:
        return create_openai_error_response(
            status_code=400,
            message=f"Output does not match required schema: {e!s}",
            error_type="invalid_response_error",
        )
```
I believe OpenAI responses can return output that is not valid for a given schema if things like token limits are hit. Do we want to match that behavior? Or should we always error on our side if the format isn't met?
```python
        )
        yield f"data: {chunk.model_dump_json()}\n\n"
```

```python
# Validate format if format_model is provided
```
Validation runs after all content chunks are already sent (lines 68–106), so the error arrives after the client has consumed the data. A few options:

- Buffer when `format_model` is set, validate, then stream or error before emitting anything.
- Return a 400 upfront when `stream=True` + `json_schema`; simplest for now.
- Keep post-hoc validation but document it; callers can pass `format=` to the backend for constrained decoding instead.
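The buffering option could look roughly like this (assumed names and event shapes; not the PR's streaming code). Chunks are collected, validated as a whole, and only then re-emitted, or replaced by a single error event.

```python
import asyncio
import json

async def buffered_stream(chunks, validate):
    """Buffer all chunks, validate the joined text, then emit or error."""
    buffered = [c async for c in chunks]
    try:
        validate("".join(buffered))
    except Exception as e:
        # Nothing has been sent yet, so the client sees only the error.
        yield f'data: {{"error": "schema validation failed: {e}"}}\n\n'
        return
    for c in buffered:
        yield f"data: {c}\n\n"

async def demo():
    async def produce():
        for part in ['{"name": ', '"Ada"}']:
            yield part
    # json.loads stands in for format_model.model_validate here.
    return [ev async for ev in buffered_stream(produce(), json.loads)]

events = asyncio.run(demo())
print(events)  # two "data: ..." events, emitted only after validation passed
```

The trade-off is latency: the client gets no tokens until the full response is available, which is why the 400-upfront option may be the simpler interim choice.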
```python
def _json_schema_to_pydantic(
```
This handles `type`, but will not handle `enum`, `additionalProperties`, nested types, `array`, `$ref`, `allOf`, or `anyOf`. Suggest clarifying the caveats in comments, or figuring out whether any more validation is viable.
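As one example of closing part of that gap, `enum` could be mapped to `typing.Literal` so out-of-set values are rejected. A hedged sketch (the function name and mapping are hypothetical; `array`, `$ref`, `allOf`, `anyOf`, and `additionalProperties` remain unhandled):

```python
from typing import Any, Literal
from pydantic import ValidationError, create_model

def prop_to_type(prop: dict[str, Any]):
    """Map one JSON-schema property to a Python type (partial coverage)."""
    if "enum" in prop:
        # Literal[tuple(...)] is equivalent to Literal[v1, v2, ...]
        return Literal[tuple(prop["enum"])]
    return {"string": str, "integer": int, "number": float,
            "boolean": bool}.get(prop.get("type"), Any)

Status = create_model(
    "Status", state=(prop_to_type({"enum": ["open", "closed"]}), ...)
)
print(Status(state="open").state)  # open
try:
    Status(state="bogus")
except ValidationError:
    print("rejected")  # values outside the enum fail validation
```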
```python
# Check if serve function accepts format parameter
serve_sig = inspect.signature(module.serve)
accepts_format = "format" in serve_sig.parameters
```
Cacheable / could be done up front? Here it is done on every request, but the result won't change.
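A sketch of hoisting the inspection to endpoint-creation time (assumed structure; not the actual `make_chat_endpoint` body):

```python
import asyncio
import inspect

def make_chat_endpoint(serve):
    # Inspected once when the endpoint is built, not on every request.
    accepts_format = "format" in inspect.signature(serve).parameters

    async def endpoint(messages, format_model=None):
        if accepts_format:
            return await serve(input=messages, format=format_model)
        return await serve(input=messages)

    return endpoint

async def serve_with_format(input, format=None):
    return (input, format)

async def serve_plain(input):
    return (input, "no-format")

with_fmt = make_chat_endpoint(serve_with_format)
plain = make_chat_endpoint(serve_plain)
print(asyncio.run(with_fmt(["hi"], "Model")))  # (['hi'], 'Model')
print(asyncio.run(plain(["hi"], "Model")))     # (['hi'], 'no-format')
```

The closure captures `accepts_format`, so the per-request cost of `inspect.signature()` disappears.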
```python
schema_: dict[str, Any] = Field(alias="schema")
"""JSON Schema definition."""

strict: bool | None = None
```
`strict` is not used? See the related comment: more is needed to really be strict, or at least the behaviour should be clarified.
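One possible interpretation of `strict=True`, mirroring OpenAI's strict structured outputs, is to forbid extra keys on the generated model. A sketch under that assumption (not the PR's current behaviour):

```python
from pydantic import ConfigDict, ValidationError, create_model

def make_model(fields, strict=False):
    """Build a model; strict=True rejects keys not in the schema."""
    config = ConfigDict(extra="forbid" if strict else "ignore")
    return create_model("ResponseModel", __config__=config, **fields)

Loose = make_model({"name": (str, ...)})
Strict = make_model({"name": (str, ...)}, strict=True)

print(Loose.model_validate({"name": "a", "x": 1}).model_dump())  # {'name': 'a'}
try:
    Strict.model_validate({"name": "a", "x": 1})
except ValidationError:
    print("extra key rejected")
```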
```python
accepts_format = "format" in serve_sig.parameters

# Detect if serve is async or sync and handle accordingly
if inspect.iscoroutinefunction(module.serve):
```
Similar (not identical) code is repeated multiple times; a possible opportunity for factoring out a common helper. Minor.
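A possible shared helper for the sync/async dispatch (assumed names; `asyncio.to_thread` is one way to keep a sync `serve` from blocking the event loop):

```python
import asyncio
import inspect

async def call_serve(serve, **kwargs):
    """Await serve whether it is a coroutine function or a plain function."""
    if inspect.iscoroutinefunction(serve):
        return await serve(**kwargs)
    # Run a sync serve in a worker thread so the event loop is not blocked.
    return await asyncio.to_thread(serve, **kwargs)

def sync_serve(input):
    return input.upper()

async def async_serve(input):
    return input.lower()

print(asyncio.run(call_serve(sync_serve, input="Hi")))   # HI
print(asyncio.run(call_serve(async_serve, input="Hi")))  # hi
```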
jakelorocco left a comment
I have a broader question that is touched on in my comments below: If we trust our backend provider to properly handle our structured output requests, why do we do any validation on our side? (Because module.serve might do something funky?)
```python
if accepts_format:
    output = await module.serve(
        input=request.messages,
        requirements=request.requirements,
        model_options=model_options,
        format=format_model,
    )
else:
    output = await module.serve(
        input=request.messages,
        requirements=request.requirements,
        model_options=model_options,
    )
```
Can these calls be combined? If `format` defaults to `None`, is the expectation that `module.serve` handles that differently? Does `module.serve` default to a different `format` value?
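One way to combine the two branches is to build a kwargs dict and add `format` only when the serve function accepts it (illustrative helper, not the PR's code):

```python
def build_serve_kwargs(messages, requirements, model_options,
                       accepts_format, format_model):
    """Collect serve() arguments; include format only when supported."""
    kwargs = {
        "input": messages,
        "requirements": requirements,
        "model_options": model_options,
    }
    if accepts_format:
        kwargs["format"] = format_model
    return kwargs

kw = build_serve_kwargs(["hi"], None, {}, accepts_format=True, format_model="M")
print(sorted(kw))  # ['format', 'input', 'model_options', 'requirements']
```

A single `output = await module.serve(**kwargs)` then replaces the if/else, though it still deliberately omits `format` for serve functions that do not declare it.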