
bug: Llama.cpp streaming does not give output #1662

@morti86

Description


I'm having trouble getting Llama.cpp to work with streaming through the OpenAI provider (latest version from the repo, so recent changes are merged).

I basically get no output: the prompt gets through and the LLM is generating, but nothing comes back.

With tracing enabled, I get spammed with this:

Couldn't deserialize SSE data as StreamingCompletionChunk: Error("data did not match any variant of untagged enum StreamingCompletionChunk", line: 0, column: 0) gen_ai.operation.name="invoke_agent" gen_ai.agent.name="Unnamed Agent" gen_ai.system_instructions="" gen_ai.prompt="Przetłumacz 没关系,我正失眠 na język polski" gen_ai.prompt="Przetłumacz 没关系,我正失眠 na język polski" gen_ai.operation.name="chat" gen_ai.agent.name="Unnamed Agent" gen_ai.system_instructions="" gen_ai.provider.name="openai" gen_ai.provider.name="openai" gen_ai.request.model="" gen_ai.request.model=""
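For context, this error is serde's signature for an untagged enum where the incoming JSON matched none of the variants. A minimal sketch of that failure mode is below; the variant shapes are hypothetical, not the library's actual StreamingCompletionChunk definition, but they show how a llama.cpp-style chunk can be rejected by every variant:

```rust
use serde::Deserialize;

// Hypothetical stand-in for the library's untagged chunk enum.
#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum StreamingCompletionChunk {
    // OpenAI-style content delta (hypothetical shape).
    Delta { choices: Vec<Choice> },
    // Final chunk carrying token usage (hypothetical shape).
    Usage { usage: Usage },
}

#[derive(Debug, Deserialize)]
struct Choice {
    delta: Delta,
}

#[derive(Debug, Deserialize)]
struct Delta {
    content: String, // requiring a String here rejects role-only chunks
}

#[derive(Debug, Deserialize)]
struct Usage {
    prompt_tokens: u32,
    completion_tokens: u32,
}

fn main() {
    // An OpenAI-compatible server such as llama.cpp's can send a first
    // chunk with only a role and no content; neither variant matches it.
    let sse_data = r#"{"choices":[{"index":0,"delta":{"role":"assistant"}}]}"#;

    let err = serde_json::from_str::<StreamingCompletionChunk>(sse_data).unwrap_err();
    // Prints: data did not match any variant of untagged enum StreamingCompletionChunk
    println!("{err}");
}
```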

When I switch the provider from OpenAI to Mistral, I do get output, but it is not streamed.
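One way to narrow this down might be to dump the raw SSE frames llama.cpp emits and compare them against what StreamingCompletionChunk expects. A debugging sketch, assuming llama.cpp's server is running on localhost:8080 with its OpenAI-compatible chat route (reqwest with the "json" and "stream" features, plus tokio and futures-util):

```rust
use futures_util::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = serde_json::json!({
        "model": "local", // llama.cpp serves whatever model it was started with
        "stream": true,
        "messages": [{ "role": "user", "content": "Say hi" }]
    });

    let resp = reqwest::Client::new()
        .post("http://localhost:8080/v1/chat/completions")
        .json(&body)
        .send()
        .await?;

    // Each SSE frame arrives as "data: {...}\n\n"; print them verbatim so
    // the actual chunk shape can be inspected.
    let mut stream = resp.bytes_stream();
    while let Some(chunk) = stream.next().await {
        print!("{}", String::from_utf8_lossy(&chunk?));
    }
    Ok(())
}
```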
