
bug: Llama.cpp streaming does not give output #1662

@morti86

Description


I'm having trouble getting Llama.cpp to work with streaming through the OpenAI provider (latest version from the repo, so recent changes are merged).

I basically get no output: the prompt gets through and the LLM is generating, but nothing comes back.

With tracing enabled, I get spammed with this:

Couldn't deserialize SSE data as StreamingCompletionChunk: Error("data did not match any variant of untagged enum StreamingCompletionChunk", line: 0, column: 0) gen_ai.operation.name="invoke_agent" gen_ai.agent.name="Unnamed Agent" gen_ai.system_instructions="" gen_ai.prompt="Przetłumacz 没关系,我正失眠 na język polski" gen_ai.prompt="Przetłumacz 没关系,我正失眠 na język polski" gen_ai.operation.name="chat" gen_ai.agent.name="Unnamed Agent" gen_ai.system_instructions="" gen_ai.provider.name="openai" gen_ai.provider.name="openai" gen_ai.request.model="" gen_ai.request.model=""
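For context, this error is serde's signature for an untagged enum where the incoming JSON matched none of the variants. A minimal sketch of that failure mode is below; the variant shapes are hypothetical, not the library's actual StreamingCompletionChunk definition, but they show how a llama.cpp-style chunk can be rejected by every variant:

```rust
use serde::Deserialize;

// Hypothetical stand-in for the library's untagged chunk enum.
#[derive(Debug, Deserialize)]
#[serde(untagged)]
enum StreamingCompletionChunk {
    // OpenAI-style content delta (hypothetical shape).
    Delta { choices: Vec<Choice> },
    // Final chunk carrying token usage (hypothetical shape).
    Usage { usage: Usage },
}

#[derive(Debug, Deserialize)]
struct Choice {
    delta: Delta,
}

#[derive(Debug, Deserialize)]
struct Delta {
    content: String, // requiring a String here rejects role-only chunks
}

#[derive(Debug, Deserialize)]
struct Usage {
    prompt_tokens: u32,
    completion_tokens: u32,
}

fn main() {
    // An OpenAI-compatible server such as llama.cpp's can send a first
    // chunk with only a role and no content; neither variant matches it.
    let sse_data = r#"{"choices":[{"index":0,"delta":{"role":"assistant"}}]}"#;

    let err = serde_json::from_str::<StreamingCompletionChunk>(sse_data).unwrap_err();
    // Prints: data did not match any variant of untagged enum StreamingCompletionChunk
    println!("{err}");
}
```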

When I switch the provider from OpenAI to Mistral, I do get output, but it is not streamed.
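One way to narrow this down might be to dump the raw SSE frames llama.cpp emits and compare them against what StreamingCompletionChunk expects. A debugging sketch, assuming llama.cpp's server is running on localhost:8080 with its OpenAI-compatible chat route (reqwest with the "json" and "stream" features, plus tokio and futures-util):

```rust
use futures_util::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = serde_json::json!({
        "model": "local", // llama.cpp serves whatever model it was started with
        "stream": true,
        "messages": [{ "role": "user", "content": "Say hi" }]
    });

    let resp = reqwest::Client::new()
        .post("http://localhost:8080/v1/chat/completions")
        .json(&body)
        .send()
        .await?;

    // Each SSE frame arrives as "data: {...}\n\n"; print them verbatim so
    // the actual chunk shape can be inspected.
    let mut stream = resp.bytes_stream();
    while let Some(chunk) = stream.next().await {
        print!("{}", String::from_utf8_lossy(&chunk?));
    }
    Ok(())
}
```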
