I'm having issues getting Llama.cpp to work with streaming (latest version from the repo, so recent changes are merged) using the OpenAI provider.
I basically get no output: the prompt gets through and the LLM is thinking, but nothing comes back.
With tracing enabled, I get spammed with this:
Couldn't deserialize SSE data as StreamingCompletionChunk: Error("data did not match any variant of untagged enum StreamingCompletionChunk", line: 0, column: 0) gen_ai.operation.name="invoke_agent" gen_ai.agent.name="Unnamed Agent" gen_ai.system_instructions="" gen_ai.prompt="Przetłumacz 没关系,我正失眠 na język polski" gen_ai.prompt="Przetłumacz 没关系,我正失眠 na język polski" gen_ai.operation.name="chat" gen_ai.agent.name="Unnamed Agent" gen_ai.system_instructions="" gen_ai.provider.name="openai" gen_ai.provider.name="openai" gen_ai.request.model="" gen_ai.request.model=""
When I switch the provider from OpenAI to Mistral, I do get output, but it is not streamed.
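For context on what the error means: serde deserializes an untagged enum by trying each variant in order and rejecting the payload if none matches, so "data did not match any variant" suggests the SSE chunks emitted by the llama.cpp server don't exactly fit any expected chunk shape. Below is a minimal sketch of that matching behavior in Python; the two chunk schemas are illustrative assumptions, not the library's real `StreamingCompletionChunk` definitions, and the "llama.cpp payload" is hypothetical.

```python
import json

def parse_delta_chunk(obj):
    # Variant 1 (illustrative): a streaming delta with a "choices" list.
    choices = obj["choices"]          # KeyError if absent -> variant rejected
    return ("delta", choices[0]["delta"])

def parse_usage_chunk(obj):
    # Variant 2 (illustrative): a final chunk carrying only token usage.
    return ("usage", obj["usage"])    # KeyError if absent -> variant rejected

def deserialize_untagged(sse_data):
    # Mimics serde's untagged enum: try each variant in order; if none
    # accepts the payload, fail the whole chunk.
    obj = json.loads(sse_data)
    for variant in (parse_delta_chunk, parse_usage_chunk):
        try:
            return variant(obj)
        except (KeyError, IndexError, TypeError):
            continue
    raise ValueError("data did not match any variant")

# An OpenAI-style chunk matches the first variant:
print(deserialize_untagged('{"choices":[{"delta":{"content":"Hi"}}]}'))

# A chunk with a different shape (hypothetical llama.cpp payload)
# matches no variant, reproducing the logged error:
try:
    deserialize_untagged('{"content":"Hi","stop":false}')
except ValueError as e:
    print(e)  # -> data did not match any variant
```

A practical next step would be to capture the raw SSE lines from the llama.cpp server (e.g. with `curl -N` against its `/v1/chat/completions` endpoint with `"stream": true`) and compare the chunk JSON field-by-field against an OpenAI response.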