mtmd: fix whisper audio tail truncation by exposing padded buffer to FFT #22770

Merged
ngxson merged 1 commit into ggml-org:master from ServeurpersoCom:mtmd/fix-whisper-audio-tail-truncation on May 7, 2026


@ServeurpersoCom
Contributor

Overview

Fixes tail truncation of audio inputs longer than 30 seconds.

Additional information

In log_mel_spectrogram, the whisper-style padding branch allocated samples_padded with a 30s silence tail but never reassigned samples and n_samples to point at it, unlike the no_padding and center_padding branches. As a result, the pad never made it into the mel, and the chunking loop dropped the final partial slice along with the real audio it contained.
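The effect described above can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the constants (16 kHz sample rate, hop of 160 samples, 3000 frames per 30 s chunk) and the helper names `select_input`, `full_chunks`, and `covered_seconds` are assumptions made for illustration only. It shows how a padded buffer that is allocated but never exposed to the downstream stage causes a floor-dividing chunk loop to drop the tail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants, chosen only to make the arithmetic concrete.
constexpr size_t SAMPLE_RATE  = 16000; // samples per second
constexpr size_t HOP          = 160;   // samples per mel frame
constexpr size_t CHUNK_FRAMES = 3000;  // frames per 30 s chunk

// Non-owning view of the samples that the FFT stage will read.
struct view {
    const float * data;
    size_t        n;
};

// Builds a padded copy with a 30 s silence tail. When `reassign` is false the
// padded buffer is allocated but the returned view still points at the
// original input, which mimics the behavior described in this PR.
view select_input(const std::vector<float> & in, std::vector<float> & padded, bool reassign) {
    view v{ in.data(), in.size() };
    padded.assign(in.size() + SAMPLE_RATE * 30, 0.0f);
    std::copy(in.begin(), in.end(), padded.begin());
    assert(padded.size() >= in.size());
    if (reassign) {
        // The missing step: expose the padded buffer downstream.
        v = view{ padded.data(), padded.size() };
    }
    return v;
}

// Number of complete 30 s chunks; any partial chunk is dropped (floor division).
size_t full_chunks(const view & v) {
    const size_t n_frames = v.n / HOP;
    return n_frames / CHUNK_FRAMES;
}

// Seconds of input covered by the complete chunks.
size_t covered_seconds(const view & v) {
    return full_chunks(v) * CHUNK_FRAMES * HOP / SAMPLE_RATE;
}
```

Under these assumptions, a 45 s clip yields one full chunk (30 s covered) when the view is not reassigned, so the last 15 s of real audio are lost. With the view reassigned to the padded buffer, the same clip yields two full chunks (60 s covered), which includes all of the real audio plus silence.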

Fixes #22591. Reproduced and tested with:

ggml-org/Qwen3-Omni-30B-A3B-Instruct-GGUF/
Qwen3-Omni-30B-A3B-Instruct-Q8_0.gguf
mmproj-Qwen3-Omni-30B-A3B-Instruct-Q8_0.gguf
mmproj-Qwen3-Omni-30B-A3B-Instruct-bf16.gguf

Testing

Speak for more than 30 seconds and say a random magic word at the very end. The LLM must be able to report that word.

No patch: Bad

With patch: Good

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES Opus 4.7 + local MCP rootless disposable pod with shared GPU

@ServeurpersoCom ServeurpersoCom requested a review from a team as a code owner May 6, 2026 16:36
@ServeurpersoCom ServeurpersoCom requested a review from ngxson May 6, 2026 17:00
@ServeurpersoCom
Contributor Author

cc @ngxson

Contributor

@ngxson ngxson left a comment


nice, thanks!

@ngxson ngxson requested a review from a team May 7, 2026 10:41
@ngxson ngxson merged commit cc97e45 into ggml-org:master May 7, 2026
45 of 46 checks passed
cetarthoriphros pushed a commit to cetarthoriphros/llama.cpp that referenced this pull request May 9, 2026
meh pushed a commit to meh/llama.cpp that referenced this pull request May 10, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 19, 2026


Development

Successfully merging this pull request may close these issues.

Eval bug: Qwen3-Omni doesn't process the last 30s block of audio input

3 participants