
fix: stop sending audio to Omi backend when custom STT is active (prevents listening-minute consumption) #6634

Open
Rahulsharma0810 wants to merge 2 commits into BasedHardware:main from Rahulsharma0810:feat/custom-stt-no-audio-leak

Conversation

@Rahulsharma0810

Bug

When a custom STT provider (Deepgram, Local Whisper, or a custom endpoint) is configured, CompositeTranscriptionSocket opens two WebSocket connections simultaneously and sends every raw audio chunk to both:

Audio chunk → primarySocket  → custom STT provider   ✅ intended
Audio chunk → secondarySocket → api.omi.me/v4/listen  ❌ unintended

The secondary connection causes the Omi backend to transcribe the audio in parallel, consuming listening minutes from the user's quota — even though the custom provider is handling all transcription.

This makes the "bring your own STT" feature misleading. Per Omi's own support:

"If you're still sending sessions through the cloud /v4/listen pipeline, listening minutes will count even with custom STT."

The custom_stt=enabled query flag tells the backend to use forwarded transcripts instead of its own transcription output, but it does not stop the audio stream from being received and metered.

Root Cause

composite_transcription_socket.dart send() (lines 141-147 before this fix):

void send(dynamic message) {
  primarySocket.send(message);   // audio → custom STT
  secondarySocket.send(message); // audio → /v4/listen — always, unconditionally
}

Fix

Add skipAudioToSecondary flag to CompositeTranscriptionSocket. When true:

  • Raw audio bytes go only to the primary socket (custom STT provider)
  • Secondary socket still connects and receives forwarded transcript JSON via _forwardAsSuggestedTranscript
  • Conversation saving, AI processing, and memory extraction on the Omi backend continue normally
  • Audio transcription — and minute counting — is skipped

Set skipAudioToSecondary: true unconditionally in _createCompositeService since the composite path is only reached when a custom STT config is active.
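The patched send() path can be sketched as follows. This is a minimal sketch based on the description above, not the exact diff: the TranscriptionSocket interface, constructor shape, and field types are stand-ins for the app's real types.

```dart
// Stand-in for the app's real socket interface (an assumption).
abstract class TranscriptionSocket {
  void send(dynamic message);
}

class CompositeTranscriptionSocket {
  CompositeTranscriptionSocket({
    required this.primarySocket,
    required this.secondarySocket,
    this.skipAudioToSecondary = false, // default preserves the old behaviour
  });

  final TranscriptionSocket primarySocket;   // custom STT provider
  final TranscriptionSocket secondarySocket; // Omi /v4/listen

  /// When true, raw audio passed to send() is not forwarded to the
  /// secondary socket. Forwarded transcripts call secondarySocket.send()
  /// directly and are unaffected by this flag.
  final bool skipAudioToSecondary;

  void send(dynamic message) {
    primarySocket.send(message); // audio → custom STT, always
    if (!skipAudioToSecondary) {
      secondarySocket.send(message); // audio → /v4/listen, the metered path
    }
  }
}
```

With the flag left at its default of false, both sockets receive every chunk exactly as before, so non-custom-STT paths are unchanged.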

Behaviour After Fix

                             Before       After
Custom STT transcription     ✅ works     ✅ works
Conversation saving          ✅ works     ✅ works
AI processing / memories     ✅ works     ✅ works
Listening minutes consumed   ❌ always    ✅ not consumed
Audio sent to api.omi.me     ❌ always    ✅ never

Changes

2 files, 19 lines added, 1 line changed.

File                                 Change
composite_transcription_socket.dart  Add skipAudioToSecondary field + conditional in send()
transcription_service.dart           Pass skipAudioToSecondary: true in _createCompositeService


greptile-apps bot commented Apr 14, 2026

Greptile Summary

This PR fixes a bug where CompositeTranscriptionSocket.send() unconditionally forwarded every raw audio chunk to the Omi /v4/listen backend even when a custom STT provider was handling transcription, causing listening minutes to be metered unnecessarily. The fix adds a skipAudioToSecondary flag (default false) and sets it to true unconditionally in _createCompositeService, which is only reachable when a custom STT config is active. The secondary socket still connects and receives forwarded transcript JSON via _forwardAsSuggestedTranscript, so conversation saving and memory extraction continue to work normally.

Confidence Score: 5/5

  • Safe to merge — the fix is minimal, targeted, and backward-compatible; all remaining findings are P2 suggestions.
  • No P0 or P1 issues found. The flag defaults to false, preserving existing behaviour for all non-custom-STT paths, and true is only set in the custom STT composite path, where it is always correct. Two P2 notes (flag naming/semantics and a potential idle timeout on the secondary socket during silence) are worth investigating but do not block the fix.
  • No files require special attention.

Important Files Changed

Filename Overview

app/lib/services/sockets/composite_transcription_socket.dart
    Adds skipAudioToSecondary flag (defaults to false) and guards secondarySocket.send(message) behind it — clean, backward-compatible change that correctly prevents raw audio from reaching the Omi backend when a custom STT provider is active.

app/lib/services/sockets/transcription_service.dart
    Passes skipAudioToSecondary: true unconditionally in _createCompositeService, which is only reached when a custom STT config is active — the placement is correct and the intent is well-documented.

Sequence Diagram

sequenceDiagram
    participant App as Flutter App
    participant CS as CompositeTranscriptionSocket
    participant PS as primarySocket<br/>(Custom STT)
    participant SS as secondarySocket<br/>(Omi /v4/listen)

    Note over App,SS: Before fix — audio sent to BOTH sockets

    App->>CS: send(audioChunk)
    CS->>PS: send(audioChunk)
    PS-->>CS: transcript JSON
    CS->>SS: _forwardAsSuggestedTranscript(transcript)
    CS->>SS: send(audioChunk) ❌ metered

    Note over App,SS: After fix — skipAudioToSecondary: true

    App->>CS: send(audioChunk)
    CS->>PS: send(audioChunk)
    PS-->>CS: transcript JSON
    CS->>SS: _forwardAsSuggestedTranscript(transcript) ✅
    Note over CS,SS: secondarySocket.send(audioChunk) skipped ✅

    Note over SS: Omi backend processes forwarded<br/>transcripts only — no listening minutes consumed

Comments Outside Diff (2)

  1. app/lib/services/sockets/composite_transcription_socket.dart, line 147-159 (link)

    P2 Flag semantics broader than the name implies

    skipAudioToSecondary skips all messages routed through send(), not just raw audio bytes. Today only audio chunks flow through this path, so behaviour is correct. However, if a caller ever sends a text/JSON control message via send() (e.g. a keep-alive or metadata frame), it will also be silently dropped for the secondary socket. A slightly safer name would be skipSendToSecondary, or the guard could inspect the message type:

    This way any future non-audio send() call still reaches the secondary socket instead of being unintentionally blocked.
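The type-inspecting guard suggested in the review could look like the sketch below. Note the `is List<int>` check assumes raw audio arrives at send() as byte lists — an assumption about the app's callers, not something confirmed by the diff; the _Sink class is a stand-in for the real sockets.

```dart
// Stand-in socket that records what it was asked to send.
class _Sink {
  final sent = <dynamic>[];
  void send(dynamic m) => sent.add(m);
}

// Sketch of a composite whose guard inspects the message type, so only
// raw audio is withheld from the secondary socket.
class GuardedComposite {
  GuardedComposite(this.primarySocket, this.secondarySocket,
      {this.skipAudioToSecondary = false});

  final _Sink primarySocket;
  final _Sink secondarySocket;
  final bool skipAudioToSecondary;

  void send(dynamic message) {
    primarySocket.send(message);
    // Withhold only raw audio bytes; text/JSON control frames (e.g. a
    // keep-alive or metadata frame) still reach the backend.
    if (skipAudioToSecondary && message is List<int>) return;
    secondarySocket.send(message);
  }
}
```

Under this variant, a hypothetical future control frame sent as a String or JSON map would still pass through to the secondary socket even with the flag enabled.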

  2. app/lib/services/sockets/composite_transcription_socket.dart, line 53-88 (link)

    P2 Secondary socket keep-alive may be lost during extended silences

    Before this fix, continuous audio frames (including silence) kept the secondary WebSocket alive at the application layer. After the fix, the secondary socket receives data only when the primary produces a transcript (i.e. when speech is detected). During long silence periods, no bytes reach the secondary socket at all.

    If the Omi backend enforces an application-level idle timeout on /v4/listen connections that receive no data, it will close the secondary WebSocket. _onSocketClosed then tears down both sockets, silently interrupting the session.

    Worth verifying whether the backend sends WebSocket ping frames (or tolerates long idle windows) when custom_stt=enabled, and if not, consider adding a client-side periodic ping to the secondary socket to keep it alive during silence.


…alive

Two Greptile P2 review fixes:

1. Rename flag: skipAudioToSecondary → skipSendToSecondary
   The flag guards the send() path, not just audio specifically. Any
   message passed to send() is skipped for the secondary socket.
   The new name reflects what is actually skipped (the send() call)
   rather than implying audio-type inspection. _forwardAsSuggestedTranscript
   calls secondarySocket.send() directly and is unaffected by this flag.

2. Document keepalive: PureSocket sets pingInterval=20s on the underlying
   IOWebSocketChannel, so WebSocket protocol-level pings fire every 20
   seconds regardless of application data. The secondary socket stays alive
   during silence without any additional keep-alive logic.

Addresses Greptile P2 comments on PR BasedHardware#6634.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Rahulsharma0810
Author

Addressing Greptile P2 comments — both fixed in commit 6a04ebf

P2-1 — Flag naming
Renamed skipAudioToSecondary → skipSendToSecondary. The flag guards the send() path, not audio bytes specifically. _forwardAsSuggestedTranscript calls secondarySocket.send() directly and is unaffected regardless of flag value — the name now reflects what is actually skipped.

P2-2 — Secondary socket keepalive during silence
PureSocket.connect() passes pingInterval: const Duration(seconds: 20) to IOWebSocketChannel.connect() (line 77 in pure_socket.dart). This fires WebSocket protocol-level ping frames every 20 seconds automatically, independent of application data. The secondary socket will receive a ping every 20 seconds even during extended silence — no additional application-level keepalive needed.

If the Omi backend enforces an application-level audio-data idle timeout (distinct from a network/WebSocket idle timeout), that would be a separate backend-side concern worth raising, but standard WebSocket keepalives should handle the common case.
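For reference, the keepalive behaviour described above corresponds to dart:io's WebSocket.pingInterval, which IOWebSocketChannel wraps. A minimal sketch (the URL and function name here are illustrative, not the app's actual code):

```dart
import 'dart:io';

// Sketch of protocol-level keepalive on a dart:io WebSocket — this is
// what IOWebSocketChannel.connect(pingInterval: ...) configures under
// the hood. The endpoint URL is illustrative only.
Future<WebSocket> connectWithKeepalive(String url) async {
  final ws = await WebSocket.connect(url);
  // Ping frames fire every 20 seconds regardless of application data,
  // keeping the connection alive through extended silence.
  ws.pingInterval = const Duration(seconds: 20);
  return ws;
}
```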

@Rahulsharma0810
Author

Summary for human reviewers

This PR fixes a silent billing issue with the custom STT feature.

What was happening

When a user configures a custom transcription provider (Deepgram, Local Whisper, or a custom endpoint) in Settings → Developer Options → Transcription, the app was still streaming raw audio to api.omi.me/v4/listen in parallel. This caused listening minutes to be counted against the user's quota even though Omi's backend was not doing any transcription — the custom provider was handling it entirely.

Omi's own support confirmed this behaviour:

"If you're still sending sessions through the cloud /v4/listen pipeline, listening minutes will count even with custom STT."

What this PR does

Adds a skipSendToSecondary flag to CompositeTranscriptionSocket. When a custom STT provider is active, raw audio bytes go only to that provider. The secondary connection to /v4/listen stays open and still receives forwarded transcript text — so conversation saving, memory extraction, and AI processing on the Omi backend all continue working. The backend just never receives or meters the audio stream.

What is not affected

  • Users using Omi's default transcription: no change, flag defaults to false
  • Conversation saving, memories, AI insights: all continue normally
  • The secondary WebSocket connection itself: still opens, stays alive via 20s protocol-level pings

The Greptile bot reviewed this at 5/5 confidence and marked it safe to merge. The two comments it raised (flag naming and keepalive) were addressed in commit 6a04ebf — see the previous comment for details.

@Rahulsharma0810
Author

Fixes #6637

