fix(voice): clean up turn variables when committing a user turn manually #1438
Closed
toubatbrian wants to merge 1 commit into main
Port of livekit/agents#5671 to agents-js. Resets last final-transcript time, speech-start time, and last-speaking time alongside the existing transcript/confidence/committed reset, and ends any in-progress user_turn span (clearing collected STT request ids) so the next speech starts a fresh span and metrics window.
/cc @toubatbrian @livekit/agent-devs for review. Generated by Claude Code
🦋 Changeset detected
Latest commit: e4aa7ce
The changes in this PR will be included in the next version bump. This PR includes changesets to release 31 packages.
Summary
Automated port of livekit/agents#5671 into agents-js.
When `clearUserTurn()` is invoked (the path used by manual turn detection to abandon an in-progress turn), it previously left several turn-scoped fields stale:

- `lastFinalTranscriptTime`, `lastSpeakingTime`, and `speechStartTime` would carry over into the next turn, corrupting the EOU `transcriptionDelay` / `endOfUtteranceDelay` metrics and the preemptive-generation `startedSpeakingAt` for the next turn.
- The `user_turn` span was kept open and would be reused for the next user turn instead of starting a fresh one. Any provider STT request ids collected so far would also be flushed onto the wrong span.

This fix resets those turn variables and ends the in-progress `user_turn` span so the next speech starts a fresh span and metrics window.

cc @toubatbrian @livekit/agent-devs
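To make the reset concrete, here is a minimal sketch of the behavior described above. It is a toy model, not the code in `agents/src/voice/audio_recognition.ts`: the `TurnState` and `FakeSpan` classes, the `SpanLike` interface, and the `string[]` type for `sttRequestIds` are assumptions made for illustration. Only the field names and the reset values come from this PR.

```typescript
// Hypothetical stand-in for a tracing span; not the real span type used by agents-js.
interface SpanLike {
  isRecording(): boolean;
  end(): void;
}

class FakeSpan implements SpanLike {
  private recording = true;
  isRecording(): boolean {
    return this.recording;
  }
  end(): void {
    this.recording = false;
  }
}

// Toy model of the turn-scoped state named in this PR (assumed shape).
class TurnState {
  lastFinalTranscriptTime = 0; // 0 is the "not set" sentinel
  lastSpeakingTime: number | undefined = undefined;
  speechStartTime: number | undefined = undefined;
  userTurnSpan: SpanLike | undefined = undefined;
  sttRequestIds: string[] = []; // container type is an assumption

  clearUserTurn(): void {
    // Reset the timing fields so the next turn starts a fresh metrics window.
    this.lastFinalTranscriptTime = 0;
    this.lastSpeakingTime = undefined;
    this.speechStartTime = undefined;

    // End the in-progress span (if still recording) and drop the cached
    // reference and the request ids collected for it.
    if (this.userTurnSpan?.isRecording()) {
      this.userTurnSpan.end();
    }
    this.userTurnSpan = undefined;
    this.sttRequestIds = [];
  }
}

// Simulate an abandoned turn: populate the state, then clear it.
function runClearDemo(): { state: TurnState; span: FakeSpan } {
  const state = new TurnState();
  const span = new FakeSpan();
  state.lastFinalTranscriptTime = 1_000;
  state.lastSpeakingTime = 950;
  state.speechStartTime = 900;
  state.userTurnSpan = span;
  state.sttRequestIds.push('req-1');
  state.clearUserTurn();
  return { state, span };
}

const { state, span } = runClearDemo();
```

After `runClearDemo()`, the old span has been ended and nothing from the abandoned turn remains on `state`, so a later span would start with an empty request-id list.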
Ported features
`agents/src/voice/audio_recognition.ts` extends `clearUserTurn()` to:

- Reset `lastFinalTranscriptTime` to `0`, and `speechStartTime` and `lastSpeakingTime` to `undefined` (mirrors `_last_final_transcript_time = None`, `_speech_start_time = None`, `_last_speaking_time = None` in Python).
- End the `userTurnSpan` (if recording) and clear the cached span and accumulated `sttRequestIds`, so the next speech starts a fresh span and the next ended span doesn't inherit stale provider request ids.

Implementation nuances vs. Python
- `_vad_speech_started` field has no JS equivalent. In Python, `_vad_speech_started` is a separate boolean used to gate `_speech_start_time` assignment in the VAD `START_OF_SPEECH` handler. The Python diff resets both fields together (`self._speech_start_time = None`, `self._vad_speech_started = False`).
  In agents-js, that gate is implemented directly as `if (this.speechStartTime === undefined)` in the VAD `INFERENCE_DONE` handler; there is no separate `vadSpeechStarted` field. Resetting `speechStartTime = undefined` therefore re-arms the same gate, so the Python `_vad_speech_started = False` line maps to a no-op in JS. No new field was introduced.
- `lastFinalTranscriptTime` sentinel value. Python uses `float | None`; agents-js types this as `number` and uses `0` as the "not set" sentinel (already established in `bounceEOUTask`'s `lastFinalTranscriptTime !== 0` check and in the existing `endUserTurn` reset path that assigns `this.lastFinalTranscriptTime = 0`). The port resets to `0` for consistency rather than introducing `undefined`.
- `update_stt(None) / update_stt(stt)` reset. The existing `restartStt()` IIFE already mirrors the Python `update_stt` reset (it stops the STT tasks, closes the pipeline, then restarts the pipeline), so no change was needed there. It's preserved as-is and runs after the new resets.

Test plan
- `pnpm build:agents` succeeds.
- `pnpm exec eslint agents/src/voice/audio_recognition.ts` clean.
- `pnpm exec prettier --check` clean for the touched file and changeset.
- `pnpm exec vitest run agents/src/voice/audio_recognition_span.test.ts` (4 passed).
- Call `clearUserTurn` mid-turn and confirm the next turn produces a fresh `user_turn` span and correct EOU metrics.

https://claude.ai/code/session_01P6kkyq5V2krvxdChbADKzg
Generated by Claude Code
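The `speechStartTime` gate discussed under "Implementation nuances vs. Python" can be illustrated with a small sketch. This is a simplified model under stated assumptions: `SpeechGate`, `onVadInference`, and the millisecond timestamps are hypothetical names and values, and the real `INFERENCE_DONE` handler may derive the start time differently. It only shows why resetting the field to `undefined` is enough to re-arm the gate without a separate boolean.

```typescript
// Toy model of the gate; class and method names are illustrative only.
class SpeechGate {
  speechStartTime: number | undefined = undefined;

  // Hypothetical handler invoked for VAD inferences while speech is detected.
  onVadInference(nowMs: number): void {
    // The gate: only the first inference of a turn records the start time.
    if (this.speechStartTime === undefined) {
      this.speechStartTime = nowMs;
    }
  }

  // Resetting the field re-opens the gate; no extra flag is needed.
  clearUserTurn(): void {
    this.speechStartTime = undefined;
  }
}

function runGateDemo(): { staleStart: number | undefined; freshStart: number | undefined } {
  // Without the reset, the second turn keeps the first turn's start time.
  const stale = new SpeechGate();
  stale.onVadInference(1_000); // first turn
  stale.onVadInference(5_000); // second turn, gate still closed

  // With the reset, the second turn records its own start time.
  const fresh = new SpeechGate();
  fresh.onVadInference(1_000); // first turn
  fresh.clearUserTurn(); // turn abandoned manually
  fresh.onVadInference(5_000); // second turn, gate re-armed

  return { staleStart: stale.speechStartTime, freshStart: fresh.speechStartTime };
}

const gateResult = runGateDemo();
```

In this model `gateResult.staleStart` stays at the first turn's timestamp while `gateResult.freshStart` reflects the second turn, which is the difference the last test-plan item is meant to confirm.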