Interruption Behavior
How to manage interruptions to the bot during the call
A Voiceflow agent operates on a turn-by-turn basis: at any point in time the user is at a particular "state", i.e. a specific step on a workflow/component.
During a conversation, the user can cut the agent off at any point, and this can be tricky to manage.
Voice Interruption
Interruption Threshold
The first setting is Interruption Threshold, the number of words the user must speak before the agent stops talking. Automatic Speech Recognition (ASR) runs continuously and produces "partial" transcriptions.
The agent will still be running in the background, but will no longer talk over the user.
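As a rough sketch of this behavior, the logic can be thought of as a word count over partial transcripts. The names (`TTSPlayer`, `on_partial_transcript`) and the threshold value below are illustrative assumptions, not Voiceflow's actual API:

```python
# Hypothetical sketch: stop audio output once the partial transcript
# reaches the interruption threshold. The agent keeps executing in the
# background; only the speech is cut off.

INTERRUPTION_THRESHOLD = 2  # assumed value: words of user speech before the agent stops talking

class TTSPlayer:
    """Stand-in for the agent's audio output."""
    def __init__(self):
        self.playing = True

    def stop(self):
        self.playing = False

def on_partial_transcript(partial: str, player: TTSPlayer) -> None:
    """Called for every partial ASR result while the agent is speaking."""
    word_count = len(partial.split())
    if player.playing and word_count >= INTERRUPTION_THRESHOLD:
        player.stop()  # stop talking over the user; execution continues

player = TTSPlayer()
on_partial_transcript("hold", player)       # below threshold: keep speaking
assert player.playing
on_partial_transcript("hold on", player)    # threshold reached: stop audio
assert not player.playing
```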
Full Interruption
A full interruption is when a full utterance is resolved by the ASR and the next turn begins before the previous one ends. An utterance is determined by the other Telephony settings such as Silence Wait, Utterance End, Punctuation Wait, etc.
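One way to picture utterance resolution is a silence timer: the utterance is considered complete once no new partial transcript has arrived for a given wait period. This is a simplified sketch; the class, method names, and the default below are assumptions, not the real endpointing implementation:

```python
# Hypothetical endpointing sketch: an utterance "resolves" once no new
# partial transcript arrives within `silence_wait` seconds.

class Endpointer:
    def __init__(self, silence_wait: float = 0.8):
        self.silence_wait = silence_wait  # assumed analogue of the Silence Wait setting
        self.last_speech = None
        self.buffer = ""

    def on_partial(self, text: str, now: float) -> None:
        """Record the latest partial transcript and the time it arrived."""
        self.buffer = text
        self.last_speech = now

    def poll(self, now: float):
        """Return the full utterance once silence_wait has elapsed, else None."""
        if self.last_speech is not None and now - self.last_speech >= self.silence_wait:
            utterance = self.buffer
            self.buffer, self.last_speech = "", None
            return utterance  # a full utterance resolved: the next turn can begin
        return None

ep = Endpointer(silence_wait=0.8)
ep.on_partial("cancel my", now=0.0)
ep.on_partial("cancel my order", now=0.4)
assert ep.poll(now=0.6) is None                # user may still be talking
assert ep.poll(now=1.3) == "cancel my order"   # silence elapsed: utterance resolved
```

Punctuation Wait and Utterance End would add further conditions on top of this timer, but the silence check captures the core idea.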
Interruption State
Reference the sample agent below. After the user says something, there are a series of simple message steps broken up by long running API / Prompt steps.
Message steps are nearly instantaneous, but other steps (API, Functions, Prompts) are blocking and take some time before the agent can proceed.
When an interruption happens, since the previous turn hasn't made it to "next state", we will always restart at the capture step under "starting point".
The previous turn will also stop executing. For example, if the interruption happened during "GET - long API call", it will no longer call "long LLM prompt" and use up tokens in the background.
If the previous turn has made it to "next state", we are no longer interrupting, but rather just starting the next turn normally.
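The cancellation behavior can be sketched with `asyncio` task cancellation: when a full interruption lands mid-turn, the in-flight turn is cancelled so downstream steps never run. The step names mirror the sample agent above, but the code itself is an illustrative assumption, not Voiceflow's runtime:

```python
import asyncio

# Hypothetical sketch: an interruption cancels the in-flight turn, so
# "long LLM prompt" never runs and never uses tokens in the background.

executed = []

async def long_api_call():
    executed.append("GET - long API call: started")
    await asyncio.sleep(10)  # simulates a slow request

async def long_llm_prompt():
    executed.append("long LLM prompt")  # should never run after an interruption

async def run_turn():
    await long_api_call()
    await long_llm_prompt()

async def main():
    turn = asyncio.create_task(run_turn())
    await asyncio.sleep(0.05)   # the API call is in flight...
    turn.cancel()               # ...when a full interruption arrives
    try:
        await turn
    except asyncio.CancelledError:
        pass
    # The next turn would restart from the capture step under "starting point".

asyncio.run(main())
assert executed == ["GET - long API call: started"]
```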
Audio does not represent current state
What the agent is speaking to the user lags slightly behind, and the actual state of the conversation is likely ahead of the speech (speech is buffered).
For example, when we hit "message 1":
- "GET - long API call" is next and starts running as a background task.
- The Text-to-Speech audio for the message is generated.
- The audio is played, which takes even longer.
In the current example, the API step has already started the call by the time the first message is speaking.
By the time "message 3 / last message" is speaking, the turn is done and the user already is on "next state".
This is done for maximum latency gains during voice conversations and to minimize awkward silences, but it can make debugging harder.
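The pipeline above can be sketched with two concurrent tasks: while the buffered audio for "message 1" is still playing, the long API call already runs in the background, so state moves ahead of what the user hears. Durations and names here are illustrative assumptions:

```python
import asyncio

# Hypothetical sketch: audio playback and the next blocking step run
# concurrently, so the API call finishes while "message 1" is still speaking.

log = []

async def play_audio(message: str, seconds: float):
    log.append(f"speaking: {message}")
    await asyncio.sleep(seconds)        # audio playback is the slow part

async def long_api_call():
    await asyncio.sleep(0.1)            # finishes while message 1 is still playing
    log.append("API call done")

async def main():
    api_task = asyncio.create_task(long_api_call())  # starts immediately
    await play_audio("message 1", seconds=0.3)
    await api_task                                   # already finished by now
    log.append("state: next step")

asyncio.run(main())
assert log == ["speaking: message 1", "API call done", "state: next step"]
```

The log order shows the effect described above: the conversation state ("API call done") advances before the audio for the current message has finished playing.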