It can take quite a long time for an LLM to write a complete paragraph, even for the fastest models.
By default, with the Response AI / Prompt step, the API waits for the entire response to be generated before sending any text back. This can take several seconds, meaning users wait a long time and then suddenly receive a very long message.
By setting the `?completion_events=true` query parameter on the Interact Stream API, Voiceflow will return output from Response AI / Prompt steps as a text stream as it's generated, which can be shown to the user on an interface capable of handling partial responses.
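As a minimal sketch, the query parameter can be appended to whatever Interact Stream endpoint your project uses. The base URL in the usage note below is a placeholder, not the real endpoint; substitute your project's actual Dialog Manager URL and credentials.

```typescript
// Sketch: enable completion events by setting the query parameter on the
// stream request URL. `baseUrl` is assumed to be your Interact Stream endpoint.
function buildStreamUrl(baseUrl: string, completionEvents: boolean): string {
  const url = new URL(baseUrl);
  url.searchParams.set("completion_events", String(completionEvents));
  return url.toString();
}
```

For example, `buildStreamUrl("https://example.com/v2/interact/stream", true)` yields a URL ending in `?completion_events=true`.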
📘 Only the Response AI / Prompt step produces completion events
Example response with completion_events=false
Example response with completion_events=true
With `completion_events` turned on, it still takes the same total time to receive the entire message, but the user can see the first chunk of text within milliseconds: “Welcome to our servi…”
Enabling `completion_events` means the API will return `completion` traces instead of a `text` (or `speak`) trace. Each completion trace has a `payload.state` property with one of three values:

- `state: "start"` signals the start of a completion stream
- `state: "content"` streams additional text into the same text block, under the `content` property
- `state: "end"` signals that the completion is finished, and carries the final LLM token usage
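The three states above can be handled with a small state machine on the client. This is a sketch; the exact trace shape (`type: "completion"` wrapping a `payload`) is assumed from this doc's description, so verify it against the responses you actually receive.

```typescript
// Assumed trace shape, per the state values described above.
interface CompletionTrace {
  type: "completion";
  payload: { state: "start" | "content" | "end"; content?: string };
}

// Stitches streamed completion traces back into full messages.
class MessageAssembler {
  private current = "";
  readonly messages: string[] = [];

  handle(trace: CompletionTrace): void {
    switch (trace.payload.state) {
      case "start":
        this.current = ""; // a new text block begins
        break;
      case "content":
        this.current += trace.payload.content ?? ""; // append partial text
        // ...re-render `this.current` live in the conversation UI here...
        break;
      case "end":
        this.messages.push(this.current); // the message is now complete
        break;
    }
  }
}
```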
The `content` data may not always arrive as complete sentences or words.
It is the API caller's responsibility to stitch the chunks together and reflect them live in the conversation interface.
Examples
See our streaming-wizard demo (Node.js). Note the use of the `"end"` state as a marker to start a new line in the conversation.
Deterministic and Streamed Messages
Pairing this with existing deterministic messages, which arrive fully complete, can feel jarring: some messages stream in while others appear whole. To mitigate this, you can either:

- create a fake streaming effect for deterministic messages that matches the look of messages streamed through completion events, or
- accumulate enough completion traces to form a complete sentence, grouping streamed responses into sentences before displaying them. Look for delimiters such as `.`, `?`, `!`, `;`, and `\n` (newline), then send the completion as a group of smaller complete messages.
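The second approach can be sketched as a small buffer that accumulates `content` chunks and emits a message each time one of the delimiters listed above is reached. The class and method names here are illustrative, not part of any Voiceflow API.

```typescript
// Matches the sentence delimiters suggested above: . ? ! ; and newline.
const DELIMITER = /[.?!;\n]/;

// Buffers streamed completion chunks and emits complete sentences.
class SentenceBuffer {
  private buffer = "";

  // Append a "content" chunk; return any sentences completed by it.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const sentences: string[] = [];
    let at: number;
    // Repeatedly split off text up to and including the next delimiter.
    while ((at = this.buffer.search(DELIMITER)) !== -1) {
      const sentence = this.buffer.slice(0, at + 1).trim();
      this.buffer = this.buffer.slice(at + 1);
      if (sentence.length > 0) sentences.push(sentence);
    }
    return sentences;
  }

  // On the "end" trace, flush whatever text remains un-delimited.
  flush(): string | null {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? rest : null;
  }
}
```

Feeding it chunks like `"Hello wor"` then `"ld. How are you?"` emits nothing on the first call and `["Hello world.", "How are you?"]` on the second, so the interface only ever displays complete sentences.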