Overview

VAPI is a great way to deploy your Voiceflow agents to production in an audio environment. It handles everything from Automatic Speech Recognition (ASR) to AI-powered Text-to-Speech (TTS) with providers like ElevenLabs, and even hosts your agent on a live phone number, all with very low latency.

To connect your Voiceflow agent to VAPI, you'll need to host a proxy that uses VAPI's custom LLM feature. The proxy receives VAPI's text completion requests and formats them for the Voiceflow interact API, then processes Voiceflow's response so it can be fed back to VAPI and spoken. All you need to do is host this proxy on your own server so messages can be routed between Voiceflow and VAPI.
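The two translation steps the proxy performs can be sketched as a pair of pure functions. This is a minimal illustration, not the example repository's actual code: it assumes VAPI sends OpenAI-style chat completion requests (a `messages` list of role/content pairs) and that Voiceflow's interact API returns a list of traces with `type` and `payload` fields; the function names are hypothetical.

```python
# Hypothetical sketch of the proxy's two translation steps.
# Assumes OpenAI-style requests from VAPI and Voiceflow-style traces back.

def vapi_to_voiceflow(vapi_request):
    """Pull the latest user utterance from an OpenAI-style chat
    completion request and wrap it as a Voiceflow interact action."""
    user_messages = [m for m in vapi_request["messages"] if m["role"] == "user"]
    utterance = user_messages[-1]["content"] if user_messages else ""
    return {"action": {"type": "text", "payload": utterance}}

def voiceflow_to_vapi(traces):
    """Join the agent's spoken output from Voiceflow's trace list into a
    single assistant message for VAPI's TTS to read. Only speak and text
    traces are kept; other trace types (buttons, images, ...) are dropped."""
    parts = [t["payload"]["message"]
             for t in traces
             if t.get("type") in ("speak", "text")]
    return {"choices": [{"message": {"role": "assistant",
                                     "content": " ".join(parts)}}]}
```

In the real integration these functions would sit inside an HTTP handler: the proxy forwards the wrapped action to Voiceflow's interact endpoint, then returns the rebuilt completion payload to VAPI.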

Instructions and Code

We've already built an example integration for you to use, with detailed setup instructions. You can find the GitHub repository with all the code and setup instructions here. There's also a video walking through this integration.

This method works with both text and voice agents. In a voice agent, use the speak step to have the output spoken; in a text agent, use the basic text step. Any other output types from your Voiceflow agent (audio steps, buttons, images, etc.) won't be spoken by your agent on VAPI.

To learn more about building voice agents with VAPI, join their community on Discord.

Workshop

You can also watch a longer workshop we did with the VAPI community to learn more about building voice agents on Voiceflow and deploying them through VAPI.