Advanced - Using custom voice actions

📘

The Twilio integration is currently in beta. If you are participating in the beta program, we'd love to hear your feedback! Please email your thoughts and experiences to [email protected].
If you're interested in joining the beta to try out this exciting new capability, please visit our signup form here to request access. We'll notify you as soon as a spot becomes available.

During the beta period, please keep in mind that:

  • Some features may still be under active development
  • Documentation may be incomplete or subject to change

We greatly appreciate your willingness to be an early adopter and help shape the future of voice AI!

Overview

Custom actions allow you to extend the functionality of your voice agents and implement advanced telephony workflows. By leveraging custom actions, you can:

  • Create interactive voice response (IVR) menus with DTMF (touch-tone) input
  • Capture and validate caller input, such as account numbers or PINs
  • Transfer calls to external phone numbers based on caller responses
  • Dynamically update the agent's ASR settings during the conversation

This guide will show you how to set up and use custom actions to build sophisticated voice experiences.

Getting Started

To add a custom action to your canvas:

  1. Drag the "Custom action" step from the Dev section in the steps toolbar.
  2. Enter a name for your action in the block's settings panel.
  3. Select "Stop on action" if you want the action to interrupt the agent's speech.
  4. Define the expected inputs and outputs for your action in the "Body" section using JSON notation.

Key Concepts

  • DTMF (Dual-Tone Multi-Frequency): The touch tones generated when a caller presses keys on their phone keypad. Each number key corresponds to a unique frequency pair.
  • DTMF Input: Collecting caller input by having them press number keys on their phone in response to your agent's prompts. Useful for navigating menus, entering data, or making selections.
  • Call Transfer: Programmatically forwarding an in-progress call to a different phone number, such as to route the caller to a live agent or another department.
  • ASR Settings: The configuration options that control the agent's automatic speech recognition (ASR) behavior, such as the language model, recognition timeout, and confidence threshold.
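
To make the "unique frequency pair" idea concrete, here is a small illustrative lookup of the standard DTMF keypad tones. The helper name dtmfFrequencies is hypothetical and is not part of the platform; you do not need this code to use custom actions, since the telephony layer decodes the tones for you.

```javascript
// Standard DTMF keypad: each key combines one row tone and one column tone (Hz).
const ROW_HZ = [697, 770, 852, 941];
const COL_HZ = [1209, 1336, 1477, 1633];
const KEYPAD = [
  ["1", "2", "3", "A"],
  ["4", "5", "6", "B"],
  ["7", "8", "9", "C"],
  ["*", "0", "#", "D"],
];

// Hypothetical helper: returns [lowHz, highHz] for a key,
// or null if the key is not on the keypad.
function dtmfFrequencies(key) {
  for (let r = 0; r < KEYPAD.length; r++) {
    const c = KEYPAD[r].indexOf(key);
    if (c !== -1) return [ROW_HZ[r], COL_HZ[c]];
  }
  return null;
}
```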

How To

Retrieve Call Metadata

As part of the initial launch request (at the start step), we will include:

  • userNumber - phone number of the user
  • agentNumber - phone number of the agent (you may have multiple numbers attached to the same agent)
  • callType - whether the call is inbound or outbound
{
  type: "launch",
  payload: {
    callType: "inbound" | "outbound",
    userNumber: string,
    agentNumber: string
  }
}

You can access this information through the last_event variable in a JavaScript step.
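
A minimal sketch of reading these fields, assuming last_event has the same shape as the launch request shown above. The helper name getCallMetadata is hypothetical; inside a JavaScript step you would pass it the built-in last_event variable.

```javascript
// Hypothetical helper: extract call metadata from a launch event.
// Assumes the event mirrors the launch request: { type: "launch", payload: {...} }.
function getCallMetadata(event) {
  if (!event || event.type !== "launch" || !event.payload) {
    return null; // not a launch event, so there is no call metadata to read
  }
  const { callType, userNumber, agentNumber } = event.payload;
  return { callType, userNumber, agentNumber };
}

// In a JavaScript step (assumed usage):
//   const meta = getCallMetadata(last_event);
//   if (meta && meta.callType === "inbound") { /* greet the caller */ }
```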

Set Up a DTMF Menu

  1. Add a Message step to your flow and have your agent prompt the caller to select from a numbered list of options ("Press 1 for Sales, 2 for Support...").
  2. Drag a Custom action step onto the canvas and name it "DTMF".
  3. Enable the "Stop on action" option.
  4. Under Paths, you can add DTMF 1, DTMF 2, etc. The default path is followed if the caller presses or says something that is not set up as its own path in the step configuration.
  5. Add connections from the custom action step to other steps representing each menu option.
  6. In each connected block, check for the DTMF input using the last_event.data variable (e.g., last_event.data == "1" for the first option).
  7. Route the caller to the appropriate flow based on their DTMF selection.
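
The check in step 6 can be sketched as a small lookup. This assumes, as described above, that the pressed digit is available on last_event.data; the menu entries and the selectRoute helper are hypothetical and only illustrate the comparison logic.

```javascript
// Hypothetical menu: digits and destinations are illustrative only.
const MENU = { "1": "sales", "2": "support" };

// Returns the destination for the pressed digit, or "default" when the
// input does not match any configured option.
function selectRoute(event) {
  const digit = event && event.data != null ? String(event.data) : "";
  return Object.prototype.hasOwnProperty.call(MENU, digit)
    ? MENU[digit]
    : "default";
}

// In a JavaScript step (assumed usage):
//   const route = selectRoute(last_event);
```

Falling back to "default" mirrors the default path on the custom action step, so unexpected input never leaves the caller without a next step.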

Capture a PIN Code

Using the same approach as above to capture the keys a caller presses, we can collect multi-digit input such as a PIN.

  1. Drag a Custom action step onto the canvas and name it "DTMF".
  2. Enable the "Stop on action" option.
  3. Set up only the default path, so that any key pressed during capture is registered as part of the loop.
  4. Use a JavaScript step to validate each captured digit and to continue once the required input length has been reached.
    1. We can make use of the last_event built-in variable inside the JavaScript step.
    2. The name of a DTMF event always has the form DTMF {DIGIT}, so we can look for events whose name starts with DTMF and extract the digit at the end. Running this in a loop collects the digits the caller enters one at a time (see image below for example).
  5. Design appropriate flows for valid and invalid PIN scenarios.
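
The collection loop described in step 4 might look like the sketch below. It assumes the event name ("DTMF {DIGIT}") is exposed on last_event.type, which this guide does not confirm, and it uses a hypothetical four-digit PIN length. The helper names extractDigit and collectPin are illustrative.

```javascript
const PIN_LENGTH = 4; // assumed length; adjust to your own requirement

// Returns the digit from an event named "DTMF {DIGIT}", or null otherwise.
// Assumption: the event name lives on event.type.
function extractDigit(event) {
  const name = event && typeof event.type === "string" ? event.type : "";
  const match = /^DTMF\s+([0-9])$/.exec(name);
  return match ? match[1] : null;
}

// Appends the new digit (if any) to the digits collected so far and
// reports whether enough digits have been entered to leave the loop.
function collectPin(pinSoFar, event) {
  const digit = extractDigit(event);
  const pin = digit === null ? pinSoFar : pinSoFar + digit;
  return { pin, complete: pin.length >= PIN_LENGTH };
}

// In a JavaScript step (assumed usage, with `pin` as a flow variable):
//   const result = collectPin(pin, last_event);
//   pin = result.pin;
```

Events that are not single-digit DTMF presses (speech, or the * and # keys) leave the collected value unchanged, so the loop simply waits for the next key.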

Transfer Call to Agent

  1. At the point in your flow where you want to transfer the call, add a Custom action step.
  2. Name the block "forward-call".
  3. In the Body field, enter:
    {
      "number": "+1415XXXXXXX"
    }
    
  4. Add a Message step before the transfer step to tell the caller that they are about to be transferred.
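
If you generate the Body value in code, a quick format check can catch malformed numbers before the transfer is attempted. The sketch below assumes the number should be in E.164 format (a "+" followed by the country code and subscriber number, up to 15 digits), as in the example above; this guide does not state the accepted formats, and buildForwardCallBody is a hypothetical helper.

```javascript
// E.164: "+" followed by up to 15 digits, with a non-zero leading digit.
const E164 = /^\+[1-9]\d{1,14}$/;

// Hypothetical helper: returns the JSON text for the "forward-call" Body.
function buildForwardCallBody(number) {
  if (typeof number !== "string" || !E164.test(number)) {
    throw new Error("Expected a phone number in E.164 format, e.g. +14155550123");
  }
  return JSON.stringify({ number }, null, 2);
}
```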

Update ASR Settings Mid-Call

  1. Identify points in your flow where you expect the caller's speech patterns to change (e.g., switching languages, long vs. short inputs).
  2. Drag a Custom action step onto the canvas and name it "asr".
  3. Enable the "Stop on action" option.
  4. In the Body field, specify the ASR parameters you want to modify. Here are the available settings:
    {
      settings: {
        locale: string,
        punctuationWaitMS: number,
        partialWaitMS: number,
        utteranceEndMS: number,
        silenceWaitMS: number,
      }
    }
    
    1. Example usage
      {
        settings: {
          locale: "fr",
          punctuationWaitMS: 5000
        }
      }
      
  5. Thoroughly test your flow and ASR performance after each settings change.
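
Because a typo in a setting name is easy to miss, it can help to validate the Body before using it. The sketch below checks overrides against the setting names listed in step 4. The helper buildAsrBody is hypothetical, and the rule that wait times must be non-negative numbers is an assumption rather than documented behavior.

```javascript
// Setting names taken from the list in step 4.
const NUMERIC_KEYS = [
  "punctuationWaitMS",
  "partialWaitMS",
  "utteranceEndMS",
  "silenceWaitMS",
];

// Hypothetical helper: validates overrides and wraps them in { settings }.
function buildAsrBody(overrides) {
  const settings = {};
  for (const [key, value] of Object.entries(overrides)) {
    if (key === "locale") {
      if (typeof value !== "string" || value === "") {
        throw new TypeError("locale must be a non-empty string");
      }
    } else if (NUMERIC_KEYS.includes(key)) {
      // Assumption: wait times are non-negative millisecond values.
      if (!Number.isFinite(value) || value < 0) {
        throw new TypeError(key + " must be a non-negative number");
      }
    } else {
      throw new Error("Unknown ASR setting: " + key);
    }
    settings[key] = value;
  }
  return { settings };
}
```

For instance, buildAsrBody({ locale: "fr", punctuationWaitMS: 5000 }) produces the same object as the example usage in step 4.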

Best Practices & Tips

  • Use DTMF menus judiciously to avoid overloading callers with too many options. Limit menus to 3-5 clearly differentiated choices.
  • When capturing data like PINs, use a Message step to confirm the caller's input before proceeding.
  • Be mindful of the caller's experience when transferring calls: provide estimated wait times or offer the option to continue talking to the AI agent.
  • Test your ASR settings changes with a diverse set of voices and accents to ensure they improve recognition accuracy for your target audience.

Troubleshooting

Caller Hears Silence After Entering PIN

  • Check that your PIN validation code executes quickly and does not hang the conversation.
  • Use a Message step to confirm the caller's PIN was received before processing it.
  • Investigate transcript logs and test your PIN flow to debug any issues.