Krisp Noise Cancellation
Latency is one piece of the puzzle — but quality matters too. That’s why we’ve added Krisp.
Background noise, especially speech or music, can seriously throw off voice agents. ASR systems transcribe everything they hear, so voices in a coffee shop or lyrics from background music can easily get mistaken for the user’s input, leading to weird or incorrect responses. It can also confuse the agent into thinking the user isn’t done talking, delaying responses or interrupting playback. In short: noise kills both quality and speed.
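To make the "isn't done talking" problem concrete, here is a minimal sketch of a toy energy-based end-of-turn detector. It is only an illustration under stated assumptions: it is not how Krisp or our voice pipeline works internally, and the frame length, threshold, hangover window, and energy values are all hypothetical. It shows how a couple of short background-noise bursts after the user stops speaking can reset the silence timer and delay the moment the turn is considered finished.

```python
# Toy end-of-turn detector (illustrative only; all constants are hypothetical).
FRAME_MS = 20            # assumed length of one audio frame
ENERGY_THRESHOLD = 0.02  # assumed energy level separating speech from silence
HANGOVER_FRAMES = 15     # quiet frames required (300 ms) before ending the turn


def end_of_turn_frame(frame_energies, threshold=ENERGY_THRESHOLD,
                      hangover=HANGOVER_FRAMES):
    """Return the index of the frame where end of turn is declared, or None."""
    heard_speech = False
    quiet_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            # Anything loud enough counts as speech and resets the silence timer.
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= hangover:
                return i
    return None


# The user speaks for 50 frames (1 s), then stays silent for 50 frames.
clean = [0.2] * 50 + [0.001] * 50

# Same turn, but two background-voice bursts leak in after the user stops.
noisy = list(clean)
noisy[55] = 0.05
noisy[60] = 0.05

clean_end = end_of_turn_frame(clean)
noisy_end = end_of_turn_frame(noisy)
delay_ms = (noisy_end - clean_end) * FRAME_MS

print(f"clean audio: end of turn at frame {clean_end}")
print(f"noisy audio: end of turn at frame {noisy_end}")
print(f"extra delay caused by noise: {delay_ms} ms")
```

In this toy setup the noisy stream ends the turn 11 frames (220 ms) later than the clean one, purely because the bursts restart the silence timer. Real endpointing is far more sophisticated, but the failure mode is the same one described above.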
All voice projects (web-voice widget and Twilio) automatically have Krisp noise cancellation applied.
Before Krisp:
After Krisp:
Here are two spectrograms: the upper one shows the audio the ASR would hear without Krisp, and the lower one shows the audio after Krisp processing.

Through our testing, we've determined that Krisp significantly boosts the accuracy of speech detection and transcription in noisy environments: cafes, offices, on the street, background broadcasts, etc.
Krisp noise cancellation adds ~20 ms of latency to the audio pipeline, while drastically improving speech detection and transcription accuracy. Cleaner audio lets the ASR finalize transcriptions sooner, so despite the extra processing step, overall speech-to-speech latency drops by ~100 ms.