PII redaction for AI Agents with Microsoft Presidio

Intro

In this video, we explain how to use Microsoft Presidio to redact PII in your Voiceflow AI Agent.

What is PII

PII stands for Personally Identifiable Information. Examples include a name, phone number, address, and email. A single piece of personal information (PI) may not be enough to identify someone. When several pieces are combined, for example a first and last name together with an address, they can uniquely identify a person, and they become PII. PII redaction limits which of this data is stored in a system, providing an additional layer of security.

Redacting PI and PII

There are several ways to redact data. In the cookbook, we show two of them: full redaction and partial redaction.

Bob Joe -> {First Name} {Last Name} is full redaction: each value is replaced entirely, but its type is left intact, so the text still shows what kind of information was there.
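Full redaction can be sketched in a few lines of Python. This is a standard-library illustration, not Presidio's implementation: it assumes the entity spans (character offsets plus a label) have already been detected, for example by an analyzer, and that they do not overlap. The function name `redact_full` is hypothetical.

```python
from typing import List, Tuple

# A detected span: (start offset, end offset, placeholder label).
# Hypothetical input; in practice spans would come from an entity detector.
Span = Tuple[int, int, str]


def redact_full(text: str, spans: List[Span]) -> str:
    """Replace each span with a {Label} placeholder, keeping only its type.

    Assumes the spans do not overlap.
    """
    # Work from the last span to the first so that replacing one span
    # does not shift the offsets of the spans that come before it.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + "{" + label + "}" + text[end:]
    return text


print(redact_full("Bob Joe", [(0, 3, "First Name"), (4, 7, "Last Name")]))
# prints "{First Name} {Last Name}"
```

Replacing from right to left keeps the remaining offsets valid without any extra bookkeeping.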

123-456-7890 -> ***-***-7890 is partial redaction: most of the value is masked, but some of it is kept for verification purposes.
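Partial redaction can be sketched the same way. Presidio's anonymizer offers a built-in `mask` operator for this; the hypothetical `mask_keep_last` helper below is only a standard-library illustration of the idea. It hides every digit except the last few and leaves separators such as `-` in place. Keeping four digits is an assumption chosen to match the example above.

```python
def mask_keep_last(value: str, keep: int = 4, mask_char: str = "*") -> str:
    """Mask every digit except the last `keep`, leaving separators in place."""
    digit_positions = [i for i, ch in enumerate(value) if ch.isdigit()]
    # Digits to hide: all but the last `keep` (all of them when keep is 0).
    to_mask = set(digit_positions[:-keep] if keep > 0 else digit_positions)
    return "".join(mask_char if i in to_mask else ch for i, ch in enumerate(value))


print(mask_keep_last("123-456-7890"))  # prints ***-***-7890
```

Masking only digits keeps the original format recognizable, which helps a user confirm that the right number is on file.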

From an API perspective, it can look like this:

Request:

{
	"text": "John Smith phone number is +33123456789"
}

The response contains the anonymized text, plus one item for each replaced entity. The start and end offsets refer to positions in the anonymized text, not in the original:

{
	"text": "<PERSON> phone number is <PHONE_NUMBER>",
	"items": [
		{
			"start": 25,
			"end": 39,
			"entity_type": "PHONE_NUMBER",
			"text": "<PHONE_NUMBER>",
			"operator": "replace"
		},
		{
			"start": 0,
			"end": 8,
			"entity_type": "PERSON",
			"text": "<PERSON>",
			"operator": "replace"
		}
	]
}
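To see where the offsets in this response come from, here is a standard-library sketch of the replace operation. It is not Presidio's code: it assumes the entity spans have already been detected with offsets into the original text (for example by Presidio's analyzer) and that they do not overlap. The function name `anonymize_replace` is hypothetical. The phone number starts at offset 27 in the original text, but at 25 in the output, because "John Smith" (10 characters) was replaced by "<PERSON>" (8 characters).

```python
from typing import Dict, List, Tuple


def anonymize_replace(text: str, spans: List[Tuple[int, int, str]]) -> Dict:
    """Replace each (start, end, entity_type) span with <ENTITY_TYPE>.

    Returns the anonymized text and one item per replacement whose
    start/end offsets point into the anonymized text. Assumes spans
    do not overlap.
    """
    pieces: List[str] = []
    items: List[Dict] = []
    cursor = 0   # position reached in the original text
    out_len = 0  # current length of the anonymized output
    for start, end, entity_type in sorted(spans):
        kept = text[cursor:start]  # unchanged text before this entity
        pieces.append(kept)
        out_len += len(kept)
        placeholder = f"<{entity_type}>"
        pieces.append(placeholder)
        items.append({
            "start": out_len,
            "end": out_len + len(placeholder),
            "entity_type": entity_type,
            "text": placeholder,
            "operator": "replace",
        })
        out_len += len(placeholder)
        cursor = end
    pieces.append(text[cursor:])  # text after the last entity
    # List items last-to-first, matching the order in the sample response.
    return {"text": "".join(pieces), "items": list(reversed(items))}


result = anonymize_replace(
    "John Smith phone number is +33123456789",
    [(0, 10, "PERSON"), (27, 39, "PHONE_NUMBER")],
)
print(result)
```

Because each item's offsets point into the output, `result["text"][item["start"]:item["end"]]` always returns that item's placeholder.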

Guide

See below for the full tutorial.

GitHub link

Follow along with the content on our GitHub page.