Advanced Knowledge Base

This article is a logical continuation of the previous Knowledge Base guide and goes over more advanced uses of the knowledge base.

Advanced Use of the Knowledge Base

Document Upload API

For users who require more control and flexibility, Voiceflow offers advanced formats for organizing and querying Knowledge Base content. We'll explore these features using a credit card application process example.

Using a tabular data format

The tabular data format is useful for structured information and allows for more precise querying and filtering of data. This format is particularly beneficial for organizing complex information like credit card application processes.

Key Features:

  • Structured data representation in rows and columns
  • Ability to add detailed metadata and tags to each entry
  • Enhanced control over searchable fields
  • Improved precision in data retrieval

Example Structure:

Here's how you might structure credit card application information using the tabular data format:

{  
  "data": {  
    "name": "credit_card_applications",  
    "schema": {  
      "searchableFields": ["cardType", "applicationSteps", "requirements", "faqs"],  
      "metadataFields": ["userType", "lastUpdated", "pageLink"]  
    },  
    "items": [  
      {  
        "cardType": "Personal Rewards Card",  
        "applicationSteps": "1. Fill out personal information. 2. Provide income details. 3. Submit application.",  
        "requirements": "Must be 18 years or older, have a valid SSN, and annual income of at least $30,000",  
        "faqs": "What credit score do I need? How long does the application process take? Is there an annual fee?",  
        "userType": "personal",  
        "lastUpdated": "2024-07-01",  
        "pageLink": "/personal-rewards-card"  
      },  
      {  
        "cardType": "Business Cash Back Card",  
        "applicationSteps": "1. Enter business details. 2. Provide business financial information. 3. Submit application with required documents.",  
        "requirements": "Must have a registered business, EIN, and annual business revenue of at least $50,000",  
        "faqs": "Can I apply as a sole proprietor? What documents do I need? How is the cash back calculated?",  
        "userType": "business",  
        "lastUpdated": "2024-07-15",  
        "pageLink": "/business-cash-back-card"  
      }  
    ]  
  }  
}

Using the above data format, you have much more granular control over what information you retrieve. For example, the FAQs field adds additional vectors, which improves the Knowledge Base’s ability to find content similar to what the user has asked.

To view the relevant endpoints, refer to the API reference and Upload Table Data endpoint.
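
As a rough illustration of the structure above, the following Python sketch assembles a table payload and checks it on the client side before upload. The helper name `build_table_payload` is hypothetical, and the check that every item contains every field declared in the schema is an assumption made for this example, not a documented API requirement.

```python
import json


def build_table_payload(name, searchable_fields, metadata_fields, items):
    """Assemble a table upload body in the shape shown in the example above."""
    declared = set(searchable_fields) | set(metadata_fields)
    for index, item in enumerate(items):
        # Client-side sanity check (an assumption, not a documented rule):
        # every item should carry every field named in the schema.
        missing = declared - set(item)
        if missing:
            raise ValueError(f"item {index} is missing fields: {sorted(missing)}")
    return {
        "data": {
            "name": name,
            "schema": {
                "searchableFields": list(searchable_fields),
                "metadataFields": list(metadata_fields),
            },
            "items": list(items),
        }
    }


payload = build_table_payload(
    name="credit_card_applications",
    searchable_fields=["cardType", "requirements"],
    metadata_fields=["userType", "pageLink"],
    items=[
        {
            "cardType": "Personal Rewards Card",
            "requirements": "Must be 18 years or older",
            "userType": "personal",
            "pageLink": "/personal-rewards-card",
        }
    ],
)

# The serialized string is what would be sent as the JSON request body.
body = json.dumps(payload)
```

Validating locally like this can catch a misspelled field name before the request is sent, which is cheaper than debugging an unexpected retrieval result later.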

KB Query API

Instead of only interacting with the Knowledge Base from inside your agent, you can use the KB Query API to query it directly. The API exposes more settings, such as the number of chunks returned or whether synthesis is disabled, so you can do your own processing on the retrieved chunks. Learn more about the API, which can be called either from the API step in Voiceflow or from outside Voiceflow.

Filter with metadata

You can refine your Knowledge Base search queries using metadata. Voiceflow enables you to associate key-value pairs as metadata with documents and define filter expressions for your queries.

When you use metadata filters, searches retrieve only results that match the specified criteria. Filtered searches also typically have lower latency than searches without filters.

Metadata Types and Structure

Supported Types

  • String: For textual data.
  • Number: For numeric values.
  • Boolean: True or false values.
  • Arrays: Arrays containing any other supported type.
  • Objects: Nested JSON objects for hierarchical data structures.

Metadata Size Limitations

The system supports up to 10 KB of metadata per chunk, allowing for detailed and extensive metadata without compromising performance.
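
To stay under this limit, it can help to measure a metadata object before uploading it. The sketch below is one way to do that in Python. It assumes that "10 KB" means 10 * 1024 bytes and that size is measured on the UTF-8 encoding of the compact JSON serialization; how the service actually measures metadata size is not stated here, so treat the result as an estimate. The function names are hypothetical.

```python
import json

# Assumption: "10 KB" is interpreted as 10 * 1024 bytes.
MAX_METADATA_BYTES = 10 * 1024


def metadata_size_bytes(metadata):
    """Return the UTF-8 size of the compact JSON encoding of `metadata`."""
    encoded = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def fits_metadata_limit(metadata, limit=MAX_METADATA_BYTES):
    """Check whether the estimated metadata size is within the limit."""
    return metadata_size_bytes(metadata) <= limit


small = {"project": {"name": "AI Development", "price": 100}}
large = {"notes": "x" * 20000}

print(fits_metadata_limit(small))  # True
print(fits_metadata_limit(large))  # False
```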

Example Metadata Payloads

{
    "developer": {
        "name": "Jane Doe",
        "skills": ["Python", "JavaScript"],
        "tags": ["t1", "t2", "t3", "t4"],
        "languages": [
            {
                "name": "Russian"
            },
            {
                "name": "German"
            }
        ]
    },
    "project": {
        "name": "AI Development",
        "deadline": "2024-12-31",
        "price": 100
    }
}

Metadata Query Language

Voiceflow's filtering query language is based on MongoDB’s query and projection operators.

Voiceflow's query language for chunkDB is inspired by MongoDB. It is designed specifically for conversational metadata and supports a variety of operators for both straightforward and complex queries.

Supported Operators

  • Equality and Comparison
    • $eq: Equal to
    • $ne: Not equal to
    • $gt: Greater than
    • $gte: Greater than or equal to
    • $lt: Less than
    • $lte: Less than or equal to
  • Array Operations
    • $in: Matches any of the values specified in an array
    • $nin: Does not match any of the values specified in an array
    • $all: Matches all values specified in an array
  • Logical Operators
    • $and: Logical AND that combines multiple conditions
    • $or: Logical OR that combines multiple conditions

Querying Nested Objects

Use dot notation to specify the path to nested fields, enabling precise queries on hierarchical data.
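
To make the idea of a dot-notation path concrete, here is a small Python sketch that follows such a path through a nested metadata object. It only illustrates how a path like `developer.name` maps onto nested keys; the helper `get_path` is hypothetical and is not how the Knowledge Base resolves paths internally.

```python
def get_path(document, dotted_path, default=None):
    """Follow a dot-separated path through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        # Stop as soon as the path leaves the nested-dict structure.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


metadata = {
    "developer": {"name": "Jane Doe", "skills": ["Python", "JavaScript"]},
    "project": {"name": "AI Development", "price": 100},
}

print(get_path(metadata, "developer.name"))   # Jane Doe
print(get_path(metadata, "project.price"))    # 100
print(get_path(metadata, "developer.level"))  # None
```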

More examples

Case 1: Match All Specific Tags

Objective: Identify chunks that include every specified tag in the list.

{
    "filters": {
        "developer.tags": {
            "$all": ["t1", "t2"]
        }
    }
}

Case 2: Match Any of the Specified Tags

Objective: Find chunks containing any of the tags listed.

{
    "filters": {
        "developer.tags": {
            "$in": ["t1", "t2"]
        }
    }
}

Case 3: Exclude Chunks With Certain Tags

Objective: Filter out chunks that include any of the tags specified.

{
    "filters": {
        "developer.tags": {
            "$nin": ["t1", "t2"]
        }
    }
}

Case 4: Exact Match on Text Field

Objective: Retrieve chunks where the text field exactly matches a specified value.

{
    "filters": {
        "developer.name": {
            "$eq": "Jane Doe"
        }
    }
}

Case 5: Combination of Conditions

Objective: Search for chunks that either contain a specific tag or match a text value exactly.

{
    "filters": {
        "$or": [
            {
                "developer.tags": {
                    "$in": ["t1"]
                }
            },
            {
                "developer.name": {
                    "$eq": "Jane Doe"
                }
            }
        ]
    }
}

Case 6: Numeric Range and Specific Element in Array

Objective: Locate chunks priced within a certain range and containing a specific language.

{
    "filters": {
        "project.price": {
            "$gte": 5,
            "$lt": 100
        },
        "developer.languages[].name": {
            "$eq": "Russian"
        }
    }
}

Case 7: Querying Nested Object Attributes

Objective: Find chunks where the developer has specific attributes and the project meets certain deadlines.

{
    "filters": {
        "$and": [
            {
                "developer.name": {
                    "$eq": "Jane Doe"
                }
            },
            {
                "developer.skills": {
                    "$in": ["JavaScript"]
                }
            },
            {
                "project.deadline": {
                    "$eq": "2024-12-31"
                }
            }
        ]
    }
}
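
When designing filters, it can be useful to try them against sample metadata locally before sending a query. The Python sketch below is a simplified local evaluator for the operators listed above, following a MongoDB-like reading of their semantics. It is an illustration only, not Voiceflow's implementation: the function names are hypothetical, the handling of the `[]` suffix is inferred from Case 6, and `$eq` on an array field is treated as whole-value equality.

```python
import operator

COMPARE = {"$eq": operator.eq, "$gt": operator.gt, "$gte": operator.ge,
           "$lt": operator.lt, "$lte": operator.le}
NEGATED = {"$ne": "$eq", "$nin": "$in"}  # negative operators and their positive twins


def resolve(value, parts):
    """Collect the values found at a path; a "[]" suffix fans out over list items."""
    if not parts:
        return [value]
    head, rest = parts[0], parts[1:]
    fan_out = head.endswith("[]")
    key = head[:-2] if fan_out else head
    if not isinstance(value, dict) or key not in value:
        return []
    child = value[key]
    if not fan_out:
        return resolve(child, rest)
    if not isinstance(child, list):
        return []
    return [found for element in child for found in resolve(element, rest)]


def check(op, actual, expected):
    """Evaluate one positive operator against one resolved value."""
    if op in COMPARE:
        try:
            return bool(COMPARE[op](actual, expected))
        except TypeError:  # e.g. comparing a string with a number
            return False
    actual_items = actual if isinstance(actual, list) else [actual]
    if op == "$in":
        return any(item in expected for item in actual_items)
    if op == "$all":
        return all(item in actual_items for item in expected)
    raise ValueError(f"unsupported operator: {op}")


def field_matches(metadata, path, condition):
    """All operators on a field must hold; negated ones must have no hit."""
    candidates = resolve(metadata, path.split("."))
    for op, expected in condition.items():
        hit = any(check(NEGATED.get(op, op), actual, expected) for actual in candidates)
        if hit == (op in NEGATED):
            return False
    return True


def matches(metadata, filters):
    """Top-level keys are combined with AND, as in the Case 6 example."""
    for key, condition in filters.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif not field_matches(metadata, key, condition):
            return False
    return True


metadata = {
    "developer": {"name": "Jane Doe", "skills": ["Python", "JavaScript"],
                  "tags": ["t1", "t2", "t3", "t4"],
                  "languages": [{"name": "Russian"}, {"name": "German"}]},
    "project": {"name": "AI Development", "deadline": "2024-12-31", "price": 100},
}

print(matches(metadata, {"developer.tags": {"$all": ["t1", "t2"]}}))  # True (Case 1)
print(matches(metadata, {"developer.tags": {"$nin": ["t1", "t2"]}}))  # False (Case 3)
print(matches(metadata, {"project.price": {"$gte": 5, "$lt": 100},
                         "developer.languages[].name": {"$eq": "Russian"}}))  # False (Case 6)
```

Under this reading, the Case 6 filter does not match the example metadata payload because its price of 100 fails the strict `$lt: 100` bound; a price of 99 would match.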

Inserting metadata to data sources

All supported data sources in the Knowledge Base accept metadata, which is then propagated to the chunks extracted from the uploaded data. Depending on the data source type, follow the respective method below:

1. FILE Upload

For uploading .txt, .docx or .pdf:

curl --request POST \
  --url 'https://api.voiceflow.com/v1/knowledge-base/docs/upload?overwrite=true' \
  --header 'Authorization: YOUR_DM_API_KEY' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=@/path/to/your/file.pdf' \
  --form 'metadata={"inner": {"text": "some test value", "price": 5, "tags": ["t1", "t2", "t3", "t4"]}}'

2. URL Upload

To upload a URL:

curl --request POST \
  --url 'https://api.voiceflow.com/v1/knowledge-base/docs/upload?maxChunkSize=1000&overwrite=true' \
  --header 'Authorization: YOUR_DM_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "data": {
        "type": "url",
        "url": "https://example.com/",
        "metadata": {"test": 5}
    }
}'
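
The same URL upload can be prepared from Python. The sketch below builds the request with the standard library only, using the endpoint, query parameters, headers, and body shape from the curl command above. The function name `build_url_upload_request` is hypothetical, and the request is constructed but not sent, so the example runs without network access or a real API key.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
UPLOAD_URL = "https://api.voiceflow.com/v1/knowledge-base/docs/upload"


def build_url_upload_request(api_key, page_url, metadata,
                             max_chunk_size=1000, overwrite=True):
    """Build (but do not send) a POST request that uploads a URL with metadata."""
    query = urllib.parse.urlencode({
        "maxChunkSize": max_chunk_size,
        "overwrite": "true" if overwrite else "false",
    })
    # json.dumps handles quoting and escaping of the metadata object.
    body = json.dumps(
        {"data": {"type": "url", "url": page_url, "metadata": metadata}}
    )
    return urllib.request.Request(
        f"{UPLOAD_URL}?{query}",
        data=body.encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_url_upload_request(
    "YOUR_DM_API_KEY", "https://example.com/", {"test": 5}
)
print(request.full_url)

# To actually send the request, pass it to urllib.request.urlopen(request).
```

Building the body with `json.dumps` rather than by hand avoids the shell-quoting mistakes that are easy to make when metadata contains nested objects or quotes.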

3. TABLE Upload

To upload table data with metadata, structure your data as follows:

curl --request POST \
  --url 'https://api.voiceflow.com/v1/knowledge-base/docs/upload/table' \
  --header 'Authorization: YOUR_DM_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "data": {
        "name": "products",
        "schema": {
            "searchableFields": ["name", "description"],
            "metadataFields": ["developer", "project"]
        },
        "items": [
            {
                "name": "example_name",
                "description": "example_description",
                "developer": {
                    "name": "Jane Doe",
                    "level": "senior",
                    "skills": ["Python", "JavaScript"],
                    "languages": [
                        {
                            "name": "Russian"
                        },
                        {
                            "name": "German"
                        }
                    ]
                },
                "project": {
                    "name": "AI Development",
                    "deadline": "2024-12-31"
                }
            }
        ]
    }
}'

By following these methods, you can ensure your data sources are enriched with metadata, enhancing the capability of your Voiceflow Knowledge Base to provide precise and contextually relevant search results.