Importing data

Enhance agent capabilities by importing data into its knowledge base for tailored and relevant responses.

Supercharge your Voiceflow agent by importing content from websites, documents, and support articles, giving it the context it needs to respond with real, reliable information.

Supported data sources

Voiceflow supports five different data sources:

| Source | Description |
| --- | --- |
| URL(s) | Paste one or more URLs to import the content from those pages. Use sitemap.xml to bulk import entire websites. You can only import publicly accessible URLs. |
| Sitemap | Import all pages from a website into your knowledge base, ideal for full-site crawls. For most websites, the sitemap is found at https://website.com/sitemap.xml. |
| Upload file | Upload .pdf, .txt, or .docx files (max 10 MB). Only text can be imported into the knowledge base. |
| Plain text | Paste raw content directly. Great for fast prototyping or testing. |
| Zendesk | Import data from your Zendesk knowledge base. Requires Zendesk admin access. |
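
Before uploading files in bulk, it can help to check them locally against the file limits listed above. The following is a minimal sketch, not part of Voiceflow: `check_upload` is a hypothetical helper, and it assumes the 10 MB limit means 10 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Limits taken from the supported data sources table; the helper is hypothetical.
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".docx"}
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" is read as 10 MiB


def check_upload(path: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks importable."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_FILE_BYTES}-byte limit")
    return problems
```

For files on disk, the size can be read with `Path(path).stat().st_size` and passed in as `size_bytes`.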

On the Pro plan, each project can support up to 3,000 different sources (URLs, documents, etc.) of knowledge base content.
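
To estimate whether a sitemap import would fit under that limit, you could count the pages a sitemap lists before importing it. This is a rough sketch under stated assumptions: each page URL is assumed to count as one source, sitemap index files (which point to other sitemaps) are not followed, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
PRO_PLAN_SOURCE_LIMIT = 3000  # per-project limit stated for the Pro plan


def sitemap_urls(xml_text: str) -> list[str]:
    """Extract page URLs from the <url><loc> entries of a sitemap document."""
    root = ET.fromstring(xml_text)
    locs = root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc")
    return [loc.text.strip() for loc in locs if loc.text]


def fits_source_limit(urls, existing_sources=0, limit=PRO_PLAN_SOURCE_LIMIT):
    """Assumption: every distinct URL counts as one knowledge base source."""
    return existing_sources + len(set(urls)) <= limit
```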

ℹ️

Heads up!

Make sure you don't import confidential or proprietary data unless your use case allows it. Any data that you import may be included in LLM-generated responses.

Refresh rate settings

When importing data from URLs or sitemaps, you can schedule a refresh rate to automatically re-sync your content, keeping your knowledge base up to date.

| Option | Best for... |
| --- | --- |
| Never | Static content that doesn't need refreshing |
| Daily | Frequently updated content (e.g., blogs, news sites) |
| Weekly | Occasionally updated info (e.g., support centers) |
| Monthly | Stable content (e.g., product policies, pricing) |

⚠️

Be careful with refresh rates!

When an LLM chunking strategy is enabled, every re-sync consumes credits. If your content doesn't change often, we recommend choosing a less frequent refresh rate. When LLM chunking strategies are disabled, re-syncs don't consume any credits.
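
The trade-off can be made concrete with some back-of-the-envelope arithmetic. In this sketch, `credits_per_resync` is a hypothetical input (the actual cost of a re-sync depends on your content and is not stated here), and the re-sync counts assume a 30-day month.

```python
# Approximate number of re-syncs in a 30-day month for each refresh option.
RESYNCS_PER_MONTH = {"never": 0, "daily": 30, "weekly": 4, "monthly": 1}


def monthly_resync_credits(refresh_rate: str,
                           credits_per_resync: float,
                           llm_chunking_enabled: bool) -> float:
    """Estimate monthly credits spent on re-syncs for one source."""
    if not llm_chunking_enabled:
        # Re-syncs are free when no LLM chunking strategy is enabled.
        return 0.0
    return RESYNCS_PER_MONTH[refresh_rate.lower()] * credits_per_resync
```

Under these assumptions, moving a source from Daily to Weekly cuts its re-syncs from about 30 to about 4 per month.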

LLM chunking strategies

You can use LLM chunking strategies to optimize the data in your agent's knowledge base. Each strategy uses AI to process your data in a different way so that it is easier for your agent to use. The quality of your content directly impacts your agent's ability to answer user questions.

Voiceflow supports five different chunking strategies:

Strategy name

Ideal use case

Smart chunking

Breaks and clusters content into topic-based sections.

Example use cases:

  • Long policy documents (Legal, HR)
  • Real estate listings by region
  • Educational course catalogs

FAQ optimization

Converts structured data into question-and-answer pairs to help the LLM handle direct user queries.

Example use cases:

  • Product or service information
  • Help center FAQ

Remove HTML and noise

Strips out excess formatting, inline code, and markdown artifacts that may confuse the LLM.

Example use cases:

  • Blog posts with embed-heavy formatting
  • Markdown-heavy documents
  • Notion/CMS site exports

Add topic headers

Prepends each chunk with a short summary line and header to signpost the LLM on available information.

Example use cases:

  • Company policy manuals
  • Research whitepapers
  • Developer onboarding guides

Summarize

Automatically extracts the most important ideas or paragraphs from large, verbose documents.

Example use cases:

  • Investor decks
  • Legal agreements
  • Long reports or strategy briefs

We recommend experimenting with various combinations of chunking strategies to see which best fits your use case.
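
To get a feel for the kind of noise that "Remove HTML and noise" targets, you can approximate the idea locally. The sketch below is a deterministic stand-in built on Python's standard library, not Voiceflow's LLM-based strategy: it drops tags plus script and style contents, and leaves Markdown artifacts untouched. All names in it are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_html(raw: str) -> str:
    """Return the text content of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    # Joining with spaces keeps block elements apart; it can also split
    # text that inline tags interrupt mid-word.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()
```

Comparing a page before and after a pass like this gives a rough sense of how much markup surrounds the text you actually want your agent to read.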

Troubleshooting data imports

If something goes wrong when importing your data, hover over the ❗ icon to learn why.

Import errors are handled gracefully. Failed files won't break your project; the remaining files will still be processed.

Advanced: importing data using Voiceflow's API

Voiceflow's Knowledge Base API allows you to manage your knowledge base programmatically. Common API use cases include:

  • Importing content from Notion, Google Docs, or Airtable via an SDK or API request
  • Managing KB entries dynamically across multiple workspaces, outside of the Voiceflow Creator
  • Debugging & testing chunk processing behind the scenes
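
As an illustration of the first use case, the sketch below builds (but does not send) a request that imports a single URL. The endpoint path, payload shape, and authorization header are assumptions, not details confirmed by this page; check them against the Knowledge Base API reference before relying on them. `build_url_import_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: endpoint path and payload shape; verify in the API reference.
UPLOAD_ENDPOINT = "https://api.voiceflow.com/v1/knowledge-base/docs/upload"


def build_url_import_request(api_key: str,
                             page_url: str,
                             endpoint: str = UPLOAD_ENDPOINT) -> urllib.request.Request:
    """Prepare a POST request asking the knowledge base to import one URL."""
    payload = {"data": {"type": "url", "url": page_url}}  # assumed body format
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,  # assumed: API key sent as-is
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. Keeping construction separate from sending makes it easy to inspect or log the request first.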