LLMLingua2 Prompt Compression

Intro

Learn how to use Microsoft's LLMLingua2 to compress prompts efficiently. Compression improves your Voiceflow agent's performance by reducing token usage and latency. We also explore integrating OpenAI's latest GPT-4o model, with a fallback to GPT-4 Turbo, through Cloudflare AI Gateway.

The LLMLingua2 API code example is available in our main repo:
https://github.com/voiceflow/demos-n-examples
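
LLMLingua2 treats compression as token classification. An encoder model scores how likely each token is to be preserved, and the compressor keeps the highest-scoring tokens up to a target rate while leaving them in their original order. The minimal sketch below illustrates only that selection step and assumes the keep probabilities are already available. The `select_tokens` helper and the example scores are hypothetical, not the repo's API. In practice you would call the API example in the repo above, or the `llmlingua` Python package (for example its `PromptCompressor` class with `use_llmlingua2=True`), which also handles tokenization and the scoring model.

```python
from typing import List, Sequence


def select_tokens(tokens: Sequence[str], keep_probs: Sequence[float], rate: float) -> List[str]:
    """Hypothetical helper: keep roughly `rate` of the tokens, choosing those
    with the highest keep probability, and return them in original order.

    This mirrors only the selection step of LLMLingua2-style compression;
    the probabilities would normally come from a token-classification model.
    """
    if len(tokens) != len(keep_probs):
        raise ValueError("tokens and keep_probs must have the same length")
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")

    # Number of tokens to keep (at least one, for non-empty input)
    n_keep = max(1, round(len(tokens) * rate))

    # Rank token positions by descending keep probability
    ranked = sorted(range(len(tokens)), key=lambda i: keep_probs[i], reverse=True)

    # Take the top positions, then restore the original token order
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept]
```

For example, with the made-up scores below and `rate=0.5`, eight tokens shrink to four, and `" ".join(...)` of the result gives `"summarize customer support email"`:

```python
tokens = ["Could", "you", "please", "summarize", "this", "customer", "support", "email"]
probs = [0.1, 0.2, 0.05, 0.95, 0.3, 0.9, 0.85, 0.92]
compressed = select_tokens(tokens, probs, rate=0.5)
```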

Cloudflare AI Gateway API documentation:
https://developers.cloudflare.com/ai-gateway/providers/universal
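
The universal endpoint described in the docs above accepts a JSON array of provider requests and tries them in order, moving to the next entry if one fails. That is how a GPT-4o request can fall back to GPT-4 Turbo. The following is a minimal sketch using only the Python standard library, assuming the request shape and URL pattern from those docs at the time of writing (check them before relying on it). The helper names, the `account_id`/`gateway_id` values and the API key are placeholders, not part of the original example.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Universal endpoint base; the full URL adds your account ID and gateway ID
GATEWAY_BASE = "https://gateway.ai.cloudflare.com/v1"


def gateway_url(account_id: str, gateway_id: str) -> str:
    """Hypothetical helper: build the universal endpoint URL for one gateway."""
    return f"{GATEWAY_BASE}/{account_id}/{gateway_id}"


def build_fallback_payload(
    messages: List[Dict[str, str]],
    api_key: str,
    primary_model: str = "gpt-4o",
    fallback_model: str = "gpt-4-turbo",
) -> List[Dict[str, Any]]:
    """Hypothetical helper: one OpenAI chat request per model, primary first.

    The gateway tries the entries in list order, so the second entry only
    runs if the first one fails.
    """

    def step(model: str) -> Dict[str, Any]:
        return {
            "provider": "openai",
            "endpoint": "chat/completions",
            "headers": {
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            "query": {"model": model, "messages": messages},
        }

    return [step(primary_model), step(fallback_model)]


def send_via_gateway(account_id: str, gateway_id: str,
                     payload: List[Dict[str, Any]], timeout: float = 30.0) -> Any:
    """POST the fallback payload to the gateway and return the parsed JSON reply."""
    request = urllib.request.Request(
        gateway_url(account_id, gateway_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

In a Voiceflow flow, the user message passed in `messages` would typically be the already-compressed prompt, so both the primary and the fallback model receive the shorter text.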
