Optimizing API Requests with Envoy

This chapter explores the management of API requests for large language models, focusing on the use of Envoy proxy to enhance network traffic routing and service separation. It discusses the challenges and solutions related to configuring Envoy AI Gateway within Kubernetes, highlighting the importance of community collaboration in advancing AI infrastructure.

Play episode from 32:58

chevron_right

Transcript

chevron_right

Transcript

Episode notes

GenAI Traffic: Why API Infrastructure Must Evolve... Again // MLOps Podcast #296 with Erica Hughberg, Community Advocate at Tetrate.

Join the Community: https://go.mlops.community/YTJoinIn

Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract

The way we handle API traffic is broken for GenAI. We've spent years optimizing for microservices—fast, stateless, and lightweight API calls. But GenAI changes everything. Requests are slower, heavier, and more complex, requiring long-lived connections, massive payloads, and streaming responses. Suddenly, traditional API gateways are struggling—timeout limits are too short, rate limiting models don’t fit, and payload constraints are blocking innovation.

In this episode, we unpack the new challenges of GenAI traffic and why infrastructure must evolve—again. We look back at previous API shifts, from the C10K problem to the monolith-to-microservices revolution, and how they reshaped networking. Now, AI-driven workloads demand a new kind of API gateway—one that handles token-based rate limiting, cost-aware request shaping, and scalable AI inference traffic.

// Bio

Erica Hughberg is a technical leader and community advocate passionate about helping engineering teams build scalable, secure, and human-centric application platforms. With a background in software engineering and a deep understanding of cloud-native technologies, she specializes in driving the adoption of open-source projects like Envoy Gateway, Istio, and Kubernetes Gateway API, which enable organizations to simplify traffic management, security, and API distribution.

As a maintainer of Envoy AI Gateway, she plays a key role in shaping the future of API infrastructure. She focuses on features to ensure organizations can securely and efficiently integrate AI-powered services while simplifying traffic management, security, and API distribution. In the Envoy community, she drives collaboration, mentorship, and contributions that advance the project and its adoption.

Lastly, as a believer in the power of storytelling, Erica enjoys translating complex technical concepts into engaging, accessible narratives in the form of social media posts, conference talks, podcasts, and educational content.

// Related Links

Efficient Deployment of Models at the Edge // Krishna Sridhar // MLOps Podcast #284 - https://youtu.be/sFqm7GTeulg

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community [https://go.mlops.community/slack]

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)]

MLOps Swag/Merch: [https://shop.mlops.community/]

Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Erica on LinkedIn: /ericahughberg

Timestamps:

[00:00] Erica's preferred coffee

[00:30] Takeaways