Meryem Arik on LLM Deployment, State-of-the-art RAG Apps, and Inference Architecture Stack

The InfoQ Podcast

Simplified and Secure Self-Hosting of AI Apps with Unique Architecture Stack

This chapter covers the company's architecture stack, comprising Titan, Takeoff, and the inference stack, which is designed to make self-hosting AI apps simpler and more secure. The stack combines a Rust server, Python, and the Triton inference engine to automate model deployment onto GPUs, giving developers a scalable and efficient way to serve models.
