Inferact: Building the Infrastructure That Runs Modern AI

The a16z Show

Model and architecture divergence

Woosuk discusses diverging attention mechanisms, model IO formats, and how vLLM integrates vendor kernels and references.

Play episode from 26:38