LinkedIn Recommender System: Predictive ML vs LLMs

MLOps.community

Mitigate LLM Latency with Distillation or Offline Use

  • Use lightweight LLMs, or distill large models into smaller student models, so feed ranking avoids the inference latency of full-size LLMs (a distillation sketch follows this list).
  • Alternatively, run LLMs offline to generate features, and let fast traditional models handle online ranking (see the second sketch below).
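To make the first point concrete, here is a minimal sketch of response-based knowledge distillation in PyTorch, assuming the teacher LLM's scores have already been computed and cached. The `StudentRanker` architecture, feature dimensions, and hyperparameters are illustrative assumptions, not details from the episode.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical student ranker: a small MLP over precomputed ranking features.
# Dimensions and architecture are illustrative, not from the episode.
class StudentRanker(nn.Module):
    def __init__(self, feature_dim: int = 128, num_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft KL term against the teacher with a hard cross-entropy term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable to the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Training step sketch: teacher (LLM) logits are assumed precomputed offline.
student = StudentRanker()
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)
features = torch.randn(32, 128)        # batch of ranking features
teacher_logits = torch.randn(32, 2)    # cached teacher scores
labels = torch.randint(0, 2, (32,))    # click / no-click labels
loss = distillation_loss(student(features), teacher_logits, labels)
loss.backward()
optimizer.step()
```

At serving time only the small student runs in the feed, so the latency cost of the large teacher is paid once during training rather than on every request.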
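The second pattern splits the latency budget: an offline batch job runs the expensive LLM to produce features, and the online ranker only does cheap lookups and arithmetic. The sketch below assumes an embedding-style feature; `llm_embed` and the in-memory "feature store" are hypothetical stand-ins for real batch inference and storage.

```python
import numpy as np

def llm_embed(texts: list[str]) -> np.ndarray:
    """Stand-in for batch LLM inference; returns one vector per post."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(texts), 16))

# --- Offline job (latency budget: hours) ----------------------------------
posts = {"post_1": "How we scaled our feed...", "post_2": "Hiring ML engineers"}
vectors = llm_embed(list(posts.values()))
feature_store = {pid: vec for pid, vec in zip(posts, vectors)}

# --- Online serving (latency budget: milliseconds) ------------------------
weights = np.random.default_rng(1).standard_normal(16)  # trained ranker weights

def rank(candidate_ids: list[str]) -> list[str]:
    """Score candidates with a fast dot product over cached LLM features."""
    scores = {pid: float(feature_store[pid] @ weights) for pid in candidate_ids}
    return sorted(scores, key=scores.get, reverse=True)

print(rank(["post_1", "post_2"]))
```

The design choice is that LLM quality flows into the features while the online path stays a traditional fast model, so feed latency is unaffected by how large or slow the LLM is.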