The Future of Search in the Era of Large Language Models // Saahil Jain // MLOps Podcast #150

MLOps.community

The Trade-Off Between Latency and Relevance in Generative Models

There is an interesting trade-off between latency, throughput, and relevance. The best mind trick, in my eyes, is what you were talking about: having somebody fill out a survey or do something else while they're waiting, because then the perceived latency goes out the window. And I think there are definitely a lot of ways in which you can get very high relevance. For example, you can use simple classifiers like a BERT-based or DistilBERT-based model. In that case you may get low latency, but the relevance may not be as good compared to using, say, an OpenAI API, where you're essentially using GPT-3.5 and will probably get better results.
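
One way to picture the trade-off described here is a small selection routine that picks the most relevant model that still fits a latency budget. This is only a sketch: the `ModelOption` type, the `pick_model` helper, and every latency and relevance number below are hypothetical placeholders, not measurements from the episode or from any real system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    """A candidate model with illustrative (made-up) characteristics."""
    name: str
    latency_ms: float  # assumed typical response time in milliseconds
    relevance: float   # assumed offline relevance score in [0, 1]


def pick_model(options: Sequence[ModelOption],
               latency_budget_ms: float) -> Optional[ModelOption]:
    """Return the most relevant option within the latency budget, or None."""
    eligible = [o for o in options if o.latency_ms <= latency_budget_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda o: o.relevance)


# Placeholder numbers chosen only to illustrate the shape of the trade-off.
OPTIONS = [
    ModelOption("distilbert-classifier", latency_ms=20.0, relevance=0.70),
    ModelOption("bert-classifier", latency_ms=40.0, relevance=0.75),
    ModelOption("gpt-3.5-api", latency_ms=1500.0, relevance=0.90),
]

if __name__ == "__main__":
    for budget in (30.0, 100.0, 2000.0):
        choice = pick_model(OPTIONS, budget)
        print(budget, choice.name if choice else "no model fits")
```

With a tight budget the routine falls back to the smaller classifiers. Loosening the budget, or hiding the wait behind something like the survey mentioned above, makes the slower but more relevant option eligible.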

Play episode from 25:12