CPU vs GPU // Ronen Dar & Gijsbert Janssen van Doorn // MLOps Coffee Sessions #99

MLOps.community

How to Share Multiple Models on a Single GPU?

There are multiple ways of running multiple models on a single GPU. One approach is to load multiple models onto a single server. Triton really simplifies that problem. So the [unclear] is changing now: it's not just one model per GPU. It's multiple models that can be hosted on one GPU. You need to scale up your models instead of scaling up your servers, and make sure that one model doesn't starve the others. We also allow another approach: deploying models not on full GPUs, but on fractions of a GPU.
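To make the "one model doesn't starve others" point concrete, here is a minimal sketch of round-robin dispatch across per-model request queues. It is not how Triton or any product mentioned in the episode schedules work. The function name `round_robin_dispatch`, the model names, and the request labels are all hypothetical, chosen only for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def round_robin_dispatch(queues: Dict[str, Deque[str]]) -> List[Tuple[str, str]]:
    """Drain per-model queues one request at a time, taking turns.

    Hypothetical helper: it returns the order in which requests would be
    handed to a shared GPU, so a busy model cannot monopolize it.
    """
    order: List[Tuple[str, str]] = []
    # Keep cycling while at least one model still has pending requests.
    while any(queues.values()):
        for model, pending in queues.items():
            if pending:
                order.append((model, pending.popleft()))
    return order


# Assumed workload: model_a is much busier than the other two models.
queues = {
    "model_a": deque(["a1", "a2", "a3", "a4"]),
    "model_b": deque(["b1"]),
    "model_c": deque(["c1", "c2"]),
}
schedule = round_robin_dispatch(queues)
print(schedule)
```

With plain first-come-first-served on a single queue, a burst from `model_a` could delay `model_b` and `model_c`. Taking turns bounds how long any one model waits, at the cost of ignoring priorities and per-request cost.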
