
161: Leveraging Generative AI Models with Hagay Lupesko
Programming Throwdown
The Cost of Running Complex Workloads
With regular deep learning models, like predicting the probability of an event, you would want to serve on the CPU because, unlike in training, you don't have a batch. So there has been a lot of interesting work by the community allowing people to run these models on commodity hardware. There's something called llama.cpp, I think, that someone hacked together; it's a super efficient implementation of inference for LLaMA on a commodity CPU.