
161: Leveraging Generative AI Models with Hagay Lupesko
Programming Throwdown
The Cost of Running Complex Workloads
With regular deep learning models, like predicting the probability of an event, you would want to serve on the CPU because, unlike in training, you don't have a batch. So there has been a lot of interesting work by the community allowing people to run these models on commodity hardware. There's something called llama.cpp, I think, that someone hacked together; it's a super efficient implementation of inference for LLaMA on a commodity CPU.