
Evaluating models without test data (Practical AI #194)
Changelog Master Feed
Is WeightWatcher GPU Optimized?
The current model runs a singular value decomposition on each layer. It could take anywhere from a couple of minutes to an hour. I mean, if you're trying to run it on GPT and you have a thousand layers, it's going to take some time. If you just have a few layers in your model and you're training a small model, it's very, very fast. You don't need the GPU; you can just run it. So that's sort of the takeaway.
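To make the "singular value decomposition on each layer" idea concrete, here is a minimal sketch using plain NumPy on the CPU. It is not the tool's actual code: the function name `layer_singular_values` and the toy layer names and shapes are hypothetical, chosen only for illustration.

```python
import numpy as np


def layer_singular_values(layers):
    """Return the singular values of each 2-D weight matrix, keyed by layer name.

    `layers` is a hypothetical dict of {layer_name: weight_matrix}; a real
    model would need its weights extracted into this form first.
    """
    spectra = {}
    for name, weights in layers.items():
        # compute_uv=False returns only the singular values, skipping the
        # singular vectors, which keeps the per-layer cost down.
        spectra[name] = np.linalg.svd(weights, compute_uv=False)
    return spectra


# Toy stand-in for a small model: two random dense layers (assumed shapes).
rng = np.random.default_rng(0)
toy_layers = {
    "dense_1": rng.normal(size=(64, 32)),
    "dense_2": rng.normal(size=(32, 10)),
}

spectra = layer_singular_values(toy_layers)
for name, singular_values in spectra.items():
    print(name, singular_values.shape, float(singular_values.max()))
```

Because the loop does one decomposition per layer, the total time grows with both the number of layers and the size of each weight matrix, which fits the point above that a small model finishes quickly on a CPU while a model with a thousand layers takes a while.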