
Evaluating models without test data (Practical AI #194)
Changelog Master Feed
Is WeightWatcher GPU Optimized?
The current version runs a singular value decomposition on each layer, so it can take anywhere from a couple of minutes to an hour to run. I mean, if you're trying to run it on GPT and you have a thousand layers, it's going to take some time. If you just have a few layers in your model and you're training, like, a small model, it's very, very fast. You don't need a GPU; you can just run it. So that's sort of the takeaway.
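
To make the per-layer cost concrete, here is a minimal sketch of running a singular value decomposition on every layer of a small model on the CPU. It is not the tool's own code: it assumes NumPy's `np.linalg.svd`, and the `layers` list of random matrices and the `layer_singular_values` helper are hypothetical stand-ins for a real model's weights.

```python
import time

import numpy as np


def layer_singular_values(weights):
    """Return the singular values (largest first) of each 2-D weight matrix."""
    # compute_uv=False skips the singular vectors, which are not needed here.
    return [np.linalg.svd(W, compute_uv=False) for W in weights]


# Hypothetical stand-in for a small model: three random dense layers.
rng = np.random.default_rng(0)
layers = [
    rng.standard_normal((256, 128)),
    rng.standard_normal((128, 64)),
    rng.standard_normal((64, 10)),
]

start = time.perf_counter()
spectra = layer_singular_values(layers)
elapsed = time.perf_counter() - start

for i, s in enumerate(spectra):
    print(f"layer {i}: {len(s)} singular values, largest = {s[0]:.2f}")
print(f"total SVD time on CPU: {elapsed:.4f} s")
```

One decomposition runs per layer, so the total time grows with both the number of layers and the size of each weight matrix. That is why a model with a few small layers finishes almost immediately while one with a thousand large layers takes much longer.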