How to Scale a Multi-Task Training Procedure

GPT and T0 are some of the most compute-efficient models out there, and my project is basically trying to combine all of these with scaling. So, yeah, GPT-3 doesn't perform as well as PaLM, but given the amount of compute spent on it, it performs very well on many different tasks. I think it performs very well even without using few-shot samples.
