
[JUNE 2022] Aran Komatsuzaki on Scaling, GPT-J and Alignment

The Inside View


How to Build a Wider Language Model

GPT-J was able to be more efficient than GPT-Neo, which I believe is another model from EleutherAI, but one that uses Mesh TensorFlow. And yeah, so did you just take the same hyperparameters and architecture from the GPT-3 paper, or did you have to change stuff to match the performance?

We tried changing the width-to-depth ratio. Another thing we tried is placing the feed-forward layer in parallel with the attention layer. Basically, this also saves latency, and it lets you make better use of your accelerators.
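To make the parallel placement concrete, here is a minimal sketch in plain Python that contrasts a conventional sequential transformer block with a parallel one. It is an illustration under stated assumptions, not GPT-J's actual code: the single attention head, the ReLU activation, the layer norm without learned gain or bias, the toy sizes, and every function and parameter name are hypothetical choices made for brevity.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned gain or bias)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def attention(x, wq, wk, wv, wo):
    """Single-head causal self-attention (assumption: one head, for brevity)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    mixed = []
    for i, qi in enumerate(q):
        # Position i may only attend to positions 0..i.
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(i + 1)]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        mixed.append([sum(w * v[j][d] for j, w in enumerate(weights)) / total
                      for d in range(len(v[0]))])
    return matmul(mixed, wo)

def feed_forward(x, w1, w2):
    """Two-layer MLP; ReLU is an assumption chosen to keep the sketch short."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def sequential_block(x, p):
    """Conventional block: the feed-forward layer waits for the attention output."""
    h = add(x, attention(layer_norm(x), *p["attn"]))
    return add(h, feed_forward(layer_norm(h), *p["ffn"]))

def parallel_block(x, p):
    """Parallel block: both sublayers read the same normalized input."""
    n = layer_norm(x)
    return add(add(x, attention(n, *p["attn"])), feed_forward(n, *p["ffn"]))

def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_params(rng, d_model, d_ff):
    return {"attn": [random_matrix(rng, d_model, d_model) for _ in range(4)],
            "ffn": [random_matrix(rng, d_model, d_ff), random_matrix(rng, d_ff, d_model)]}

rng = random.Random(0)
params = init_params(rng, d_model=8, d_ff=32)
x = random_matrix(rng, 4, 8)  # 4 tokens, 8 features each
y_seq = sequential_block(x, params)
y_par = parallel_block(x, params)
```

In `parallel_block`, neither sublayer depends on the other's output, so on real hardware the two can be scheduled together or have their input projections fused, which is one plausible reading of the latency and utilization benefit mentioned above. The two blocks are not numerically equivalent, so a model has to be trained with the parallel form rather than converted to it afterwards.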
