
[NeurIPS Best Paper] 1000 Layer Networks for Self-Supervised RL — Kevin Wang et al., Princeton


Scaling axes: depth, width, and batch size (06:07)

Ishaan Javali compares scaling networks by depth versus width and explains why depth proved more parameter-efficient in their experiments.
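A minimal sketch of the parameter-count arithmetic behind that point (this code is not from the episode; the MLP shape and dimensions are hypothetical): in a plain MLP, each hidden-to-hidden layer adds roughly width^2 weights, so doubling depth about doubles the total parameter count, while doubling width roughly quadruples it.

def mlp_param_count(in_dim: int, width: int, depth: int, out_dim: int) -> int:
    """Weights + biases for an MLP with `depth` hidden layers of size `width`."""
    params = in_dim * width + width                    # input -> first hidden
    params += (depth - 1) * (width * width + width)    # hidden -> hidden layers
    params += width * out_dim + out_dim                # last hidden -> output
    return params

base   = mlp_param_count(in_dim=64, width=256, depth=4, out_dim=64)  # ~0.23M
deeper = mlp_param_count(in_dim=64, width=256, depth=8, out_dim=64)  # 2x depth: ~0.49M (~2.1x)
wider  = mlp_param_count(in_dim=64, width=512, depth=4, out_dim=64)  # 2x width: ~0.85M (~3.7x)
print(base, deeper, wider)

This width^2-versus-linear growth is one common reason adding depth can buy more capacity per parameter than adding width, which is the kind of comparison this segment discusses.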
