Ethan is known on Twitter as the edgiest person at MILA. We discuss all the gossip around scaling large language models in what will later be known as the Edward Snowden moment of Deep Learning. In his free time, Ethan is a Master’s student at MILA in Montreal, and has published papers on out-of-distribution generalization and robustness generalization, accepted as oral and spotlight presentations at ICML and NeurIPS. Ethan has recently been thinking about scaling laws, both as an organizer and a speaker for the 1st Neural Scaling Laws Workshop.
Transcript: https://theinsideview.github.io/ethan
YouTube: https://youtu.be/UPlv-lFWITI
Michaël: https://twitter.com/MichaelTrazzi
Ethan: https://twitter.com/ethancaballero
Outline
(00:00) highlights
(00:50) who is Ethan, scaling laws T-shirts
(02:30) scaling, upstream, downstream, alignment and AGI
(05:58) AI timelines, AlphaCode, Math scaling, PaLM
(07:56) Chinchilla scaling laws
(11:22) limits of scaling, Copilot, generative coding, code data
(15:50) YouTube scaling laws, contrastive type thing
(20:55) AGI race, funding, supercomputers
(24:00) Scaling at Google
(25:10) gossips, private research, GPT-4
(27:40) why Ethan did not update on PaLM, hardware bottleneck
(29:56) the fastest path, the best funding model for supercomputers
(31:14) EA, OpenAI, Anthropic, publishing research, GPT-4
(33:45) a zillion language model startups from ex-Googlers
(38:07) Ethan's journey in scaling, early days
(40:08) making progress on an academic budget, scaling laws research
(41:22) all alignment is inverse scaling problems
(45:16) predicting scaling laws, useful AI alignment research
(47:16) nitpicks about Ajeya Cotra's report, compute trends
(50:45) optimism, conclusion on alignment