
19 - Mechanistic Interpretability with Neel Nanda
AXRP - the AI X-risk Research Podcast
Scaling Laws and Deep Learning
The DeepMind Chinchilla paper's main interesting result was that everyone was taking models that were too big and training them on too little data. They made a 70 billion parameter model that was about as good as Google Brain's PaLM, which is around 600 billion, but with notably less compute. Yes, I will caveat that I think parameter count is somewhat overrated as a way of gauging model capability. The scaling laws work has plausibly been fairly net negative, since it has mostly been used by people trying to push frontier capabilities, though I don't have great insight into these questions.
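As a rough illustration of the compute-optimal trade-off the Chinchilla result points at, here is a small back-of-the-envelope sketch. It is not from the episode: it assumes the commonly cited approximations of roughly 6 FLOPs of training compute per parameter per token and on the order of 20 training tokens per parameter, and the model sizes in the example are arbitrary.

```python
# Back-of-the-envelope Chinchilla-style estimate (illustrative assumptions only:
# ~6 FLOPs per parameter per token, ~20 training tokens per parameter).

def compute_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens for a given model size."""
    return n_params * tokens_per_param

def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

if __name__ == "__main__":
    # Arbitrary model sizes chosen to show how token counts scale with parameters.
    for n_params in (1e9, 70e9, 500e9):
        tokens = compute_optimal_tokens(n_params)
        print(f"{n_params / 1e9:.0f}B params -> ~{tokens / 1e12:.2f}T tokens, "
              f"~{training_flops(n_params, tokens):.2e} training FLOPs")
```

Under these assumptions, a 70 billion parameter model "wants" on the order of 1.4 trillion training tokens, which is why training a smaller model on much more data can match a far larger, under-trained one at comparable or lower compute.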