Scaling Laws and Deep Learning
The main interesting result of DeepMind's Chinchilla paper was that everyone had been training models that were too big on too little data. They made a 70-billion-parameter model that was roughly as good as Google's PaLM, which has around 540 billion parameters, while using notably less compute.

Yes, though I will caution that I think parameter count is somewhat overrated as a way of gauging model capability. I think the scaling-laws work has been fairly net negative, since it has mostly been used by people trying to push frontier capabilities, though I don't have great insight into these questions.
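To make the "too big, too little data" point concrete, here is a minimal sketch of the compute-optimal trade-off implied by the Chinchilla result. It assumes the common approximations that training compute is roughly C ≈ 6·N·D FLOPs and that the optimal ratio is about 20 training tokens per parameter; the exact fitted coefficients are in Hoffmann et al. (2022), so treat this as illustrative rather than the paper's precise method.

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Rough compute-optimal parameter/token split in the spirit of Chinchilla.

    Assumes C ~= 6 * N * D training FLOPs and D/N ~= 20 tokens per parameter
    (an approximation of the paper's fitted scaling laws, not the exact fit).
    """
    # Solve C = 6 * N * (tokens_per_param * N) for N, then derive D.
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's training budget was roughly 5.8e23 FLOPs; this sketch recovers
# numbers close to its reported 70B parameters and 1.4T tokens.
params, tokens = chinchilla_optimal(5.8e23)
print(f"~{params / 1e9:.0f}B parameters, ~{tokens / 1e12:.1f}T tokens")
```

Under these assumptions, doubling the compute budget should scale parameters and tokens up together (each by roughly the square root of the increase), rather than pouring everything into a larger model, which is the core correction the Chinchilla paper made to earlier scaling practice.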