The Problems With Language Models
I think the way I think about it is, if we were doing this token-by-token thing for images, it just wouldn't make sense, right? It certainly wouldn't produce the images that we see coming out of Stable Diffusion. What it's going to learn is: given the image that I've seen so far, let me predict the next pixel or the next piece, right? That somehow feels like a fundamentally different task than being able to generate an image fully. And in some sense, there are some analogies to RLHF and using PPO for training, for example. These are all losses designed not just on a token-by-token basis, but on something that's longer. So I feel
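The contrast being drawn here can be sketched in code: a per-token cross-entropy loss scores each position independently against its target, while an RLHF/PPO-style objective weights the log-probability of the whole sampled sequence by a single reward. This is a minimal toy sketch with made-up logits, not the actual training code of any system discussed.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def token_level_loss(logits_per_step, targets):
    """Per-token objective: each position is penalized independently
    for how much probability it puts on its own target token."""
    losses = []
    for logits, t in zip(logits_per_step, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[t]))
    return sum(losses) / len(losses)

def sequence_level_objective(logits_per_step, sampled, reward):
    """Sequence-level (RLHF-style) objective: one scalar reward scores
    the entire sampled sequence, so credit is assigned to the whole
    output rather than to each token in isolation."""
    logp = 0.0
    for logits, t in zip(logits_per_step, sampled):
        logp += math.log(softmax(logits)[t])
    return reward * logp  # ascend this to push up whole sequences

# Hypothetical two-step example over a 3-token vocabulary.
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
targets = [0, 1]
tl = token_level_loss(logits, targets)
sl = sequence_level_objective(logits, targets, reward=1.0)
```

Note the structural difference: the first function sums independent per-position terms, while the second multiplies a single sequence-wide reward into the total log-probability, which is why such losses operate on "something that's longer" than one token.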