The paper is titled "Scaling Laws for Autoregressive Generative Modeling." It's quite readable: despite being a little bit technical, I think the empirical results are pretty easy to get. There's more interesting stuff to pursue, not only in this line of empirical work but also in seeing if we can extract some theoretical understanding from it, which I'd be excited about.