We want to convince readers that our results are comparable with other implementations. We'd like to compare against more models, but for many of them we don't know the training details. So in that case, for the benchmark run we try to train the model longer until it converges, meaning the validation loss stops going down. We take that converged value and compare it with what people have reported in their papers, and that's a fair comparison. The bottom line is: if you use byte-level encoding and byte-level prediction, then the MegaByte architecture blows the others out of the water. If you're instead comparing against something like OPT, you know, with each architecture working in its optimal condition, then
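A minimal sketch of the convergence rule described here, assuming hypothetical `train_for_steps` and `eval_val_loss` callables (not functions from the MegaByte codebase): train in chunks and stop once the validation loss stops improving, then report that converged loss for the cross-paper comparison.

```python
def run_until_converged(model, train_for_steps, eval_val_loss,
                        eval_every=1000, patience=3, min_delta=1e-3):
    """Train in chunks of `eval_every` steps; stop once validation loss
    has not improved by at least `min_delta` for `patience` evaluations."""
    best_loss = float("inf")
    stale = 0
    while stale < patience:
        train_for_steps(model, eval_every)   # run another chunk of training
        val_loss = eval_val_loss(model)      # measure held-out loss
        if val_loss < best_loss - min_delta:
            best_loss = val_loss             # still improving: reset the counter
            stale = 0
        else:
            stale += 1                       # no meaningful improvement this round
    return best_loss                         # converged value used for comparison
```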
