OpenAI GPT-3: Language Models are Few-Shot Learners

Machine Learning Street Talk (MLST)

Is It Better to Train an Autoregressive Model?

I suppose my intuition is that a bidirectional model would learn more about the structure of the language, but it's easier to train an autoregressive model. Is that fair? Maybe at scale it doesn't matter anymore. Once you scale up, bidirectional or unidirectional, it's all okay now that we've trained on tons of data. With BERT there is this horrible quadratic layer-wise time complexity, so they had to truncate the input size to 512. And the reason GPT-3 can take 2,000 tokens as input is just because Microsoft has built, like, this big of a computer. That's the reason.
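
To make the quadratic layer-wise cost concrete, here is a rough sketch of how the self-attention score matrix grows with input length. It counts only the pairwise scores for a single attention head in a single layer and ignores everything else in the model. The function names are hypothetical, and 2048 is an assumption standing in for the "2,000 tokens" mentioned above.

```python
def attention_score_entries(seq_len: int) -> int:
    """Number of entries in one head's seq_len x seq_len attention score matrix."""
    # Every token is scored against every token, hence the quadratic growth.
    return seq_len * seq_len


def relative_cost(short_len: int, long_len: int) -> float:
    """How many times more score entries the longer input needs."""
    return attention_score_entries(long_len) / attention_score_entries(short_len)


if __name__ == "__main__":
    # 512 is the truncated input size from the discussion; 2048 is an assumed
    # stand-in for the "2,000 tokens" figure.
    for n in (512, 2048):
        print(f"{n} tokens: {attention_score_entries(n)} score entries per head per layer")
    print(f"relative cost: {relative_cost(512, 2048)}x")
```

Under these assumptions, a 4x longer input needs 16x as many attention scores per head per layer, which is why a longer context window leans so heavily on available compute.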
