2min chapter

OpenAI GPT-3: Language Models are Few-Shot Learners

Machine Learning Street Talk (MLST)

CHAPTER

Is It Better to Train an Auto Aggressive Model?

I suppose my intuition is that a bidirectional model would learn more about the structure of the language, but it's easier to train an auto aggressive model. Is that fair? Maybe at like scale, it doesn't matter anymore. Once you scale up bidirectional, you know, unidirectional, it's all okay now that we've trained on tons of data. With with there is this horrible quadratic layer wise time complexity, so they had to truncate the input size to 512. And the reason it can take 2,000 tokens as input is just because Microsoft has built like this big of a computer. That's the reason.

00:00

Transcript

Episode notes

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

2min chapter

OpenAI GPT-3: Language Models are Few-Shot Learners

Machine Learning Street Talk (MLST)

Get the Snipdpodcast app

AI-poweredpodcast player

Discoverhighlights

Save anymoment

Share& Export

AI-poweredpodcast player

Discoverhighlights

Get the Snipd
podcast app

AI-powered
podcast player

Discover
highlights

Save any
moment

Share
& Export

AI-powered
podcast player

Discover
highlights