
NLP Highlights
36 - Attention Is All You Need, with Ashish Vaswani and Jakob Uszkoreit
Oct 23, 2017
Ashish Vaswani and Jakob Uszkoreit, co-authors of the "Attention Is All You Need" paper, discuss the motivation for replacing RNNs and CNNs with self-attention in the Transformer model. They cover the positional encoding mechanism, multi-head attention, replacing encoders in other models, and what self-attention actually learns, noting that lower layers tend to capture local, n-gram-like patterns while higher layers pick up phenomena such as coreference.
41:15
Quick takeaways
- The attention network proposed in the paper addresses the limitations of RNNs and CNNs by giving each layer a global view of the sequence, which also allows for more efficient parallelization.
- The self-attention mechanism compares every pair of positions in a sequence, turns those comparisons into an attention distribution for each position, and uses that distribution to aggregate the representations of the other positions (see the sketch after this list). It offers improved efficiency and lower computational complexity, and its learned attention patterns show behavior linked to coreference resolution and lexical disambiguation.
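The sketch below is a minimal, hypothetical illustration of those three steps (compare, normalize, aggregate) as single-head scaled dot-product self-attention in NumPy; it is not the episode's or the paper's reference code, and the names `Wq`, `Wk`, `Wv` and the toy sizes are assumptions made for illustration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Minimal single-head scaled dot-product self-attention (illustrative sketch).

    X          : (n, d_model) sequence of position representations
    Wq, Wk, Wv : (d_model, d_k) projection matrices (hypothetical names)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # project every position
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # compare each pair of positions
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability for the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # attention distribution per position
    return weights @ V                               # aggregate the other positions' representations

# Toy usage: 5 positions, model dimension 8, head dimension 4
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                  # shape (5, 4): one aggregated vector per position
```

In the Transformer itself this computation is run several times in parallel with different projections (multi-head attention) and the per-head outputs are concatenated.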
Deep dives
The limitations of RNNs and CNNs
The paper proposes a new type of attention network to overcome the limitations of RNNs and CNNs in modeling sequences. RNNs struggle to learn long-range dependencies, while CNNs need multiple layers to obtain a global view due to their limited receptive field size. The proposed attention network addresses these issues by providing a more global view within each layer and allowing for more efficient parallelization.
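To make that trade-off concrete, the comparison the paper itself draws can be restated as follows (with sequence length n, representation dimension d, and convolution kernel width k); the tabular rendering here is just for illustration.

```latex
% Restatement of the paper's layer-type comparison
% (n = sequence length, d = representation dimension, k = convolution kernel width)
\begin{tabular}{llll}
Layer type     & Complexity per layer     & Sequential ops & Max path length \\
Self-attention & $O(n^2 \cdot d)$         & $O(1)$         & $O(1)$          \\
Recurrent      & $O(n \cdot d^2)$         & $O(n)$         & $O(n)$          \\
Convolutional  & $O(k \cdot n \cdot d^2)$ & $O(1)$         & $O(\log_k n)$   \\
\end{tabular}
```

A constant number of sequential operations per layer is what enables the parallelization mentioned above, and the O(1) path length between any two positions is what gives each self-attention layer its global view.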