Chapters
Introduction
00:00 • 3min
Canavera, Col., Is That Right?
02:32 • 3min
Is There a Connection Between Formal Languages and the Chomsky Hierarchy Kind of Stuff?
05:05 • 2min
How to Characterize Inductive Bias
07:01 • 2min
So Let's Look at the Hierarchical Structure of Natural Language
08:52 • 2min
How to Do This in Our Case?
10:30 • 2min
Is There a Bounded Depth on the Stack?
12:03 • 2min
The Limitation of Unbounded Depth Transformers
13:56 • 2min
How Much Memory Do You Need to Process This Dyck Language?
15:43 • 2min
Is There a Difference in the Precision?
17:14 • 3min
The Differences Between the Recurrent Mechanism and the Self-Attention Mechanism
20:20 • 3min
Aranan, I Agree With Everything.
22:53 • 4min
The Intuition Behind the Self-Attention Network
26:33 • 3min
The Scalar Positional Encoding: Is It Important for Formal Languages?
29:30 • 3min
Generalization to Longer Sequence Lengths?
32:12 • 1min
Why Isn't the Absolute Positional Encoding So Important?
33:41 • 2min