
Moving NLP Forward With Transformer Models and Attention

The Real Python Podcast


Is It the Quantum of Actual Memory, or the Vanishing Gradient?

It's actually for a very technical reason, and I'll try to explain it in a simple way. Basically, these models have a lot of different layers, and you do a transformation of the data at each layer. These particular models were using a transformation that squashed the data. It meant that as these models got deeper and deeper, when you tried to send the signal back through in order to train, it just got so squashed that there was nothing left anymore. It's called the vanishing gradient problem.
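
To make the effect concrete, here is a minimal, self-contained Python sketch (not from the episode) of what the chain rule does in a deep stack of squashing layers. It uses the sigmoid as the squashing function, and the fixed pre-activation value z = 1.5 is an arbitrary assumption for illustration:

    import math

    def sigmoid(z: float) -> float:
        return 1.0 / (1.0 + math.exp(-z))

    def sigmoid_derivative(z: float) -> float:
        s = sigmoid(z)
        return s * (1.0 - s)  # never larger than 0.25

    # Hypothetical pre-activation; any fixed value works for the illustration.
    z = 1.5

    for depth in (5, 10, 20, 40):
        gradient = 1.0
        for _ in range(depth):
            gradient *= sigmoid_derivative(z)  # chain rule: one factor per layer
        print(f"{depth:>2} layers -> gradient ≈ {gradient:.3e}")

Each per-layer factor is at most 0.25, so the product shrinks exponentially as depth grows; that vanishing product is the "squashing" of the training signal described above.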

