Test-Time Adaptation: the key to reasoning with DL (Mohamed Osman)

Machine Learning Street Talk (MLST)

00:00

Exploring Transformer Limitations and Architectural Fixes

This chapter examines the limitations of self-attention transformers, particularly on tasks such as counting and copying. It discusses how representational squashing and the softmax function degrade performance, and proposes deeper architectural changes to improve AI capabilities.
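As a concrete illustration of the squashing point (a minimal sketch, not taken from the episode): because softmax weights always sum to 1, a single attention head computes a convex average of its values. Attending to n copies of the same token therefore produces the same output for every n, so the count itself is squashed away.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys, values):
    # Standard scaled dot-product attention for a single query vector.
    scores = keys @ query / np.sqrt(len(query))
    weights = softmax(scores)          # weights are non-negative and sum to 1
    return weights @ values            # convex average of the value rows

rng = np.random.default_rng(0)
d = 8
token = rng.normal(size=d)             # one token embedding, repeated n times

for n in (2, 8, 64):
    keys = np.tile(token, (n, 1))
    values = np.tile(token, (n, 1))
    out = attend(token, keys, values)
    # The output is identical for every n: averaging n copies of the same
    # value just returns that value, so no information about n survives.
    print(n, np.round(out[:3], 4))
```

Running this prints the same vector for n = 2, 8, and 64, which is one way to see why counting how many times a token appears is hard to express in a single softmax-attention layer.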
