2min snip


Understanding Deep Learning - Prof. SIMON PRINCE [STAFF FAVOURITE]

Machine Learning Street Talk (MLST)

NOTE

Understanding batch normalization and its impact on gradients and regularization

Batch normalization was originally introduced to address internal covariate shift: as training updates the parameters of earlier layers, the distribution of inputs to later layers changes, making the later layers' learned parameters less effective. It also helps prevent exploding and vanishing gradients in neural networks, especially in residual networks. However, it has since been found that batch normalization can itself introduce covariate shift, and it has a regularization effect because the batch statistics inject random noise into training. Additionally, because each example is normalized using statistics computed from the other examples in the batch, it can lead to data leakage and is not suitable for transformers using masked attention; layer normalization is preferred in such cases.
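A minimal NumPy sketch of the distinction behind that last point (toy shapes and variable names are illustrative, not from the episode): batch norm computes statistics across the batch axis, so each example's output depends on the other examples, while layer norm computes statistics per example.

```python
import numpy as np

# Hypothetical activations: batch of 4 examples, 8 features each.
x = np.random.randn(4, 8)
eps = 1e-5

# Batch norm: mean/variance taken ACROSS the batch (axis 0), so each
# normalized example depends on the rest of the batch -- the source of
# the data-leakage issue mentioned above.
bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Layer norm: mean/variance taken PER example (axis 1), so no
# information flows between batch elements -- which is why it is
# preferred for transformers with masked attention.
ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

print(bn.shape, ln.shape)  # (4, 8) (4, 8)
```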

