

Yannic Kilcher Videos (Audio Only)
Yannic Kilcher
I make videos about machine learning research papers, programming, issues within the AI community, and the broader impact of AI on society.
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar (preferred to Patreon): https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Episodes

Sep 21, 2021 • 32min
Topographic VAEs learn Equivariant Capsules (Machine Learning Research Paper Explained)
#tvae #topographic #equivariant
Variational Autoencoders model the latent space as a set of independent Gaussian random variables, which the decoder maps to a data distribution. However, this independence is not always desired: when dealing with video sequences, for example, we know that successive frames are heavily correlated, so any latent space for such data should reflect this in its structure. Topographic VAEs are a framework for defining correlation structures among the latent variables and inducing equivariance in the resulting model. This paper shows how such correlation structures can be built by correctly arranging higher-level variables, which are themselves independent Gaussians.
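As a rough illustration of that construction (a minimal numpy sketch, not the paper's code), the Topographic Product of Student-t divides each independent Gaussian by the norm of a second set of Gaussians in its local neighborhood; overlapping neighborhoods share a divisor, so nearby latent variables become correlated. The circular topology and neighborhood size below are assumptions for the example.
```python
import numpy as np

def topographic_student_t(dim=16, neighborhood=3, n_samples=5, seed=0):
    """Sketch of a Topographic Product of Student-t latent variable.

    z_i = t_i / sqrt(sum_{j in N(i)} u_j^2), where t and u are independent
    standard Gaussians and N(i) is a local circular neighborhood. Variables
    with overlapping neighborhoods share the same divisor, which induces
    local correlation ("topography") even though t and u are independent.
    """
    rng = np.random.default_rng(seed)
    t = rng.standard_normal((n_samples, dim))
    u = rng.standard_normal((n_samples, dim))

    z = np.empty_like(t)
    half = neighborhood // 2
    for i in range(dim):
        idx = [(i + k) % dim for k in range(-half, half + 1)]
        scale = np.sqrt((u[:, idx] ** 2).sum(axis=1))
        z[:, i] = t[:, i] / scale
    return z

print(topographic_student_t().shape)  # (5, 16)
```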
OUTLINE:
0:00 - Intro
1:40 - Architecture Overview
6:30 - Comparison to regular VAEs
8:35 - Generative Mechanism Formulation
11:45 - Non-Gaussian Latent Space
17:30 - Topographic Product of Student-t
21:15 - Introducing Temporal Coherence
24:50 - Topographic VAE
27:50 - Experimental Results
31:15 - Conclusion & Comments
Paper: https://arxiv.org/abs/2109.01394
Code: https://github.com/akandykeller/topog...
Abstract:
In this work we seek to bridge the concepts of topographic organization and equivariance in neural networks. To accomplish this, we introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables. We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST. Furthermore, through topographic organization over time (i.e. temporal coherence), we demonstrate how predefined latent space transformation operators can be encouraged for observed transformed input sequences -- a primitive form of unsupervised learned equivariance. We demonstrate that this model successfully learns sets of approximately equivariant features (i.e. "capsules") directly from sequences and achieves higher likelihood on correspondingly transforming test sequences. Equivariance is verified quantitatively by measuring the approximate commutativity of the inference network and the sequence transformations. Finally, we demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
Authors: T. Anderson Keller, Max Welling
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/1824646584
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Sep 16, 2021 • 26min
[ML News] Roomba Avoids Poop | Textless NLP | TikTok Algorithm Secrets | New Schmidhuber Blog
#schmidhuber #tiktok #roomba
Your regularly irregular update on what's happening in the world of Machine Learning.
OUTLINE:
0:00 - Intro
0:15 - Sponsor: Weights & Biases
1:55 - ML YouTuber reaches 100k subscribers
2:40 - Facebook AI pushes Textless NLP
5:30 - Schmidhuber blog post: I invented everything
7:55 - TikTok algorithm rabbitholes users
10:45 - Roomba learns to avoid poop
11:50 - AI can spot art forgeries
14:55 - Deepmind's plans to separate from Google
16:15 - Cohere raises 40M
16:55 - US Judge rejects AI inventor on patent
17:55 - Altman: GPT-4 not much bigger than GPT-3
18:45 - Salesforce CodeT5
19:45 - DeepMind Reinforcement Learning Lecture Series
20:15 - WikiGraphs Dataset
20:40 - LiveCell Dataset
21:00 - SpeechBrain
21:10 - AI-generated influencer gains 100 sponsorships
22:20 - AI News Questions
23:15 - AI hiring tools reject millions of valid applicants
Sponsor: Weights & Biases
https://wandb.me/start
References:
Facebook AI creates Textless NLP
https://ai.facebook.com/blog/textless...
https://speechbot.github.io/pgslm/?fb...
Schmidhuber invented everything
https://people.idsia.ch/~juergen/most...
How TikTok's algorithm works
https://www.wsj.com/video/series/insi...
Roomba learns to avoid poop
https://edition.cnn.com/2021/09/09/te...
Amateur develops fake art detector
https://blogs.nvidia.com/blog/2021/08...
https://spectrum.ieee.org/this-ai-can...
DeepMind's plan to break away from Google
https://www.businessinsider.com/deepm...
https://archive.ph/8s5IK
Cohere raises USD 40M
https://www.fastcompany.com/90670635/...
https://cohere.ai/
US judge refuses AI patent
https://www.theregister.com/2021/09/0...
Sam Altman on GPT-4
https://www.reddit.com/r/OpenAI/comme...
Salesforce releases CodeT5
https://blog.einstein.ai/codet5/
DeepMind RL lecture series
https://deepmind.com/learning-resourc...
WikiGraphs Dataset
https://github.com/deepmind/deepmind-...
LiveCell Dataset
https://sartorius-research.github.io/...
https://www.nature.com/articles/s4159...
SpeechBrain Library
https://speechbrain.github.io/
AI generated influencer lands 100 sponsorships
https://www.allkpop.com/article/2021/...
AI News Questions
https://www.forbes.com/sites/tomtaull...
https://mindmatters.ai/2021/09/isnt-i...
https://fortune.com/2021/09/07/deepmi...
https://www.forbes.com/sites/anniebro...
https://www.cnbctv18.com/views/view-a...
https://www.kcrw.com/culture/shows/li...
https://techcrunch.com/2021/09/07/ai-...
https://www.forbes.com/sites/bernardm...
AI hiring tools mistakenly reject millions of applicants
https://www.theverge.com/2021/9/6/226...
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/1824646584
If you want to support me, the best thing to do is to share out the content :)

Sep 16, 2021 • 10min
Celebrating 100k Subscribers! (w/ Channel Statistics)
#yannickilcher #machinelearning #100k
OUTLINE:
0:00 - 100k!
1:00 - Announcements & Thanks
3:55 - Channel Statistics
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-ki...
BiliBili: https://space.bilibili.com/1824646584
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Sep 13, 2021 • 28min
[ML News] AI predicts race from X-Ray | Google kills HealthStreams | Boosting Search with MuZero
#mlnews #schmidhuber #muzero
Your regular updates on what's happening in the ML world!
OUTLINE:
0:00 - Intro
0:15 - Sponsor: Weights & Biases
1:45 - Google shuts down health streams
4:25 - AI predicts race from blurry X-Rays
7:35 - Facebook labels black men as primates
11:05 - Distill papers on Graph Neural Networks
11:50 - Jürgen Schmidhuber to lead KAUST AI Initiative
12:35 - GitHub brief on DMCA notices for source code
14:55 - Helpful Reddit Threads
19:40 - Simple Tricks to improve Transformers
20:40 - Apple's Unconstrained Scene Generation
21:40 - Common Objects in 3D dataset
22:20 - WarpDrive Multi-Agent RL framework
23:10 - My new paper: Boosting Search Agents & MuZero
25:15 - Can AI detect depression from speech?
References:
Google shuts down Health Streams
https://techcrunch.com/2021/08/26/goo...
AI predicts race from X-Rays
https://www.iflscience.com/technology...
https://arxiv.org/ftp/arxiv/papers/21...
Facebook labels black men as primates
https://www.nytimes.com/2021/09/03/te...
https://en.wikipedia.org/wiki/Human
Distill articles on GNNs
https://distill.pub/2021/gnn-intro/
https://distill.pub/2021/understandin...
Jürgen Schmidhuber leads KAUST AI initiative
https://people.idsia.ch/~juergen/kaus...
GitHub issues court brief on code DMCAs
https://github.blog/2021-08-31-vague-...
Useful Reddit Threads
https://www.reddit.com/r/MachineLearn...
https://www.reddit.com/r/MachineLearn...
https://www.reddit.com/r/MachineLearn...
https://www.reddit.com/r/MachineLearn...
Tricks to improve Transformers
https://arxiv.org/pdf/2108.12284.pdf
Unconstrained Scene Generation
https://apple.github.io/ml-gsn/
Common Objects in 3D dataset
https://ai.facebook.com/blog/common-o...
WarpDrive Multi-Agent RL framework
https://blog.einstein.ai/warpdrive-fa...
Boosting Search Engines / MuZero Code
https://arxiv.org/abs/2109.00527
https://github.com/google-research/go...
https://github.com/google-research/la...
Can AI detect depression?
https://venturebeat.com/2021/08/31/ai...
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-ki...
BiliBili: https://space.bilibili.com/1824646584
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Sep 6, 2021 • 37min
∞-former: Infinite Memory Transformer (aka Infty-Former / Infinity-Former, Research Paper Explained)
#inftyformer #infinityformer #transformer
Vanilla Transformers are excellent sequence models, but suffer from harsh constraints on the length of the sequences they can process. Several attempts have been made to extend the Transformer's sequence length, but few have successfully gone beyond a constant-factor improvement. This paper presents a method, based on continuous attention mechanisms, to attend to an unbounded past by representing it as a continuous signal rather than a sequence. This enables the Infty-Former to enrich the current context with global information, which improves performance on long-range dependencies in sequence tasks. Further, the paper introduces sticky memories, which highlight past events of particular importance and elevate their representation in the long-term memory.
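To make the continuous-attention idea more concrete, here is a hedged numpy sketch (a toy version under assumed shapes, not the authors' implementation): the past token embeddings are compressed into the coefficients of a fixed number of radial basis functions over [0, 1], and a query attends to that signal with a Gaussian density whose expectation is evaluated numerically, so the cost no longer depends on the original memory length.
```python
import numpy as np

def rbf_features(t, n_basis=20, width=0.05):
    """Evaluate n_basis Gaussian RBFs with centers spread over [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_basis)
    return np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def compress_memory(X, n_basis=20, ridge=1e-3):
    """Fit coefficients B so that X[i] ~= B.T @ psi(t_i) via ridge regression.
    X: (L, d) token embeddings laid out on positions t_i in [0, 1]."""
    L = X.shape[0]
    Psi = rbf_features(np.linspace(0.0, 1.0, L), n_basis)       # (L, n_basis)
    B = np.linalg.solve(Psi.T @ Psi + ridge * np.eye(n_basis), Psi.T @ X)
    return B                                                     # (n_basis, d)

def continuous_attention(B, mu, sigma, n_basis=20, grid=1000):
    """Context = E_{t ~ N(mu, sigma^2)}[ B.T @ psi(t) ], approximated on a grid.
    The cost depends on n_basis and grid size, not on the memory length."""
    t = np.linspace(0.0, 1.0, grid)
    density = np.exp(-((t - mu) ** 2) / (2 * sigma ** 2))
    density /= density.sum()                                     # normalize on the grid
    expected_psi = rbf_features(t, n_basis).T @ density          # (n_basis,)
    return B.T @ expected_psi                                    # (d,)

X = np.random.randn(512, 8)            # a "long" past of 512 token embeddings
B = compress_memory(X)
ctx = continuous_attention(B, mu=0.7, sigma=0.1)
print(ctx.shape)                        # (8,)
```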
OUTLINE:
0:00 - Intro & Overview
1:10 - Sponsor Spot: Weights & Biases
3:35 - Problem Statement
8:00 - Continuous Attention Mechanism
16:25 - Unbounded Memory via concatenation & contraction
18:05 - Does this make sense?
20:25 - How the Long-Term Memory is used in an attention layer
27:40 - Entire Architecture Recap
29:30 - Sticky Memories by Importance Sampling
31:25 - Commentary: Pros and cons of using heuristics
32:30 - Experiments & Results
Paper: https://arxiv.org/abs/2109.00301
Sponsor: Weights & Biases
https://wandb.me/start
Abstract:
Transformers struggle when attending to long contexts, since the amount of computation grows with the context length, and therefore they cannot model long-term memories effectively. Several variations have been proposed to alleviate this problem, but they all have a finite memory capacity, being forced to drop old information. In this paper, we propose the ∞-former, which extends the vanilla transformer with an unbounded long-term memory. By making use of a continuous-space attention mechanism to attend over the long-term memory, the ∞-former's attention complexity becomes independent of the context length. Thus, it is able to model arbitrarily long contexts and maintain "sticky memories" while keeping a fixed computation budget. Experiments on a synthetic sorting task demonstrate the ability of the ∞-former to retain information from long sequences. We also perform experiments on language modeling, by training a model from scratch and by fine-tuning a pre-trained language model, which show benefits of unbounded long-term memories.
Authors: Pedro Henrique Martins, Zita Marinho, André F. T. Martins
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-ki...
BiliBili: https://space.bilibili.com/1824646584
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Sep 5, 2021 • 17min
[ML News] Blind Chess AI Competition | Graph NNs for traffic | AI gift suggestions
#mlnews #chess #neurips
OUTLINE:
0:00 - Intro
0:30 - Reconnaissance Blind Chess NeurIPS 2021 Competition
3:40 - Colab Pro no longer top priority for GPUs
4:45 - DeepMind uses Graph NNs to do traffic prediction
6:00 - Helpful Libraries: Isaac Gym, Differentiable Human, LVIS, BEHAVIOR
10:25 - Cerebras Wafer Scale Engine Cluster
12:15 - AI Voice Synthesis for Val Kilmer
14:20 - Can AI give thoughtful gifts?
References:
Reconnaissance Blind Chess NeurIPS 2021 Competition
https://rbc.jhuapl.edu/
https://rbc.jhuapl.edu/gameRules
Colab Pro no longer top priority
https://www.reddit.com/r/MachineLearn...
Google Maps ETA prediction using Graph Neural Networks
https://arxiv.org/pdf/2108.11482.pdf
Isaac Gym: RL simulator on GPU
https://arxiv.org/abs/2108.10470
https://sites.google.com/view/isaacgy...
https://developer.nvidia.com/isaac-gym
Cerebras Cluster for massive AI models
https://www.wired.com/story/cerebras-...
Helpful Libraries / Datasets
https://nimblephysics.org/docs/human-...
https://www.lvisdataset.org/
https://arxiv.org/pdf/2108.03332.pdf
AI Voice Reconstruction
https://www.washingtonpost.com/techno...
Can AI make thoughtful gifts?
https://www.forbes.com/sites/anniebro...
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-ki...
BiliBili: https://space.bilibili.com/1824646584
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Sep 5, 2021 • 31min
ALiBi - Train Short, Test Long: Attention with linear biases enables input length extrapolation
#alibi #transformers #attention
Transformers are essentially set models that need additional inputs to make sense of sequence data. The most widespread additional inputs are position encodings or position embeddings, which add sequence index information in various forms. However, this limits the resulting model: it cannot run inference on sequences longer than it was trained on, as it would encounter unfamiliar position encodings. ALiBi solves this by replacing position embeddings with simple fixed linear biases on the attention scores, adding negligible overhead in time and memory; surprisingly, the resulting model can handle inference on sequences many times longer than its training sequences.
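As a quick illustration (a minimal sketch, not the reference implementation linked below), ALiBi adds a fixed, head-specific penalty proportional to the query-key distance directly to the attention logits; the slopes follow the geometric schedule described in the paper (shown here only for power-of-two head counts).
```python
import numpy as np

def alibi_slopes(n_heads):
    """Geometric slope schedule from the ALiBi paper: for 8 heads this
    gives 1/2, 1/4, ..., 1/256 (power-of-two head counts only)."""
    return np.array([2 ** (-8 * (i + 1) / n_heads) for i in range(n_heads)])

def alibi_attention(q, k, v, head_index, n_heads=8):
    """Single-head causal attention with an ALiBi distance bias.
    q, k, v: (seq_len, d_head)."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                      # (seq_len, seq_len)
    # Linear bias: subtract m * (i - j) for key position j <= query position i.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    m = alibi_slopes(n_heads)[head_index]
    scores = scores - m * (i - j)
    scores = np.where(j <= i, scores, -np.inf)         # causal mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

x = np.random.randn(16, 32)
out = alibi_attention(x, x, x, head_index=0)
print(out.shape)  # (16, 32)
```
Because the bias is a fixed function of distance rather than a learned embedding, nothing about it changes when the model sees sequences longer than those used in training.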
OUTLINE:
0:00 - Intro & Overview
1:40 - Position Encodings in Transformers
4:55 - Sinusoidal Position Encodings
11:50 - ALiBi Position Encodings
20:50 - How to choose the slope parameter
23:55 - Experimental Results
29:10 - Comments & Conclusion
Paper: https://ofir.io/train_short_test_long...
Code: https://github.com/ofirpress/attentio...
Abstract:
Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question remains open: how to achieve extrapolation at inference time to longer sequences than seen during training? We first show that extrapolation can be improved by changing the position representation method, though we find that existing proposals do not allow efficient extrapolation. We introduce a simple and efficient method, Attention with Linear Biases (ALiBi), that allows for extrapolation. ALiBi does not add positional embeddings to the word embeddings; instead, it biases the query-key attention scores with a term that is proportional to their distance. We show that this method allows training a 1.3 billion parameter model on input sequences of length 1024 that extrapolates to input sequences of length 2048, achieving the same perplexity as a sinusoidal position embedding model trained on inputs of length 2048, 11% faster and using 11% less memory. ALiBi’s inductive bias towards recency allows it to outperform multiple strong position methods on the WikiText-103 benchmark. Finally, we provide analysis of ALiBi to understand why it leads to better performance.
Authors: Ofir Press, Noah A. Smith, Mike Lewis
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-ki...
BiliBili: https://space.bilibili.com/1824646584
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Aug 30, 2021 • 33min
[ML News] Stanford HAI coins Foundation Models & High-profile case of plagiarism uncovered
#plagiarism #foundationmodels #tesla
The best place to keep up to date with the latest and greatest from the ML world!
OUTLINE:
0:00 - Intro & Sponsor
3:15 - A high-profile case of plagiarism shocks the ML world
11:55 - Stanford AI releases paper on "Foundation Models"
19:45 - Updates on Apple's NeuralHash
20:45 - RL control for two-player sports
21:45 - Tesla's AI Day
23:55 - COMMA THREE announced
24:40 - Intel winding down RealSense cameras
25:20 - IBM unveils Telum Processor
25:50 - Lux AI Challenge & Neural MMO Challenge
26:50 - Dribnet's CLIP PixelArt
27:40 - Multi-Agent RL papers are mostly fake
28:50 - I can't even come up with a segment title
29:25 - AI News Questions
31:20 - Frameworks & Libraries
Sponsor: Weights & Biases
https://wandb.ai
References:
Plagiarism case shocks ML world
https://arxiv.org/abs/2102.07870v1
https://arxiv.org/pdf/2102.07870v1.pdf
https://arxiv.org/abs/2108.05862
https://arxiv.org/pdf/2108.05862v1.pdf
https://www.reddit.com/r/MachineLearn...
https://michaelsdr.github.io/momentum...
https://www.zhihu.com/question/480075...
https://zhuanlan.zhihu.com/p/40035196...
https://finance.sina.com.cn/tech/2021...
https://duoli.org/
https://web.archive.org/web/202108160...
https://twitter.com/shaohua0116/statu...
Stanford AI targets Foundation Models
https://arxiv.org/abs/2108.07258
https://arxiv.org/pdf/2108.07258.pdf
https://ieeexplore.ieee.org/document/...
https://xgboost.readthedocs.io/en/lat...
https://en.wikipedia.org/wiki/Support...
https://scikit-learn.org/stable/modul...
https://syncedreview.com/2019/06/27/t...
https://openai.com/blog/better-langua...
NeuralHash Saga Continues
https://www.reddit.com/r/MachineLearn...
https://blog.roboflow.com/neuralhash-...
https://www.kron4.com/news/bay-area/b...
RL Control for competitive sports
https://ai.facebook.com/research/publ...
Tesla AI Day
https://www.youtube.com/watch?v=ABbDB...
https://spectrum.ieee.org/elon-musk-r...
https://www.youtube.com/watch?v=j0z4F...
George Hotz announces COMMA THREE
https://www.youtube.com/watch?v=jJn2O...
https://comma.ai/shop/products/three
Intel abandons RealSense cameras
https://www.crn.com/news/components-p...
IBM unveils Telum Processor
https://www.prnewswire.com/news-relea...
Kaggle Lux AI challenge
https://www.kaggle.com/c/lux-ai-2021
Neural MMO challenge
https://www.aicrowd.com/challenges/th...
Dribnet's PixelArt
https://twitter.com/dribnet/status/14...
Multi-Agent RL papers mostly fake
https://www.reddit.com/r/reinforcemen...
Elon Musk, Lex Fridman tweets trigger news story
https://www.benzinga.com/news/21/08/2...
News Questions:
https://www.zdnet.com/article/can-ai-...
https://entertainment.inquirer.net/41...
https://www.analyticsinsight.net/whic...
https://www.bbc.co.uk/programmes/m000...
https://ricochet.com/podcast/cosm-tec...
https://www.designnews.com/automation...
https://www.forbes.com/sites/anniebro...
3D Volleyball RL environment
https://www.reddit.com/r/MachineLearn...
Maze RL framework
https://enliteai.medium.com/maze-appl...
Wanderer 2 HN Search
https://metaphor.so/

Aug 27, 2021 • 35min
Fastformer: Additive Attention Can Be All You Need (Machine Learning Research Paper Explained)
#attention #transformer #fastformer
Transformers have become the dominant model class for large-scale data in the last few years, but their quadratic complexity in sequence length has plagued them until now. Fastformer claims to be the fastest and most performant linear attention variant, able to consume long contexts at once. This is achieved by a combination of additive attention and elementwise products. While initial results look promising, I have my reservations...
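For readers who want the mechanics in code, here is a hedged numpy sketch of the additive-attention step as described in the paper (not the authors' implementation; the learned per-token projections and the final residual transform are omitted): queries are pooled into a single global query, multiplied elementwise into the keys, the result is pooled into a global key, and that is multiplied elementwise into the values.
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fastformer_additive_attention(Q, K, V, w_q, w_k):
    """Sketch of Fastformer's additive attention for one head.
    Q, K, V: (seq_len, d); w_q, w_k: (d,) learned scoring vectors."""
    d = Q.shape[1]
    # 1) Pool queries into a single global query with additive attention.
    alpha = softmax(Q @ w_q / np.sqrt(d))          # (seq_len,)
    q_global = alpha @ Q                           # (d,)
    # 2) Mix the global query into each key elementwise, then pool again.
    P = K * q_global                               # (seq_len, d)
    beta = softmax(P @ w_k / np.sqrt(d))           # (seq_len,)
    k_global = beta @ P                            # (d,)
    # 3) Mix the global key into each value elementwise.
    U = V * k_global                               # (seq_len, d)
    # The paper then applies a linear transform and a residual with Q;
    # here we simply return the token-wise outputs.
    return U

X = np.random.randn(10, 16)
out = fastformer_additive_attention(X, X, X, np.random.randn(16), np.random.randn(16))
print(out.shape)  # (10, 16)
```
Every step is either a weighted sum over tokens or an elementwise product, which is why the cost is linear in sequence length; whether this still deserves the name "attention" is exactly the question raised in the video.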
OUTLINE:
0:00 - Intro & Outline
2:15 - Fastformer description
5:20 - Baseline: Classic Attention
10:00 - Fastformer architecture
12:50 - Additive Attention
18:05 - Query-Key element-wise multiplication
21:35 - Redundant modules in Fastformer
25:00 - Problems with the architecture
27:30 - Is this even attention?
32:20 - Experimental Results
34:50 - Conclusion & Comments
Paper: https://arxiv.org/abs/2108.09084
Abstract:
Transformer is a powerful model for text understanding. However, it is inefficient due to its quadratic complexity to input sequence length. Although there are many methods on Transformer acceleration, they are still either inefficient on long sequences or not effective enough. In this paper, we propose Fastformer, which is an efficient Transformer model based on additive attention. In Fastformer, instead of modeling the pair-wise interactions between tokens, we first use additive attention mechanism to model global contexts, and then further transform each token representation based on its interaction with global context representations. In this way, Fastformer can achieve effective context modeling with linear complexity. Extensive experiments on five datasets show that Fastformer is much more efficient than many existing Transformer models and can meanwhile achieve comparable or even better long text modeling performance.
Authors: Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-ki...
BiliBili: https://space.bilibili.com/1824646584
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Aug 23, 2021 • 44min
PonderNet: Learning to Ponder (Machine Learning Research Paper Explained)
#pondernet #deepmind #machinelearning
Humans don't spend the same amount of mental effort on all problems. Instead, we respond quickly to easy tasks and take our time to deliberate on hard ones. DeepMind's PonderNet attempts to achieve the same by dynamically deciding how many computation steps to allocate to any single input sample. This is done via a recurrent architecture and a trainable function that computes a halting probability. The resulting model performs well on dynamic computation tasks and is surprisingly robust to different hyperparameter settings.
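To sketch the halting mechanism in code (a toy numpy version under assumed shapes, not DeepMind's implementation): a recurrent step function emits a prediction and a per-step halting rate lambda_n, and the probability of halting exactly at step n is p_n = lambda_n * prod_{j<n}(1 - lambda_j), which the unrolled training objective then weights the per-step losses with.
```python
import numpy as np

def pondernet_rollout(x, step_fn, max_steps=10):
    """Unroll a PonderNet-style recurrent model and return per-step
    predictions together with the halting distribution p_n.

    step_fn(x, h) -> (y_n, h_next, lambda_n), where lambda_n in (0, 1]
    is the conditional probability of halting at this step."""
    h = np.zeros_like(x)
    ys, ps = [], []
    not_halted = 1.0                      # probability of reaching step n
    for n in range(max_steps):
        y, h, lam = step_fn(x, h)
        if n == max_steps - 1:
            lam = 1.0                     # force halting at the final step
        ys.append(y)
        ps.append(not_halted * lam)       # p_n = lambda_n * prod_{j<n}(1 - lambda_j)
        not_halted *= (1.0 - lam)
    return np.array(ys), np.array(ps)

# Dummy step function standing in for the trained network (assumption for
# the example): a fixed recurrence with a constant halting rate.
def dummy_step(x, h):
    h_next = np.tanh(x + h)
    return h_next.mean(), h_next, 0.3

ys, ps = pondernet_rollout(np.random.randn(4), dummy_step)
print(ps.sum())                           # ~1.0: a valid halting distribution
prediction = (ps * ys).sum()              # expected prediction over halting steps
```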
OUTLINE:
0:00 - Intro & Overview
2:30 - Problem Statement
8:00 - Probabilistic formulation of dynamic halting
14:40 - Training via unrolling
22:30 - Loss function and regularization of the halting distribution
27:35 - Experimental Results
37:10 - Sensitivity to hyperparameter choice
41:15 - Discussion, Conclusion, Broader Impact
Paper: https://arxiv.org/abs/2107.05407
Abstract:
In standard neural networks the amount of computation used grows with the size of the inputs, but not with the complexity of the problem being learnt. To overcome this limitation we introduce PonderNet, a new algorithm that learns to adapt the amount of computation based on the complexity of the problem at hand. PonderNet learns end-to-end the number of computational steps to achieve an effective compromise between training prediction accuracy, computational cost and generalization. On a complex synthetic problem, PonderNet dramatically improves performance over previous adaptive computation methods and additionally succeeds at extrapolation tests where traditional neural networks fail. Also, our method matched the current state of the art results on a real world question and answering dataset, but using less compute. Finally, PonderNet reached state of the art results on a complex task designed to test the reasoning capabilities of neural networks.
Authors: Andrea Banino, Jan Balaguer, Charles Blundell
Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/yannic-ki...
BiliBili: https://space.bilibili.com/1824646584
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n