Yannic Kilcher Videos (Audio Only)

Yannic Kilcher
Sep 21, 2021 • 32min

Topographic VAEs learn Equivariant Capsules (Machine Learning Research Paper Explained)

#tvae #topographic #equivariant

Variational Autoencoders model the latent space as a set of independent Gaussian random variables, which the decoder maps to a data distribution. However, this independence is not always desired: when dealing with video sequences, for example, we know that successive frames are heavily correlated, and any latent space dealing with such data should reflect this in its structure. Topographic VAEs are a framework for defining correlation structures among the latent variables and inducing equivariance in the resulting model. This paper shows how such correlation structures can be built by correctly arranging higher-level variables, which are themselves independent Gaussians.

OUTLINE:
0:00 - Intro
1:40 - Architecture Overview
6:30 - Comparison to regular VAEs
8:35 - Generative Mechanism Formulation
11:45 - Non-Gaussian Latent Space
17:30 - Topographic Product of Student-t
21:15 - Introducing Temporal Coherence
24:50 - Topographic VAE
27:50 - Experimental Results
31:15 - Conclusion & Comments

Paper: https://arxiv.org/abs/2109.01394
Code: https://github.com/akandykeller/topog...

Abstract:
In this work we seek to bridge the concepts of topographic organization and equivariance in neural networks. To accomplish this, we introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables. We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST. Furthermore, through topographic organization over time (i.e. temporal coherence), we demonstrate how predefined latent space transformation operators can be encouraged for observed transformed input sequences -- a primitive form of unsupervised learned equivariance. We demonstrate that this model successfully learns sets of approximately equivariant features (i.e. "capsules") directly from sequences and achieves higher likelihood on correspondingly transforming test sequences. Equivariance is verified quantitatively by measuring the approximate commutativity of the inference network and the sequence transformations. Finally, we demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.

Authors: T. Anderson Keller, Max Welling

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yann...
Minds: https://www.minds.com/ykilcher
Parler: https://parler.com/profile/YannicKilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/1824646584

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
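For readers who want to see how independent Gaussians can be arranged into a correlated, heavy-tailed latent code, here is a minimal NumPy sketch of the Topographic Product of Student-t idea discussed in the video: each latent is a Gaussian divided by the square root of a local sum of squared Gaussians, so neighbours that share normalizer variables become correlated. The circulant neighborhood matrix, the sizes, and the function names are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def topographic_student_t(n_latents=64, neighborhood=5, n_samples=10000, seed=0):
    """Sketch of a Topographic Product of Student-t: each latent t_i is a
    Gaussian z_i divided by the square root of a *local* sum of squared
    Gaussians u_j. Latents whose normalizers share u variables become
    correlated, while each marginal stays (approximately) Student-t."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, n_latents))   # independent Gaussians
    u = rng.standard_normal((n_samples, n_latents))   # independent Gaussians

    # W: circulant 0/1 matrix that sums u^2 over a local neighborhood (assumed form)
    W = np.zeros((n_latents, n_latents))
    for i in range(n_latents):
        for k in range(-(neighborhood // 2), neighborhood // 2 + 1):
            W[i, (i + k) % n_latents] = 1.0

    t = z / np.sqrt(u**2 @ W.T)                        # topographically correlated latents
    return t

t = topographic_student_t()
# Neighboring latents share normalizer variables, so their magnitudes correlate:
corr = np.corrcoef(np.abs(t[:, 0]), np.abs(t[:, 1]))[0, 1]
print(f"|t_0|-|t_1| correlation: {corr:.2f}")
```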
Sep 16, 2021 • 26min

[ML News] Roomba Avoids Poop | Textless NLP | TikTok Algorithm Secrets | New Schmidhuber Blog

#schmidhuber #tiktok #roomba

Your regularly irregular update on what's happening in the world of Machine Learning.

OUTLINE:
0:00 - Intro
0:15 - Sponsor: Weights & Biases
1:55 - ML YouTuber reaches 100k subscribers
2:40 - Facebook AI pushes Textless NLP
5:30 - Schmidhuber blog post: I invented everything
7:55 - TikTok algorithm rabbitholes users
10:45 - Roomba learns to avoid poop
11:50 - AI can spot art forgeries
14:55 - Deepmind's plans to separate from Google
16:15 - Cohere raises 40M
16:55 - US Judge rejects AI inventor on patent
17:55 - Altman: GPT-4 not much bigger than GPT-3
18:45 - Salesforce CodeT5
19:45 - DeepMind Reinforcement Learning Lecture Series
20:15 - WikiGraphs Dataset
20:40 - LiveCell Dataset
21:00 - SpeechBrain
21:10 - AI-generated influencer gains 100 sponsorships
22:20 - AI News Questions
23:15 - AI hiring tools reject millions of valid applicants

Sponsor: Weights & Biases
https://wandb.me/start

References:
Facebook AI creates Textless NLP
https://ai.facebook.com/blog/textless...
https://speechbot.github.io/pgslm/?fb...
Schmidhuber invented everything
https://people.idsia.ch/~juergen/most...
How TikTok's algorithm works
https://www.wsj.com/video/series/insi...
Roomba learns to avoid poop
https://edition.cnn.com/2021/09/09/te...
Amateur develops fake art detector
https://blogs.nvidia.com/blog/2021/08...
https://spectrum.ieee.org/this-ai-can...
DeepMind's plan to break away from Google
https://www.businessinsider.com/deepm...
https://archive.ph/8s5IK
Cohere raises USD 40M
https://www.fastcompany.com/90670635/...
https://cohere.ai/
US judge refuses AI patent
https://www.theregister.com/2021/09/0...
Sam Altman on GPT-4
https://www.reddit.com/r/OpenAI/comme...
Salesforce releases CodeT5
https://blog.einstein.ai/codet5/
DeepMind RL lecture series
https://deepmind.com/learning-resourc...
WikiGraphs Dataset
https://github.com/deepmind/deepmind-...
LiveCell Dataset
https://sartorius-research.github.io/...
https://www.nature.com/articles/s4159...
SpeechBrain Library
https://speechbrain.github.io/
AI generated influencer lands 100 sponsorships
https://www.allkpop.com/article/2021/...
AI News Questions
https://www.forbes.com/sites/tomtaull...
https://mindmatters.ai/2021/09/isnt-i...
https://fortune.com/2021/09/07/deepmi...
https://www.forbes.com/sites/anniebro...
https://www.cnbctv18.com/views/view-a...
https://www.kcrw.com/culture/shows/li...
https://techcrunch.com/2021/09/07/ai-...
https://www.forbes.com/sites/bernardm...
AI hiring tools mistakenly reject millions of applicants
https://www.theverge.com/2021/9/6/226...
Sep 16, 2021 • 10min

Celebrating 100k Subscribers! (w/ Channel Statistics)

#yannickilcher #machinelearning #100k

OUTLINE:
0:00 - 100k!
1:00 - Announcements & Thanks
3:55 - Channel Statistics
Sep 13, 2021 • 28min

[ML News] AI predicts race from X-Ray | Google kills HealthStreams | Boosting Search with MuZero

#mlnews #schmidhuber #muzero

Your regular updates on what's happening in the ML world!

OUTLINE:
0:00 - Intro
0:15 - Sponsor: Weights & Biases
1:45 - Google shuts down health streams
4:25 - AI predicts race from blurry X-Rays
7:35 - Facebook labels black men as primates
11:05 - Distill papers on Graph Neural Networks
11:50 - Jürgen Schmidhuber to lead KAUST AI Initiative
12:35 - GitHub brief on DMCA notices for source code
14:55 - Helpful Reddit Threads
19:40 - Simple Tricks to improve Transformers
20:40 - Apple's Unconstrained Scene Generation
21:40 - Common Objects in 3D dataset
22:20 - WarpDrive Multi-Agent RL framework
23:10 - My new paper: Boosting Search Agents & MuZero
25:15 - Can AI detect depression from speech?

References:
Google shuts down Health Streams
https://techcrunch.com/2021/08/26/goo...
AI predicts race from X-Rays
https://www.iflscience.com/technology...
https://arxiv.org/ftp/arxiv/papers/21...
Facebook labels black men as primates
https://www.nytimes.com/2021/09/03/te...
https://en.wikipedia.org/wiki/Human
Distill articles on GNNs
https://distill.pub/2021/gnn-intro/
https://distill.pub/2021/understandin...
Jürgen Schmidhuber leads KAUST AI initiative
https://people.idsia.ch/~juergen/kaus...
GitHub issues court brief on code DMCAs
https://github.blog/2021-08-31-vague-...
Useful Reddit Threads
https://www.reddit.com/r/MachineLearn...
https://www.reddit.com/r/MachineLearn...
https://www.reddit.com/r/MachineLearn...
https://www.reddit.com/r/MachineLearn...
Tricks to improve Transformers
https://arxiv.org/pdf/2108.12284.pdf
Unconstrained Scene Generation
https://apple.github.io/ml-gsn/
Common Objects in 3D dataset
https://ai.facebook.com/blog/common-o...
WarpDrive Multi-Agent RL framework
https://blog.einstein.ai/warpdrive-fa...
Boosting Search Engines / MuZero Code
https://arxiv.org/abs/2109.00527
https://github.com/google-research/go...
https://github.com/google-research/la...
Can AI detect depression?
https://venturebeat.com/2021/08/31/ai...
Sep 6, 2021 • 37min

∞-former: Infinite Memory Transformer (aka Infty-Former / Infinity-Former, Research Paper Explained)

#inftyformer #infinityformer #transformer

Vanilla Transformers are excellent sequence models, but they suffer from harsh constraints on the length of the sequences they can process. Several attempts have been made to extend the Transformer's sequence length, but few have successfully gone beyond a constant-factor improvement. This paper presents a method, based on continuous attention mechanisms, to attend to an unbounded past sequence by representing the past as a continuous signal rather than a sequence. This enables the Infty-Former to effectively enrich the current context with global information, which increases performance on long-range dependencies in sequence tasks. Further, the paper presents the concept of sticky memories, which highlight past events of particular importance and elevate their representation in the long-term memory.

OUTLINE:
0:00 - Intro & Overview
1:10 - Sponsor Spot: Weights & Biases
3:35 - Problem Statement
8:00 - Continuous Attention Mechanism
16:25 - Unbounded Memory via concatenation & contraction
18:05 - Does this make sense?
20:25 - How the Long-Term Memory is used in an attention layer
27:40 - Entire Architecture Recap
29:30 - Sticky Memories by Importance Sampling
31:25 - Commentary: Pros and cons of using heuristics
32:30 - Experiments & Results

Paper: https://arxiv.org/abs/2109.00301
Sponsor: Weights & Biases
https://wandb.me/start

Abstract:
Transformers struggle when attending to long contexts, since the amount of computation grows with the context length, and therefore they cannot model long-term memories effectively. Several variations have been proposed to alleviate this problem, but they all have a finite memory capacity, being forced to drop old information. In this paper, we propose the ∞-former, which extends the vanilla transformer with an unbounded long-term memory. By making use of a continuous-space attention mechanism to attend over the long-term memory, the ∞-former's attention complexity becomes independent of the context length. Thus, it is able to model arbitrarily long contexts and maintain "sticky memories" while keeping a fixed computation budget. Experiments on a synthetic sorting task demonstrate the ability of the ∞-former to retain information from long sequences. We also perform experiments on language modeling, by training a model from scratch and by fine-tuning a pre-trained language model, which show benefits of unbounded long-term memories.

Authors: Pedro Henrique Martins, Zita Marinho, André F. T. Martins
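The "unbounded memory via concatenation & contraction" step can be made concrete with a small sketch: compress the past token matrix into a fixed number of basis-function coefficients by regression, and extend it by reconstructing the old signal, appending new tokens, and re-compressing. This is only an illustration of the continuous-signal idea under assumed choices (Gaussian radial basis functions, a ridge fit, and the reconstruction resolution); it is not the paper's exact procedure.

```python
import numpy as np

def rbf_basis(positions, n_basis=32, width=0.02):
    """Gaussian radial basis functions evaluated at positions in [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_basis)
    return np.exp(-((positions[:, None] - centers[None, :]) ** 2) / (2 * width))

def compress_to_memory(X, n_basis=32, ridge=1e-4):
    """Fit coefficients B so that Phi @ B approximates the token matrix X.
    The memory size (n_basis x d) is independent of the sequence length."""
    L, d = X.shape
    Phi = rbf_basis(np.linspace(0.0, 1.0, L), n_basis)              # (L, n_basis)
    # Ridge regression: B = (Phi^T Phi + lambda * I)^-1 Phi^T X
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(n_basis), Phi.T @ X)

def extend_memory(B, X_new):
    """'Concatenation & contraction': reconstruct the old signal from its
    coefficients, append the new tokens, and re-compress to the same fixed size."""
    n_basis = B.shape[0]
    L_old = 4 * n_basis                                              # reconstruction resolution (an assumption)
    Phi_old = rbf_basis(np.linspace(0.0, 1.0, L_old), n_basis)
    X_old = Phi_old @ B
    return compress_to_memory(np.concatenate([X_old, X_new]), n_basis)

# Toy usage: 1,000 old tokens plus 200 new ones, stored in 32 coefficients.
rng = np.random.default_rng(0)
B = compress_to_memory(rng.standard_normal((1000, 16)))
B = extend_memory(B, rng.standard_normal((200, 16)))
print(B.shape)   # (32, 16): fixed-size memory regardless of how much history was seen
```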
Sep 5, 2021 • 17min

[ML News] Blind Chess AI Competition | Graph NNs for traffic | AI gift suggestions

#mlnews #chess #neurips

OUTLINE:
0:00 - Intro
0:30 - Reconnaissance Blind Chess NeurIPS 2021 Competition
3:40 - Colab Pro no longer top priority for GPUs
4:45 - DeepMind uses Graph NNs to do traffic prediction
6:00 - Helpful Libraries: Isaac Gym, Differentiable Human, LVIS, BEHAVIOR
10:25 - Cerebras Wafer Scale Engine Cluster
12:15 - AI Voice Synthesis for Val Kilmer
14:20 - Can AI give thoughtful gifts?

References:
Reconnaissance Blind Chess NeurIPS 2021 Competition
https://rbc.jhuapl.edu/
https://rbc.jhuapl.edu/gameRules
Colab Pro no longer top priority
https://www.reddit.com/r/MachineLearn...
Google Maps ETA prediction using Graph Neural Networks
https://arxiv.org/pdf/2108.11482.pdf
Isaac Gym: RL simulator on GPU
https://arxiv.org/abs/2108.10470
https://sites.google.com/view/isaacgy...
https://developer.nvidia.com/isaac-gym
Cerebras Cluster for massive AI models
https://www.wired.com/story/cerebras-...
Helpful Libraries / Datasets
https://nimblephysics.org/docs/human-...
https://www.lvisdataset.org/
https://arxiv.org/pdf/2108.03332.pdf
AI Voice Reconstruction
https://www.washingtonpost.com/techno...
Can AI make thoughtful gifts?
https://www.forbes.com/sites/anniebro...
Sep 5, 2021 • 31min

ALiBi - Train Short, Test Long: Attention with linear biases enables input length extrapolation

#alibi #transformers #attention

Transformers are essentially set models that need additional inputs to make sense of sequence data. The most widespread such inputs are position encodings or position embeddings, which add sequence-index information in various forms. However, this limits the resulting model: it cannot run inference on sequences longer than those it was trained on, since it would encounter unfamiliar position encodings. ALiBi solves this by replacing position encodings with simple, fixed linear biases on the attention scores. The biases add negligible overhead in time and memory, yet the resulting model can, surprisingly, handle inference on sequences many times longer than its training sequences.

OUTLINE:
0:00 - Intro & Overview
1:40 - Position Encodings in Transformers
4:55 - Sinusoidal Position Encodings
11:50 - ALiBi Position Encodings
20:50 - How to choose the slope parameter
23:55 - Experimental Results
29:10 - Comments & Conclusion

Paper: https://ofir.io/train_short_test_long...
Code: https://github.com/ofirpress/attentio...

Abstract:
Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question remains open: how to achieve extrapolation at inference time to longer sequences than seen during training? We first show that extrapolation can be improved by changing the position representation method, though we find that existing proposals do not allow efficient extrapolation. We introduce a simple and efficient method, Attention with Linear Biases (ALiBi), that allows for extrapolation. ALiBi does not add positional embeddings to the word embeddings; instead, it biases the query-key attention scores with a term that is proportional to their distance. We show that this method allows training a 1.3 billion parameter model on input sequences of length 1024 that extrapolates to input sequences of length 2048, achieving the same perplexity as a sinusoidal position embedding model trained on inputs of length 2048, 11% faster and using 11% less memory. ALiBi’s inductive bias towards recency allows it to outperform multiple strong position methods on the WikiText-103 benchmark. Finally, we provide analysis of ALiBi to understand why it leads to better performance.

Authors: Ofir Press, Noah A. Smith, Mike Lewis
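Since the whole method is a fixed bias on the attention scores, it fits in a few lines. Below is a minimal NumPy sketch of causal attention with ALiBi: each head penalizes a key by its distance to the query times a fixed per-head slope, and no position embeddings are added to the tokens. The slope schedule assumes the geometric sequence starting at 2^(-8/n_heads) that the paper recommends for power-of-two head counts.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def alibi_slopes(n_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8/n_heads)."""
    start = 2.0 ** (-8.0 / n_heads)
    return np.array([start ** (h + 1) for h in range(n_heads)])

def alibi_causal_attention(Q, K, V):
    """Causal self-attention with ALiBi: attention scores get a penalty
    proportional to the query-key distance, scaled by a fixed per-head slope."""
    n_heads, L, d = Q.shape
    i = np.arange(L)[:, None]
    j = np.arange(L)[None, :]
    distance = i - j                                 # how far key j lies in the past
    causal_mask = np.where(j > i, -np.inf, 0.0)      # no attention to the future
    out = []
    for h, m in enumerate(alibi_slopes(n_heads)):
        scores = Q[h] @ K[h].T / np.sqrt(d) - m * distance + causal_mask
        out.append(softmax(scores) @ V[h])
    return np.stack(out)

# Toy usage: 4 heads, 10 tokens, head dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 10, 8)) for _ in range(3))
print(alibi_causal_attention(Q, K, V).shape)   # (4, 10, 8)
```

Because the bias depends only on relative distance, the same computation extends unchanged to sequences longer than any seen in training, which is the source of the extrapolation ability.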
Aug 30, 2021 • 33min

[ML News] Stanford HAI coins Foundation Models & High-profile case of plagiarism uncovered

#plagiarism #foundationmodels #tesla

The best place to keep up to date with the latest and greatest from the ML world!

OUTLINE:
0:00 - Intro & Sponsor
3:15 - A high-profile case of plagiarism shocks the ML world
11:55 - Stanford AI releases paper on "Foundation Models"
19:45 - Updates on Apple's NeuralHash
20:45 - RL control for two-player sports
21:45 - Tesla's AI Day
23:55 - COMMA THREE announced
24:40 - Intel winding down RealSense cameras
25:20 - IBM unveils Telum Processor
25:50 - Lux AI Challenge & Neural MMO Challenge
26:50 - Dribnet's CLIP PixelArt
27:40 - Multi-Agent RL papers are mostly fake
28:50 - I can't even come up with a segment title
29:25 - AI News Questions
31:20 - Frameworks & Libraries

Sponsor: Weights & Biases
https://wandb.ai

References:
Plagiarism case shocks ML world
https://arxiv.org/abs/2102.07870v1
https://arxiv.org/pdf/2102.07870v1.pdf
https://arxiv.org/abs/2108.05862
https://arxiv.org/pdf/2108.05862v1.pdf
https://www.reddit.com/r/MachineLearn...
https://michaelsdr.github.io/momentum...
https://www.zhihu.com/question/480075...
https://zhuanlan.zhihu.com/p/40035196...
https://finance.sina.com.cn/tech/2021...
https://duoli.org/
https://web.archive.org/web/202108160...
https://twitter.com/shaohua0116/statu...
Stanford AI targets Foundation Models
https://arxiv.org/abs/2108.07258
https://arxiv.org/pdf/2108.07258.pdf
https://ieeexplore.ieee.org/document/...
https://xgboost.readthedocs.io/en/lat...
https://en.wikipedia.org/wiki/Support...
https://scikit-learn.org/stable/modul...
https://syncedreview.com/2019/06/27/t...
https://openai.com/blog/better-langua...
NeuralHash Saga Continues
https://www.reddit.com/r/MachineLearn...
https://blog.roboflow.com/neuralhash-...
https://www.kron4.com/news/bay-area/b...
RL Control for competitive sports
https://ai.facebook.com/research/publ...
Tesla AI Day
https://www.youtube.com/watch?v=ABbDB...
https://spectrum.ieee.org/elon-musk-r...
https://www.youtube.com/watch?v=j0z4F...
George Hotz announces COMMA THREE
https://www.youtube.com/watch?v=jJn2O...
https://comma.ai/shop/products/three
Intel abandons RealSense cameras
https://www.crn.com/news/components-p...
IBM unveils Telum Processor
https://www.prnewswire.com/news-relea...
Kaggle Lux AI challenge
https://www.kaggle.com/c/lux-ai-2021
Neural MMO challenge
https://www.aicrowd.com/challenges/th...
Dribnet's PixelArt
https://twitter.com/dribnet/status/14...
Multi-Agent RL papers mostly fake
https://www.reddit.com/r/reinforcemen...
Elon Musk, Lex Fridman tweets trigger news story
https://www.benzinga.com/news/21/08/2...
News Questions:
https://www.zdnet.com/article/can-ai-...
https://entertainment.inquirer.net/41...
https://www.analyticsinsight.net/whic...
https://www.bbc.co.uk/programmes/m000...
https://ricochet.com/podcast/cosm-tec...
https://www.designnews.com/automation...
https://www.forbes.com/sites/anniebro...
3D Volleyball RL environment
https://www.reddit.com/r/MachineLearn...
Maze RL framework
https://enliteai.medium.com/maze-appl...
Wanderer 2 HN Search
https://metaphor.so/
Aug 27, 2021 • 35min

Fastformer: Additive Attention Can Be All You Need (Machine Learning Research Paper Explained)

#attention #transformer #fastformer

Transformers have become the dominant model class in the last few years for large data, but their quadratic complexity in the sequence length has plagued them until now. Fastformer claims to be the fastest and most performant linear attention variant, able to consume long contexts at once. This is achieved by a combination of additive attention and elementwise products. While initial results look promising, I have my reservations...

OUTLINE:
0:00 - Intro & Outline
2:15 - Fastformer description
5:20 - Baseline: Classic Attention
10:00 - Fastformer architecture
12:50 - Additive Attention
18:05 - Query-Key element-wise multiplication
21:35 - Redundant modules in Fastformer
25:00 - Problems with the architecture
27:30 - Is this even attention?
32:20 - Experimental Results
34:50 - Conclusion & Comments

Paper: https://arxiv.org/abs/2108.09084

Abstract:
Transformer is a powerful model for text understanding. However, it is inefficient due to its quadratic complexity to input sequence length. Although there are many methods on Transformer acceleration, they are still either inefficient on long sequences or not effective enough. In this paper, we propose Fastformer, which is an efficient Transformer model based on additive attention. In Fastformer, instead of modeling the pair-wise interactions between tokens, we first use additive attention mechanism to model global contexts, and then further transform each token representation based on its interaction with global context representations. In this way, Fastformer can achieve effective context modeling with linear complexity. Extensive experiments on five datasets show that Fastformer is much more efficient than many existing Transformer models and can meanwhile achieve comparable or even better long text modeling performance.

Authors: Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang
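To make the "additive attention plus element-wise products" recipe concrete, here is a rough single-head NumPy sketch: a global query vector is pooled from the queries by additive attention, multiplied element-wise into the keys, pooled again into a global key, and multiplied into the values before an output projection with a residual to the queries. The exact projections and per-head wiring in the paper differ in detail, so treat this as an illustration rather than the reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def additive_pool(X, w):
    """Additive attention: score each row with a learned vector w, softmax
    over the sequence, and return the weighted sum, i.e. one global vector."""
    alpha = softmax(X @ w / np.sqrt(X.shape[-1]), axis=0)   # (L,)
    return alpha @ X                                          # (d,)

def fastformer_layer(Q, K, V, w_q, w_k, W_out):
    """Sketch of one Fastformer block: pool a global query from Q, mix it
    into each key by element-wise product, pool a global key from the result,
    mix it into each value, then project and add a residual to the queries.
    Every step is O(L) in the sequence length."""
    q_global = additive_pool(Q, w_q)         # (d,)
    P = K * q_global                          # (L, d) global-query-aware keys
    k_global = additive_pool(P, w_k)          # (d,)
    U = V * k_global                          # (L, d) global-key-aware values
    return U @ W_out + Q                      # output projection + residual

# Toy usage with random parameters.
rng = np.random.default_rng(0)
L, d = 10, 16
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
w_q, w_k = rng.standard_normal(d), rng.standard_normal(d)
W_out = rng.standard_normal((d, d))
print(fastformer_layer(Q, K, V, w_q, w_k, W_out).shape)   # (10, 16)
```

Note that all token-to-token interaction is routed through the two pooled global vectors, which is exactly why the video asks whether this still deserves to be called attention.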
Aug 23, 2021 • 44min

PonderNet: Learning to Ponder (Machine Learning Research Paper Explained)

#pondernet #deepmind #machinelearning

Humans don't spend the same amount of mental effort on all problems: we respond quickly to easy tasks and take our time to deliberate on hard ones. DeepMind's PonderNet attempts to achieve the same by dynamically deciding how many computation steps to allocate to any single input sample. This is done via a recurrent architecture and a trainable function that computes a halting probability. The resulting model performs well on dynamic-computation tasks and is surprisingly robust to different hyperparameter settings.

OUTLINE:
0:00 - Intro & Overview
2:30 - Problem Statement
8:00 - Probabilistic formulation of dynamic halting
14:40 - Training via unrolling
22:30 - Loss function and regularization of the halting distribution
27:35 - Experimental Results
37:10 - Sensitivity to hyperparameter choice
41:15 - Discussion, Conclusion, Broader Impact

Paper: https://arxiv.org/abs/2107.05407

Abstract:
In standard neural networks the amount of computation used grows with the size of the inputs, but not with the complexity of the problem being learnt. To overcome this limitation we introduce PonderNet, a new algorithm that learns to adapt the amount of computation based on the complexity of the problem at hand. PonderNet learns end-to-end the number of computational steps to achieve an effective compromise between training prediction accuracy, computational cost and generalization. On a complex synthetic problem, PonderNet dramatically improves performance over previous adaptive computation methods and additionally succeeds at extrapolation tests where traditional neural networks fail. Also, our method matched the current state of the art results on a real world question and answering dataset, but using less compute. Finally, PonderNet reached state of the art results on a complex task designed to test the reasoning capabilities of neural networks.

Authors: Andrea Banino, Jan Balaguer, Charles Blundell
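The halting mechanism itself is simple to sketch: a recurrent step function emits a prediction and a halting probability at every step, and the unconditional probability of stopping at step n is lambda_n times the product of (1 - lambda_k) over earlier steps. The NumPy sketch below shows an inference-time rollout with a purely hypothetical toy step function standing in for the trained network; the training loss (the expected loss under this halting distribution plus a regularizer on it) is not shown.

```python
import numpy as np

def pondernet_rollout(x, step_fn, max_steps=20, seed=0):
    """Inference-time pondering: at each step the recurrent cell returns
    (new_state, prediction, halting probability); the model halts with that
    probability or continues, up to a maximum number of steps."""
    rng = np.random.default_rng(seed)
    h = np.zeros_like(x, dtype=float)     # initial hidden state
    remaining = 1.0                       # probability of not having halted before step n
    halting_dist = []                     # p_n = lambda_n * prod_{k<n} (1 - lambda_k)
    for n in range(1, max_steps + 1):
        h, y_n, lam = step_fn(x, h)
        halting_dist.append(remaining * lam)
        remaining *= 1.0 - lam
        if rng.random() < lam or n == max_steps:
            return y_n, n, np.array(halting_dist)

def toy_step(x, h):
    """Hypothetical stand-in for the trained recurrent cell."""
    h = np.tanh(h + x)                              # stand-in recurrent update
    y = float(h.mean())                             # stand-in prediction
    lam = 1.0 / (1.0 + np.exp(-5.0 * abs(y)))       # stand-in halting probability
    return h, y, lam

y, steps, p = pondernet_rollout(np.array([0.1, -0.2, 0.3]), toy_step)
print(f"halted after {steps} step(s); prediction {y:.3f}; halting distribution {np.round(p, 3)}")
```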
