Yannic Kilcher Videos (Audio Only)

Yannic Kilcher
Feb 21, 2022 • 1h 3min

All about AI Accelerators: GPU, TPU, Dataflow, Near-Memory, Optical, Neuromorphic & more (w/ Author)

#ai #gpu #tpu

This video is an interview with Adi Fuchs, author of the blog series "AI Accelerators" and an expert in modern AI acceleration technology. Accelerators like GPUs and TPUs are an integral part of today's AI landscape: deep neural network training can be sped up by orders of magnitude by making good use of these specialized pieces of hardware. However, GPUs and TPUs are only the beginning of a vast landscape of emerging technologies and companies that build accelerators for the next generation of AI models. In this interview, we cover many aspects of building hardware for AI, including why GPUs have been so successful, what the most promising approaches look like, how they work, and what the main challenges are.

OUTLINE:
0:00 - Intro
5:10 - What does it mean to make hardware for AI?
8:20 - Why were GPUs so successful?
16:25 - What is "dark silicon"?
20:00 - Beyond GPUs: How can we get even faster AI compute?
28:00 - A look at today's accelerator landscape
30:00 - Systolic Arrays and VLIW
35:30 - Reconfigurable dataflow hardware
40:50 - The failure of Wave Computing
42:30 - What is near-memory compute?
46:50 - Optical and Neuromorphic Computing
49:50 - Hardware as enabler and limiter
55:20 - Everything old is new again
1:00:00 - Where to go to dive deeper?

Read the full blog series here:
Part I: https://medium.com/@adi.fu7/ai-accele...
Part II: https://medium.com/@adi.fu7/ai-accele...
Part III: https://medium.com/@adi.fu7/ai-accele...
Part IV: https://medium.com/@adi.fu7/ai-accele...
Part V: https://medium.com/@adi.fu7/ai-accele...
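The outline above touches on systolic arrays, the compute fabric at the heart of TPU-style accelerators. As a rough, self-contained illustration of the idea (my own toy Python sketch, not material from the interview), here is a simulation of an output-stationary systolic matrix multiply in which operands are streamed in with a one-cycle skew per row and column so that matching values meet in each cell:

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy output-stationary systolic array: each cell (i, j) accumulates
    C[i, j] while A streams in from the left and B streams in from the top,
    skewed by one cycle per row/column so matching operands meet in the cell."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    a_reg = np.zeros((M, N))  # value each cell forwards to its right neighbour
    b_reg = np.zeros((M, N))  # value each cell forwards to the cell below
    total_cycles = K + M + N - 2  # time for the last skewed operands to drain
    for t in range(total_cycles):
        # update back-to-front so this cycle's writes don't overwrite this cycle's inputs
        for i in reversed(range(M)):
            for j in reversed(range(N)):
                a_in = a_reg[i, j - 1] if j > 0 else (A[i, t - i] if 0 <= t - i < K else 0.0)
                b_in = b_reg[i - 1, j] if i > 0 else (B[t - j, j] if 0 <= t - j < K else 0.0)
                C[i, j] += a_in * b_in
                a_reg[i, j] = a_in
                b_reg[i, j] = b_in
    return C

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

The point of the structure is that every cell does one multiply-accumulate per cycle with only nearest-neighbour communication, which is what makes dense matrix multiplication so cheap to lay out in silicon.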
Feb 21, 2022 • 1h 24min

CM3: A Causal Masked Multimodal Model of the Internet (Paper Explained w/ Author Interview)

#cm3 #languagemodel #transformer

This video contains a paper explanation and an incredibly informative interview with first author Armen Aghajanyan. Autoregressive Transformers have come to dominate many fields of Machine Learning, from text generation to image creation and many more. However, there are two problems. First, the collected data is usually scraped from the web, is uni- or bi-modal, and throws away much of the structure of the original websites; second, language modelling losses are uni-directional. CM3 addresses both problems: it operates directly on HTML and includes text, hyperlinks, and even images (via VQGAN tokenization), and can therefore be used in plenty of ways: text generation, captioning, image creation, entity linking, and much more. It also introduces a new training strategy called Causally Masked Language Modelling, which brings a level of bi-directionality into autoregressive language modelling. In the interview after the paper explanation, Armen and I go deep into the how and why of these giant models, go over the stunning results, and make sense of what they mean for the future of universal models.

OUTLINE:
0:00 - Intro & Overview
6:30 - Directly learning the structure of HTML
12:30 - Causally Masked Language Modelling
18:50 - A short look at how to use this model
23:20 - Start of interview
25:30 - Feeding language models with HTML
29:45 - How to get bi-directionality into decoder-only Transformers?
37:00 - Images are just tokens
41:15 - How does one train such giant models?
45:40 - CM3 results are amazing
58:20 - Large-scale dataset collection and content filtering
1:04:40 - More experimental results
1:12:15 - Why don't we use raw HTML?
1:18:20 - Does this paper contain too many things?

Paper: https://arxiv.org/abs/2201.07520

Abstract: We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens. Our new causally masked approach generates tokens left to right while also masking out a small number of long token spans that are generated at the end of the string, instead of their original positions. The causal masking objective provides a type of hybrid of the more common causal and masked language models, by enabling full generative modeling while also providing bidirectional context when generating the masked spans. We train causally masked language-image models on large-scale web and Wikipedia articles, where each document contains all of the text, hypertext markup, hyperlinks, and image tokens (from a VQVAE-GAN), provided in the order they appear in the original HTML source (before masking). The resulting CM3 models can generate rich structured, multi-modal outputs while conditioning on arbitrary masked document contexts, and thereby implicitly learn a wide range of text, image, and cross modal tasks. They can be prompted to recover, in a zero-shot fashion, the functionality of models such as DALL-E, GENRE, and HTLM. We set the new state-of-the-art in zero-shot summarization, entity linking, and entity disambiguation while maintaining competitive performance in the fine-tuning setting. We can generate images unconditionally, conditioned on text (like DALL-E) and do captioning all in a zero-shot setting with a single model.

Authors: Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer
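To make the causally masked objective concrete: a document is rearranged so that a few long spans are cut out, a sentinel marks where each span was, and the spans are appended at the end of the sequence, where the model generates them with the rest of the document already visible. The snippet below is a minimal sketch of that data transformation only (my own illustration with an assumed sentinel format, not the paper's implementation):

```python
import random

def causal_mask_transform(tokens, n_masks=1, seed=0):
    """Rearrange a token sequence for causally masked training: cut out a small
    number of contiguous spans, leave a sentinel where each span was, and append
    the spans (each preceded by its sentinel) at the end of the sequence."""
    rng = random.Random(seed)
    out = list(tokens)
    spans = []
    for k in range(n_masks):
        length = max(1, len(out) // 4)                   # toy span length
        start = rng.randrange(0, len(out) - length)
        spans.append((k, out[start:start + length]))
        out[start:start + length] = [f"<mask:{k}>"]      # sentinel replaces the span
    for k, span in spans:
        out += [f"<mask:{k}>"] + span                    # span is generated at the end
    return out

print(causal_mask_transform(["<html>", "<body>", "Hello", "world", "</body>", "</html>"]))
```

Training then proceeds with an ordinary left-to-right language-modelling loss on the rearranged sequence, which is how the objective keeps full generative modelling while giving bidirectional context for the masked spans.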
Feb 17, 2022 • 55min

AI against Censorship: Genetic Algorithms, The Geneva Project, ML in Security, and more!

#security #censorship #ai

Most of us conceive of the internet as a free and open space where we are able to send traffic between any two nodes, but for large parts of the world this is not the case. Entire nations have large machinery in place to surveil all internet traffic, along with automated procedures to block any undesirable connections. Evading such censorship has largely been a cat-and-mouse game between security researchers and government actors. A new system, called Geneva, uses a genetic algorithm in combination with evolutionary search to dynamically evade such censorship and adjust itself in real time to any potential response by its adversaries. In this video, I talk to security researcher Kevin Bock, who is one of Geneva's main contributors and a member of the Breakerspace project. We talk about the evolution of internet censorship, how to evade it, how to mess with the censors' infrastructure, as well as the broader emerging connections between AI and security.

OUTLINE:
0:00 - Intro
3:30 - What is automated censorship in networks?
7:20 - The evolution of censorship vs evasion
12:40 - Why do we need a dynamic, evolving system?
16:30 - The building blocks of Geneva
23:15 - Introducing evolution
28:30 - What's the censors' response?
31:45 - How was Geneva's media reception?
33:15 - Where do we go from here?
37:30 - Can we deliberately attack the censors?
47:00 - On responsible disclosure
49:40 - Breakerspace: Security research for undergrads
50:40 - How often do you get into trouble?
52:10 - How can I get started in security?

Learn more at:
- Geneva (& more) project page: https://censorship.ai
- Open Observatory of Network Interference: https://ooni.org
- Censored Planet: https://censoredplanet.org
- Breakerspace: https://breakerspace.cs.umd.edu
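For readers unfamiliar with the core technique: Geneva maintains a population of candidate packet-manipulation strategies, scores each one by whether it slips past the censor, and breeds the best ones. The skeleton below is a generic genetic-algorithm loop of that kind (purely illustrative; Geneva's real strategy grammar and fitness evaluation against live censors live at https://censorship.ai):

```python
import random

# Minimal genetic-algorithm skeleton of the kind Geneva builds on (illustrative only:
# the real system evolves packet-manipulation programs and scores them against a censor).

def evolve(random_individual, mutate, crossover, fitness,
           pop_size=50, generations=30, elite=5, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:elite]                      # keep the best strategies
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)

# Toy usage: evolve a bit-string toward all ones (a stand-in for "evades the censor").
best = evolve(
    random_individual=lambda rng: [rng.randint(0, 1) for _ in range(20)],
    mutate=lambda ind, rng: [b ^ (rng.random() < 0.05) for b in ind],
    crossover=lambda a, b, rng: [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)],
    fitness=sum,
)
print(best, sum(best))
```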
Feb 16, 2022 • 1h 18min

HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning (w/ Author)

#hypertransformer #metalearning #deeplearning

This video contains a paper explanation and an interview with author Andrey Zhmoginov! Few-shot learning is an interesting sub-field of meta-learning, with wide applications such as creating personalized models based on just a handful of data points. Traditionally, approaches have followed the BERT recipe, where a large model is pre-trained and then fine-tuned. However, this couples the size of the final model to the size of the model that has been pre-trained. Similar problems exist with "true" meta-learners such as MAML. HyperTransformer fundamentally decouples the meta-learner from the size of the final model by directly predicting the weights of the final model. The HyperTransformer takes the few-shot dataset as a whole into its context and predicts either one or multiple layers of a (small) ConvNet, meaning its outputs are the weights of the convolution filters. Interestingly, with the right engineering care, this actually appears to deliver promising results and can be extended in many ways.

OUTLINE:
0:00 - Intro & Overview
3:05 - Weight-generation vs Fine-tuning for few-shot learning
10:10 - HyperTransformer model architecture overview
22:30 - Why the self-attention mechanism is useful here
34:45 - Start of Interview
39:45 - Can neural networks even produce weights of other networks?
47:00 - How complex does the computational graph get?
49:45 - Why are transformers particularly good here?
58:30 - What can the attention maps tell us about the algorithm?
1:07:00 - How could we produce larger weights?
1:09:30 - Diving into experimental results
1:14:30 - What questions remain open?

Paper: https://arxiv.org/abs/2201.04182

ERRATA: I introduce Max Vladymyrov as Mark Vladymyrov

Abstract: In this work we propose a HyperTransformer, a transformer-based model for few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable. Finally, we extend our approach to a semi-supervised regime utilizing unlabeled samples in the support set and further improving few-shot performance.

Authors: Andrey Zhmoginov, Mark Sandler, Max Vladymyrov
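As a toy sketch of the weight-generation idea (an assumed miniature architecture for illustration, not the authors' model), the snippet below has a small transformer encoder read an embedded support set and emit the kernel and bias of a single conv layer, which is then applied to query images:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyWeightGenerator(nn.Module):
    """Rough sketch of the HyperTransformer idea: a transformer reads the
    support set and emits the weights of a small conv layer (toy sizes only)."""
    def __init__(self, n_classes, d_model=64, out_ch=8, k=3):
        super().__init__()
        self.out_ch, self.k = out_ch, k
        self.embed_img = nn.Linear(28 * 28, d_model)          # toy image encoder
        self.embed_lbl = nn.Embedding(n_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_weights = nn.Linear(d_model, out_ch * 1 * k * k)  # kernel for 1 input channel
        self.to_bias = nn.Linear(d_model, out_ch)

    def forward(self, support_x, support_y):
        # support_x: (n_support, 1, 28, 28), support_y: (n_support,)
        tokens = self.embed_img(support_x.flatten(1)) + self.embed_lbl(support_y)
        pooled = self.encoder(tokens.unsqueeze(0)).mean(dim=1)    # (1, d_model)
        w = self.to_weights(pooled).view(self.out_ch, 1, self.k, self.k)
        b = self.to_bias(pooled).view(self.out_ch)
        return w, b

gen = TinyWeightGenerator(n_classes=5)
support_x, support_y = torch.randn(25, 1, 28, 28), torch.randint(0, 5, (25,))
w, b = gen(support_x, support_y)
query = torch.randn(4, 1, 28, 28)
features = F.conv2d(query, w, b, padding=1)   # generated weights applied to query images
print(features.shape)  # torch.Size([4, 8, 28, 28])
```

The property mirrored here is that the generated conv weights are a function of the support set, so the per-task model can stay tiny while the capacity lives in the transformer.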
Feb 16, 2022 • 27min

[ML News] DeepMind AlphaCode | OpenAI math prover | Meta battles harmful content with AI

#mlnews #alphacode #openai

The latest and greatest from the world of Machine Learning!

Merch: store.ykilcher.com
Sponsor: Weights & Biases https://wandb.me/yannic

OUTLINE:
0:00 - Intro
0:15 - Sponsor: Weights & Biases
3:15 - DeepMind's AlphaCode: AI competitive programmer
11:30 - OpenAI uses language models to prove math theorems
14:30 - StyleGAN XL: Scaling StyleGAN to diverse datasets
16:10 - ar5iv.org displays papers as HTML5
17:40 - Helpful Things
19:30 - ICML22 Review process changes
21:15 - Meta AI tackles harmful content classification using few-shot learning
23:55 - Company claims to produce face images from DNA

References:
https://deepmind.com/blog/article/Com...
https://alphacode.deepmind.com/#layer...
https://storage.googleapis.com/deepmi...
https://twitter.com/DBahdanau/status/...
https://openai.com/blog/formal-math/
https://arxiv.org/pdf/2202.01344.pdf
https://blog.eleuther.ai/announcing-2...
https://sites.google.com/view/stylega...
https://arxiv.org/pdf/2202.00273.pdf
https://ar5iv.org/
https://ar5iv.org/html/1910.06709
https://twitter.com/YiTayML/status/14...
https://ffcv.io/
https://github.com/ott-jax/ott
https://twitter.com/soumithchintala/s...
https://github.com/facebookresearch/d...
https://www.reddit.com/r/MachineLearn...
https://icml.cc/Conferences/2022/Revi...
https://icml.cc/Conferences/2022/Call...
https://ai.facebook.com/blog/harmful-...
https://www.technologyreview.com/2022...
Feb 16, 2022 • 1h 17min

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents (+Author)

#gpt3 #embodied #planning

In this video: a paper explanation, followed by a first-author interview with Wenlong Huang. Large language models contain extraordinary amounts of world knowledge that can be queried in various ways, but their output format is largely uncontrollable. This paper investigates the VirtualHome environment, which expects a particular set of actions, objects, and verbs to be used. It turns out that, with proper techniques and using only pre-trained models (no fine-tuning), one can translate unstructured language model outputs into the structured grammar of the environment. This is potentially very useful anywhere the model's world knowledge needs to be provided in a particular structured format.

OUTLINE:
0:00 - Intro & Overview
2:45 - The VirtualHome environment
6:25 - The problem of plan evaluation
8:40 - Contributions of this paper
16:40 - Start of interview
24:00 - How to use language models with environments?
34:00 - What does model size matter?
40:00 - How to fix the large models' outputs?
55:00 - Possible improvements to the translation procedure
59:00 - Why does Codex perform so well?
1:02:15 - Diving into experimental results
1:14:15 - Future outlook

Paper: https://arxiv.org/abs/2201.07207
Website: https://wenlong.page/language-planner/
Code: https://github.com/huangwl18/language...
Wenlong's Twitter: https://twitter.com/wenlong_huang

Abstract: Can world knowledge learned by large language models (LLMs) be used to act in interactive environments? In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. "make breakfast"), to a chosen set of actionable steps (e.g. "open fridge"). While prior work focused on learning from explicit step-by-step examples of how to act, we surprisingly find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans without any further training. However, the plans produced naively by LLMs often cannot map precisely to admissible actions. We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions. Our evaluation in the recent VirtualHome environment shows that the resulting method substantially improves executability over the LLM baseline. The conducted human evaluation reveals a trade-off between executability and correctness but shows a promising sign towards extracting actionable knowledge from language models.

Authors: Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch
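The "semantic translation" step discussed around 40:00 can be pictured as nearest-neighbour matching in an embedding space: each free-form step the language model writes is mapped to the admissible environment action whose embedding is most similar. Here is a simplified sketch of that idea (my own illustration; `embed` is an assumed helper, e.g. a Sentence-Transformers model, and the toy bag-of-words embedding below only stands in for it):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def translate_plan(free_form_steps, admissible_actions, embed):
    """Map each free-form step produced by a language model to the most
    semantically similar admissible action of the environment."""
    action_vecs = {a: embed(a) for a in admissible_actions}
    plan = []
    for step in free_form_steps:
        v = embed(step)
        plan.append(max(admissible_actions, key=lambda a: cosine(v, action_vecs[a])))
    return plan

# Toy stand-in embedding: a hashed bag of words, just so the demo runs end to end.
def toy_embed(text, dim=64):
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v

print(translate_plan(["grab a mug from the cupboard"],
                     ["walk to kitchen", "open fridge", "grab mug"],
                     toy_embed))
```

The paper additionally feeds the translated action back into the prompt before generating the next step, which this sketch omits.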
Feb 16, 2022 • 16min

OpenAI Embeddings (and Controversy?!)

#mlnews #openai #embeddings

COMMENTS DIRECTLY FROM THE AUTHOR (thanks a lot for reaching out Arvind :) ):
1. The FIQA results you share also have code to reproduce the results in the paper using the API: https://twitter.com/arvind_io/status/... There's no discrepancy AFAIK.
2. We leave out 6, not 7, BEIR datasets. Results on MSMARCO, NQ, and TriviaQA are in a separate table (Table 5 in the paper). NQ is part of BEIR too and we didn't want to repeat it. Finally, the 6 datasets we leave out are not readily available, and it is common to leave them out in prior work too. For example, SPLADE v2 (https://arxiv.org/pdf/2109.10086.pdf) also evaluates on the same 12 BEIR datasets.
3. Finally, I'm now working on time travel so that I can cite papers from the future :)
END COMMENTS FROM THE AUTHOR

OpenAI launches an embeddings endpoint in their API, providing high-dimensional vector embeddings for use in text similarity, text search, and code search. While embeddings are universally recognized as a standard tool for processing natural language, people have raised doubts about the quality of OpenAI's embeddings: one blog post found that they are often outperformed by open-source models, which are much smaller and with which embedding would cost a fraction of what OpenAI charges. In this video, we examine the claims made and determine what it all means.

OUTLINE:
0:00 - Intro
0:30 - Sponsor: Weights & Biases
2:20 - What embeddings are available?
3:55 - OpenAI shows promising results
5:25 - How good are the results really?
6:55 - Criticism: Open models might be cheaper and smaller
10:05 - Discrepancies in the results
11:00 - The author's response
11:50 - Putting things into perspective
13:35 - What about real world data?
14:40 - OpenAI's pricing strategy: Why so expensive?

Sponsor: Weights & Biases https://wandb.me/yannic
Merch: store.ykilcher.com

ERRATA: At 13:20 I say "better", it should be "worse"

References:
https://openai.com/blog/introducing-t...
https://arxiv.org/pdf/2201.10005.pdf
https://beta.openai.com/docs/guides/e...
https://beta.openai.com/docs/api-refe...
https://twitter.com/Nils_Reimers/stat...
https://medium.com/@nils_reimers/open...
https://mobile.twitter.com/arvind_io/...
https://twitter.com/gwern/status/1487...
https://twitter.com/gwern/status/1487...
https://twitter.com/Nils_Reimers/stat...
https://twitter.com/gwern/status/1470...
https://www.reddit.com/r/MachineLearn...
https://mobile.twitter.com/arvind_io/...
https://mobile.twitter.com/arvind_io/...
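At the core of the comparison between OpenAI's endpoint and the open-source alternatives is the same operation: embed two pieces of text and compare the vectors, usually with cosine similarity. A minimal search sketch follows; `get_embedding` is an assumed helper wrapping whichever model or API you use (the exact OpenAI client call is deliberately not spelled out here to avoid guessing its signature):

```python
import numpy as np

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def search(query, documents, get_embedding):
    """Rank documents by cosine similarity between their embedding and the
    query embedding. `get_embedding` is an assumed helper wrapping an
    embedding model (OpenAI's endpoint, a Sentence-Transformers model, etc.)."""
    q = np.asarray(get_embedding(query))
    scored = [(cosine_similarity(q, np.asarray(get_embedding(d))), d) for d in documents]
    return sorted(scored, reverse=True)

# Example with a local open-source model, one of the alternatives discussed in the video:
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("all-MiniLM-L6-v2")
# results = search("how do I cancel my order?", docs, get_embedding=model.encode)
```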
Feb 16, 2022 • 1h 21min

Unsupervised Brain Models - How does Deep Learning inform Neuroscience? (w/ Patrick Mineault)

#deeplearning #brain #neuroscience

Deep Learning originally sprang into existence inspired by how the brain processes information, but the two fields have diverged ever since. However, given that deep models can solve many perception tasks with remarkable accuracy, is it possible that we might be able to learn something about how the brain works by inspecting our models? I speak to Patrick Mineault about his blog post "2021 in review: unsupervised brain models", and we explore why neuroscientists are taking an interest in unsupervised and self-supervised deep neural networks in order to explain how the brain works. We discuss a series of influential papers that appeared last year, and we go into the more general questions of connecting neuroscience and machine learning.

OUTLINE:
0:00 - Intro & Overview
6:35 - Start of Interview
10:30 - Visual processing in the brain
12:50 - How does deep learning inform neuroscience?
21:15 - Unsupervised training explains the ventral stream
30:50 - Predicting own motion parameters explains the dorsal stream
42:20 - Why are there two different visual streams?
49:45 - Concept cells and representation learning
56:20 - Challenging the manifold theory
1:08:30 - What are current questions in the field?
1:13:40 - Should the brain inform deep learning?
1:18:50 - Neuromatch Academy and other endeavours

Blog Post: https://xcorr.net/2021/12/31/2021-in-...
Patrick's Blog: https://xcorr.net/
Twitter: https://twitter.com/patrickmineault
Neuromatch Academy: https://academy.neuromatch.io/
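Many of the papers discussed compare networks to the brain with a linear "encoding model": fit a regularized linear map from a network layer's activations to recorded neural responses and measure how well it predicts held-out data. The sketch below uses synthetic data to show the shape of that analysis (my own generic illustration, not Patrick's code or any specific paper's pipeline):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Generic "encoding model" sketch: how well do a deep network's features
# linearly predict recorded neural responses to the same stimuli?
# Shapes and data are synthetic; in practice the features come from a
# (self-)supervised model and the responses from recordings.
rng = np.random.default_rng(0)
n_stimuli, n_features, n_neurons = 500, 256, 40
features = rng.standard_normal((n_stimuli, n_features))             # layer activations per stimulus
true_map = rng.standard_normal((n_features, n_neurons)) * 0.1
responses = features @ true_map + 0.5 * rng.standard_normal((n_stimuli, n_neurons))

X_tr, X_te, y_tr, y_te = train_test_split(features, responses, test_size=0.2, random_state=0)
model = Ridge(alpha=10.0).fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))   # higher = features explain the responses better
```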
Feb 16, 2022 • 20min

GPT-NeoX-20B - Open-Source huge language model by EleutherAI (Interview w/ co-founder Connor Leahy)

#eleuther #gptneo #gptj

EleutherAI announces GPT-NeoX-20B, a 20 billion parameter open-source language model inspired by GPT-3. Connor joins me to discuss the process of training, how the group got their hands on the necessary hardware, what the new model can do, and how anyone can try it out!

OUTLINE:
0:00 - Intro
1:00 - Start of interview
2:00 - How did you get all the hardware?
3:50 - What's the scale of this model?
6:00 - A look into the experimental results
11:15 - Why are there GPT-Neo, GPT-J, and GPT-NeoX?
14:15 - How difficult is training these big models?
17:00 - Try out the model on GooseAI
19:00 - Final thoughts

Read the announcement: https://blog.eleuther.ai/announcing-20b/
Try out the model: https://goose.ai/
Check out EleutherAI: https://www.eleuther.ai/
Read the code: https://github.com/EleutherAI/gpt-neox
Hardware sponsor: https://www.coreweave.com/
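Besides the GooseAI playground mentioned in the outline, the weights were released openly. Assuming the checkpoint is available under the `EleutherAI/gpt-neox-20b` model ID on the Hugging Face Hub (worth verifying before relying on it) and that you have memory for roughly 40 GB of fp16 weights, a minimal generation sketch looks like this:

```python
# Minimal sketch for sampling from GPT-NeoX-20B via Hugging Face transformers.
# Assumes the "EleutherAI/gpt-neox-20b" model ID and enough GPU/CPU memory;
# device_map="auto" additionally requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("EleutherAI is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```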
Feb 2, 2022 • 1h 11min

Predicting the rules behind - Deep Symbolic Regression for Recurrent Sequences (w/ author interview)

#deeplearning #symbolic #research

This video includes an interview with first author Stéphane d'Ascoli (https://sdascoli.github.io/). Deep neural networks are typically excellent at numeric regression, but using them for symbolic computation has largely been ignored so far. This paper uses transformers to do symbolic regression on integer and floating-point number sequences: given the start of a sequence of numbers, the model has to not only predict the correct continuation, but also the data-generating formula behind the sequence. Through clever encoding of the input space and a well-constructed training data generation process, the paper's model can learn and represent many of the sequences in the OEIS, the On-Line Encyclopedia of Integer Sequences. It also features an interactive demo if you want to try it for yourself.

OUTLINE:
0:00 - Introduction
2:20 - Summary of the Paper
16:10 - Start of Interview
17:15 - Why this research direction?
20:45 - Overview of the method
30:10 - Embedding space of input tokens
33:00 - Data generation process
42:40 - Why are transformers useful here?
46:40 - Beyond number sequences, where is this useful?
48:45 - Success cases and failure cases
58:10 - Experimental Results
1:06:30 - How did you overcome difficulties?
1:09:25 - Interactive demo

Paper: https://arxiv.org/abs/2201.04600
Interactive demo: http://recur-env.eba-rm3fchmn.us-east...

Abstract: Symbolic regression, i.e. predicting a function from the observation of its values, is well-known to be a challenging task. In this paper, we train Transformers to infer the function or recurrence relation underlying sequences of integers or floats, a typical task in human IQ tests which has hardly been tackled in the machine learning literature. We evaluate our integer model on a subset of OEIS sequences, and show that it outperforms built-in Mathematica functions for recurrence prediction. We also demonstrate that our float model is able to yield informative approximations of out-of-vocabulary functions and constants, e.g. bessel0(x) ≈ (sin(x) + cos(x)) / √(πx) and 1.644934 ≈ π²/6. An interactive demonstration of our models is provided at this https URL.

Authors: Stéphane d'Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, François Charton
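To make the task concrete: given the first terms of a sequence, the model must output a formula such as u[n] = u[n-1] + u[n-2] that reproduces it, not just the next numbers. The toy checker below illustrates what "recovering the recurrence" means by testing a few hand-written candidates against a prefix (my own sketch; the paper replaces this enumeration with a transformer that writes the expression):

```python
# Toy illustration of the recurrence-prediction task: given the start of a sequence,
# find which candidate formula u[n] = f(u[n-1], u[n-2], n) reproduces it.
# (The paper replaces this brute-force check with a transformer that writes the formula.)
CANDIDATES = {
    "u[n] = u[n-1] + u[n-2]":  lambda u, n: u[n - 1] + u[n - 2],
    "u[n] = 2*u[n-1]":         lambda u, n: 2 * u[n - 1],
    "u[n] = u[n-1] + n":       lambda u, n: u[n - 1] + n,
    "u[n] = n*u[n-1]":         lambda u, n: n * u[n - 1],
}

def explain(sequence):
    """Return every candidate recurrence consistent with the observed prefix."""
    return [name for name, f in CANDIDATES.items()
            if all(f(sequence, n) == sequence[n] for n in range(2, len(sequence)))]

print(explain([1, 1, 2, 3, 5, 8, 13]))    # Fibonacci   -> ['u[n] = u[n-1] + u[n-2]']
print(explain([1, 1, 2, 6, 24, 120]))     # 0!, 1!, 2!… -> ['u[n] = n*u[n-1]']
```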
