
Yannic Kilcher Videos (Audio Only)

Latest episodes

Aug 28, 2023 • 41min

LLaMA: Open and Efficient Foundation Language Models (Paper Explained)

#ai #meta #languagemodel

LLaMA is a series of large language models, from 7B to 65B parameters, trained by Meta AI. Trained for longer on more data, they show that significantly smaller models can outperform the likes of GPT-3. Meta also releases the trained models to the research community.

OUTLINE:
0:00 - Introduction & Paper Overview
4:30 - Rant on Open-Sourcing
8:05 - Training Data
12:40 - Training Hyperparameters
14:50 - Architecture Modifications
17:10 - Optimizer
19:40 - Efficient Implementation
26:15 - Main Results
38:00 - Some more completions
40:00 - Conclusion

Paper: https://arxiv.org/abs/2302.13971
Website: https://ai.facebook.com/blog/large-language-model-llama-meta-ai/

Abstract: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.

Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: https://www.linkedin.com/in/ykilcher

If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
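For those who want to try the released checkpoints themselves, here is a minimal sketch of loading a LLaMA model for greedy generation through the Hugging Face transformers API. The model id "huggyllama/llama-7b" is an assumption (a community mirror); the official weights are gated and must be requested from Meta.

```python
# Minimal sketch: greedy generation with a LLaMA checkpoint via
# Hugging Face transformers. The model id is an assumed community
# mirror, not the official (gated) release from Meta.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # assumption: community re-upload
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 7B in fp16 fits on a ~16 GB GPU
    device_map="auto",          # requires the accelerate package
)

prompt = "The theory of relativity states that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```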
Aug 28, 2023 • 1h 21min

Open Assistant Inference Backend Development (Hands-On Coding)

#ai #huggingface #coding

Join me as I build streaming inference into the Hugging Face text generation server, going through CUDA, Python, Rust, gRPC, WebSockets, server-sent events, and more...

Original repo: https://github.com/huggingface/text-generation-inference
OpenAssistant repo: https://github.com/LAION-AI/Open-Assistant (see inference/)

Check out https://www.wandb.courses/ for free MLOps courses!
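As a taste of what the video builds, below is a minimal sketch of token streaming over server-sent events (SSE) with FastAPI. The endpoint name, payload shape, and the generate_tokens stub are illustrative assumptions, not the actual text-generation-inference code.

```python
# Sketch of token streaming over server-sent events: the server yields
# one SSE frame per decoded token instead of waiting for the full text.
# generate_tokens() is a stub standing in for the real model call.
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def generate_tokens(prompt: str):
    # Stub: a real backend would yield tokens from the model as they decode.
    for token in (prompt + " ... and so on").split():
        await asyncio.sleep(0.05)  # simulate per-token decode latency
        yield token

@app.post("/generate_stream")  # assumed route, for illustration only
async def generate_stream(payload: dict):
    async def event_stream():
        async for token in generate_tokens(payload["inputs"]):
            # SSE frames have the form "data: <payload>\n\n"
            yield f"data: {json.dumps({'token': token})}\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```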
Aug 28, 2023 • 36min

OpenAssistant - ChatGPT's Open Alternative (We need your help!)

#openassistant #chatgpt #ai

Help us collect data for OpenAssistant, the largest and most open alternative to ChatGPT. https://open-assistant.io

OUTLINE:
0:00 - Intro
0:30 - The Project
2:05 - Getting to Minimum Viable Prototype
5:30 - First Tasks
10:00 - Leaderboard
11:45 - Playing the Assistant
14:40 - Tricky Facts
16:25 - What if humans had wings?
17:05 - Can foxes be tamed?
23:45 - Can zebras be tamed?
26:15 - Yo (spam)
27:00 - More tasks
29:10 - Entitled Emails
34:35 - Final Words
Jan 2, 2023 • 32min

ChatGPT: This AI has a JAILBREAK?! (Unbelievable AI Progress)

#chatgpt #ai #openai

ChatGPT, OpenAI's newest model, is a GPT-3 variant that has been fine-tuned using Reinforcement Learning from Human Feedback, and it is taking the world by storm!

Sponsor: Weights & Biases https://wandb.me/yannic

OUTLINE:
0:00 - Intro
0:40 - Sponsor: Weights & Biases
3:20 - ChatGPT: How does it work?
5:20 - Reinforcement Learning from Human Feedback
7:10 - ChatGPT Origins: The GPT-3.5 Series
8:20 - OpenAI's strategy: Iterative Refinement
9:10 - ChatGPT's amazing capabilities
14:10 - Internals: What we know so far
16:10 - Building a virtual machine in ChatGPT's imagination (insane)
20:15 - Jailbreaks: Circumventing the safety mechanisms
29:25 - How OpenAI sees the future

References:
https://openai.com/blog/chatgpt/
https://openai.com/blog/language-model-safety-and-misuse/
https://beta.openai.com/docs/model-index-for-researchers
https://scale.com/blog/gpt-3-davinci-003-comparison#Conclusion
https://twitter.com/johnvmcdonnell/status/1598470129121374209
https://twitter.com/blennon_/status/1597374826305318912
https://twitter.com/TimKietzmann/status/1598230759118376960/photo/1
https://twitter.com/_lewtun/status/1598056075672027137/photo/2
https://twitter.com/raphaelmilliere/status/1598469100535259136
https://twitter.com/CynthiaSavard/status/1598498138658070530/photo/1
https://twitter.com/tylerangert/status/1598389755997290507/photo/1
https://twitter.com/amasad/status/1598042665375105024/photo/1
https://twitter.com/goodside/status/1598129631609380864/photo/1
https://twitter.com/moyix/status/1598081204846489600/photo/2
https://twitter.com/JusticeRage/status/1598959136531546112
https://twitter.com/yoavgo/status/1598594145605636097
https://twitter.com/EladRichardson/status/1598333315764871174
https://twitter.com/charles_irl/status/1598319027327307785/photo/4
https://twitter.com/jasondebolt/status/1598243854343606273
https://twitter.com/mattshumer_/status/1598185710166896641/photo/1
https://twitter.com/i/web/status/1598246145171804161
https://twitter.com/bleedingedgeai/status/1598378564373471232
https://twitter.com/MasterScrat/status/1598830356115124224
https://twitter.com/Sentdex/status/1598803009844256769
https://twitter.com/harrison_ritz/status/1598828017446371329
https://twitter.com/parafactual/status/1598212029479026689
https://www.engraved.blog/building-a-virtual-machine-inside/
https://twitter.com/317070
https://twitter.com/zehavoc/status/1599193444043268096
https://twitter.com/yoavgo/status/1598360581496459265
https://twitter.com/yoavgo/status/1599037412411596800
https://twitter.com/yoavgo/status/1599045344863879168
https://twitter.com/natfriedman/status/1598477452661383168
https://twitter.com/conradev/status/1598487973351362561/photo/1
https://twitter.com/zswitten/status/1598100186605441024
https://twitter.com/CatEmbedded/status/1599141379879600128/photo/2
https://twitter.com/mattshumer_/status/1599175127148949505
https://twitter.com/vaibhavk97/status/1598930958769860608/photo/1
https://twitter.com/dan_abramov/status/1598800508160024588/photo/1
https://twitter.com/MinqiJiang/status/1598832656422432768/photo/2
https://twitter.com/zswitten/status/1598088280066920453
https://twitter.com/m1guelpf/status/1598203861294252033/photo/1
https://twitter.com/SilasAlberti/status/1598257908567117825/photo/1
https://twitter.com/gf_256/status/1598962842861899776/photo/1
https://twitter.com/zswitten/status/1598088267789787136
https://twitter.com/gf_256/status/1598178469955112961/photo/1
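One step of the RLHF recipe covered in the video is concrete enough to sketch: the reward model is trained on human preference pairs with a pairwise ranking loss (the InstructGPT-style recipe ChatGPT builds on). Below is a hedged PyTorch sketch of that loss; the reward tensors stand in for the scalar outputs of any reward network.

```python
# Sketch of the pairwise loss used to train the reward model in RLHF:
# for a human-ranked pair of responses, push the reward of the preferred
# response above that of the rejected one.
import torch
import torch.nn.functional as F

def reward_pair_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected): loss shrinks as the preferred
    # response scores increasingly higher than the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy check with rewards of shape (batch,):
r_good = torch.tensor([2.0, 1.5])
r_bad = torch.tensor([0.5, 1.0])
print(reward_pair_loss(r_good, r_bad))  # small loss: ranking respected
print(reward_pair_loss(r_bad, r_good))  # larger loss: ranking violated
```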
Nov 30, 2022 • 42min

[ML News] GPT-4 Rumors | AI Mind Reading | Neuron Interaction Solved | AI Theorem Proving

#ai #mlnews #gpt4

Your weekly news from the AI & Machine Learning world.

OUTLINE:
0:00 - Introduction
0:25 - AI reads brain signals to predict what you're thinking
3:00 - Closed-form solution for neuron interactions
4:15 - GPT-4 rumors
6:50 - Cerebras supercomputer
7:45 - Meta releases metagenomics atlas
9:15 - AI advances in theorem proving
10:40 - Better diffusion models with expert denoisers
12:00 - BLOOMZ & mT0
13:05 - ICLR reviewers going mad
21:40 - Scaling Transformer inference
22:10 - Infinite nature flythrough generation
23:55 - Blazing fast denoising
24:45 - Large-scale AI training with MultiRay
25:30 - arXiv to include Hugging Face spaces
26:10 - Multilingual Diffusion
26:30 - Music source separation
26:50 - Multilingual CLIP
27:20 - Drug response prediction
27:50 - Helpful Things

ERRATA: HF did not acquire Spaces; they launched Spaces themselves and supported Gradio from the start. They later acquired Gradio.

References:
AI reads brain signals to predict what you're thinking
https://mind-vis.github.io/?s=09&utm_source=pocket_saves
https://neurosciencenews.com/bmi-internal-speech-21837/
Closed-form solution for neuron interactions
https://twitter.com/ramin_m_h/status/1592585672606769153/photo/1
https://github.com/raminmh/CfC
https://github.com/raminmh/CfC/blob/main/torch_cfc.py
GPT-4 rumors
https://thealgorithmicbridge.substack.com/p/gpt-4-rumors-from-silicon-valley?utm_source=pocket_reader
Cerebras supercomputer
https://www.cerebras.net/andromeda/
Meta releases metagenomics atlas
https://ai.facebook.com/blog/protein-folding-esmfold-metagenomics/
https://www.genome.gov/genetics-glossary/Metagenomics
AI advances in theorem proving
https://ai.facebook.com/blog/ai-math-theorem-proving/
https://marketplace.visualstudio.com/items?itemName=jroesch.lean
Better diffusion models with expert denoisers
https://deepimagination.cc/eDiffi/
BLOOMZ & mT0
https://arxiv.org/abs/2211.01786?utm_source=pocket_reader
https://huggingface.co/bigscience/bloomz?text=Suggest+at+least+five+related+search+terms+to+%22M%E1%BA%A1ng+neural+nh%C3%A2n+t%E1%BA%A1o%22.
ICLR reviewers going mad
https://twitter.com/XiangruTang/status/1589703605098975237?utm_source=pocket_reader
https://twitter.com/BlancheMinerva/status/1588164585961422849?utm_source=pocket_reader
https://openreview.net/forum?id=pfuqQQCB34
https://twitter.com/peter_richtarik/status/1591408710366408706?utm_source=pocket_reader
Scaling Transformer inference
https://arxiv.org/abs/2211.05102
Infinite nature flythrough generation
https://ai.googleblog.com/2022/11/infinite-nature-generating-3d.html?utm_source=pocket_reader
Blazing fast denoising
https://github.com/dome272/Paella
https://arxiv.org/abs/2211.07292
Large-scale AI training with MultiRay
https://ai.facebook.com/blog/multiray-large-scale-AI-models/
arXiv to include Hugging Face spaces
https://blog.arxiv.org/2022/11/17/discover-state-of-the-art-machine-learning-demos-on-arxiv/
Multilingual Diffusion
https://github.com/FlagAI-Open/FlagAI/tree/master/examples/AltDiffusion
Music source separation
https://github.com/facebookresearch/demucs
https://arxiv.org/abs/2211.08553
Nov 30, 2022 • 1h 1min

CICERO: An AI agent that negotiates, persuades, and cooperates with people

#ai #cicero #diplomacy

A team from Meta AI has developed Cicero, an agent that can play the game Diplomacy, in which players have to communicate via chat messages to coordinate and plan into the future.

Paper Title: Human-level play in the game of Diplomacy by combining language models with strategic reasoning
Commented game by human expert: https://www.youtube.com/watch?v=u5192bvUS7k

OUTLINE:
0:00 - Introduction
9:50 - AI in cooperation games
13:50 - Cicero agent overview
25:00 - A controllable dialogue model
36:50 - Dialogue-conditional strategic planning
49:00 - Message filtering
53:45 - Cicero's play against humans
55:15 - More examples & discussion

Homepage: https://ai.facebook.com/research/cicero/
Code: https://github.com/facebookresearch/diplomacy_cicero
Blog: https://ai.facebook.com/blog/cicero-ai-negotiates-persuades-and-cooperates-with-people/
Paper: https://www.science.org/doi/10.1126/science.ade9097

Abstract: Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge. We introduce Cicero, the first AI agent to achieve human-level performance in Diplomacy, a strategy game involving both cooperation and competition that emphasizes natural language negotiation and tactical coordination between seven players. Cicero integrates a language model with planning and reinforcement learning algorithms by inferring players' beliefs and intentions from its conversations and generating dialogue in pursuit of its plans. Across 40 games of an anonymous online Diplomacy league, Cicero achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.

Authors: Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sasha Mitts, Adithya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, Markus Zijlstra
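To make the outline's pipeline concrete, here is a purely hypothetical sketch of Cicero's per-turn control flow as the paper describes it: infer other players' intents, plan strategically, generate dialogue conditioned on the plan, then filter candidate messages. Every function here is an illustrative stub, not the released code.

```python
# Hypothetical stubs standing in for Cicero's real components.
def infer_intents(state, history):
    return {"opponent_plans": "hold"}           # stub: belief inference

def strategic_planner(state, beliefs):
    return ["A PAR - BUR"]                      # stub: planned orders ("intents")

def dialogue_model(history, intents):
    return [f"I'm moving {intents[0]}, will you support me?"]  # stub LM

def passes_filters(msg, plan, state):
    return "support" in msg                     # stub: nonsense/value filter

def cicero_turn(game_state, message_history):
    # 1. Infer what other players intend from the conversation so far.
    beliefs = infer_intents(game_state, message_history)
    # 2. Choose own moves with the planning/RL component.
    plan = strategic_planner(game_state, beliefs)
    # 3. Generate dialogue conditioned on the chosen plan.
    candidates = dialogue_model(message_history, intents=plan)
    # 4. Filter out inconsistent or strategically damaging messages.
    messages = [m for m in candidates if passes_filters(m, plan, game_state)]
    return plan, messages

print(cicero_turn({}, []))
```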
Nov 23, 2022 • 23min

[ML News] Multiplayer Stable Diffusion | OpenAI needs more funding | Text-to-Video models incoming

#mlnews #ai #mlinpl

Your news from the world of Machine Learning!

OUTLINE:
0:00 - Introduction
1:25 - Stable Diffusion Multiplayer
2:15 - Huggingface: DOI for Models & Datasets
3:10 - OpenAI asks for more funding
4:25 - The Stack: Source Code Dataset
6:30 - Google Vizier Open-Sourced
7:10 - New Models
11:50 - Helpful Things
20:30 - Prompt Databases
22:15 - Lexicap by Karpathy

References:
Stable Diffusion Multiplayer
https://huggingface.co/spaces/huggingface-projects/stable-diffusion-multiplayer?roomid=room-0
Huggingface: DOI for Models & Datasets
https://huggingface.co/blog/introducing-doi
OpenAI asks for more funding
https://www.theinformation.com/articles/openai-valued-at-nearly-20-billion-in-advanced-talks-with-microsoft-for-more-funding
https://www.wsj.com/articles/microsoft-in-advanced-talks-to-increase-investment-in-openai-11666299548
The Stack: Source Code Dataset
https://huggingface.co/datasets/bigcode/the-stack?utm_source=pocket_mylist
Google Vizier Open-Sourced
https://github.com/google/vizier
New Models
https://imagen.research.google/video/
https://phenaki.github.io/
https://makeavideo.studio/?utm_source=pocket_mylist
https://dreamfusion3d.github.io/
https://arxiv.org/pdf/2210.15257.pdf
https://huggingface.co/spaces/PaddlePaddle/ERNIE-ViLG
https://github.com/PaddlePaddle/PaddleHub
Helpful Things
https://thecharlieblake.co.uk/visualising-ml-number-formats
https://griddly.ai/
https://engineering.fb.com/2022/10/18/open-source/ocp-summit-2022-grand-teton/?utm_source=twitter&utm_medium=organic_social&utm_campaign=eng2022h2
https://twitter.com/psuraj28/status/1580640841583902720?utm_source=pocket_mylist
https://huggingface.co/blog/stable_diffusion_jax
https://github.com/Lightning-AI/stable-diffusion-deploy
https://lightning.ai/docs/stable/
https://github.com/CarperAI/trlx
https://github.com/DLR-RM/rl-baselines3-zoo
https://github.com/Sea-Snell/JAXSeq
https://www.reddit.com/r/MachineLearning/comments/xoitw9/p_albumentations_13_is_released_a_python_library/?utm_source=pocket_mylist
https://twitter.com/Warvito/status/1570691960792580096?utm_source=pocket_mylist
https://arxiv.org/abs/2209.07162
https://academictorrents.com/details/63aeb864bbe2115ded0aa0d7d36334c026f0660b
https://huggingface.co/spaces/THUDM/CodeGeeX
https://ai.facebook.com/blog/gpu-inference-engine-nvidia-amd-open-source/?utm_source=twitter&utm_medium=organic_social&utm_campaign=blog
https://github.com/nerfstudio-project/nerfstudio
https://www.nerfacc.com/en/latest/
https://github.com/dstackai/dstack
https://www.reddit.com/r/MachineLearning/comments/yeyxlo/p_openai_whisper_3x_cpu_inference_speedup/?utm_source=pocket_mylist
https://github.com/MiscellaneousStuff/openai-whisper-cpu/issues/1
Prompt Databases
https://huggingface.co/datasets/poloclub/diffusiondb
https://publicprompts.art/
https://visualise.ai/
https://twitter.com/SamuelAlbanie/status/1574111928431026179/photo/1
Lexicap by Karpathy
https://karpathy.ai/lexicap/0139-large.html
Nov 23, 2022 • 28min

The New AI Model Licenses have a Legal Loophole (OpenRAIL-M of BLOOM, Stable Diffusion, etc.)

#ai #stablediffusion #license

So-called responsible AI licenses are stupid, counterproductive, and have a dangerous legal loophole in them.

OpenRAIL++ License here: https://www.ykilcher.com/license

OUTLINE:
0:00 - Introduction
0:40 - Responsible AI Licenses (RAIL) of BLOOM and Stable Diffusion
3:35 - Open source software's dilemma of bad usage and restrictions
8:45 - Good applications, bad applications
12:45 - A dangerous legal loophole
15:50 - OpenRAIL++ License
16:50 - This has nothing to do with copyright
26:00 - Final thoughts

References:
https://huggingface.co/CompVis/stable-diffusion/tree/main
https://huggingface.co/spaces/CompVis/stable-diffusion-license
https://huggingface.co/bigscience/bloom?text=34%2B10%3D44+%0A54%2B20%3D
https://huggingface.co/spaces/bigscience/license
https://huggingface.co/runwayml/stable-diffusion-v1-5
https://huggingface.co/spaces/CompVis/stable-diffusion-license/raw/main/license.txt
https://www.gnu.org/philosophy/programs-must-not-limit-freedom-to-run.en.html
https://www.gnu.org/philosophy/free-sw.html#four-freedoms
https://www.licenses.ai/blog/2022/8/26/bigscience-open-rail-m-license
https://bigscience.huggingface.co/blog/bigscience-ethical-charter
https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses
https://en.wikipedia.org/wiki/Copyright#Eligible_works
https://en.wikipedia.org/wiki/Creative_work
https://www.pearlcohen.com/copyright-office-reiterates-that-works-created-by-ai-cannot-be-copyrighted/
https://jipel.law.nyu.edu/vol-8-no-2-1-hedrick/#II
https://www.ykilcher.com/license
Nov 23, 2022 • 1h 5min

ROME: Locating and Editing Factual Associations in GPT (Paper Explained & Author Interview)

#ai #language #knowledge

Large language models have the ability to store vast amounts of facts about the world, but little is known about how these models actually do this. This paper aims to discover the mechanism and location of storage and recall of factual associations in GPT models, and then proposes a method for the targeted editing of such facts, in the form of a simple rank-one update to a single MLP layer. This has wide implications both for how we understand such models' inner workings, and for our ability to gain greater control over such models in the future.

OUTLINE:
0:00 - Introduction
1:40 - What are the main questions in this subfield?
6:55 - How causal tracing reveals where facts are stored
18:40 - Clever experiments show the importance of MLPs
24:30 - How do MLPs store information?
29:10 - How to edit language model knowledge with precision?
36:45 - What does it mean to know something?
39:00 - Experimental Evaluation & the CounterFact benchmark
45:40 - How to obtain the required latent representations?
51:15 - Where is the best location in the model to perform edits?
58:00 - What do these models understand about language?
1:02:00 - Questions for the community

Paper: https://arxiv.org/abs/2202.05262
Follow-up paper on Mass-Editing Memory in a Transformer: https://arxiv.org/abs/2210.07229

Abstract: We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at this https URL

Authors: Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov
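The rank-one idea at the heart of ROME can be shown in a few lines. Below is a simplified sketch: treat an MLP projection matrix as a key-value store and apply a rank-one update so a chosen key maps to a new value. The real method also whitens by key covariance statistics; this toy version skips that, so it illustrates the linear algebra rather than reproducing the paper's exact update.

```python
# Simplified rank-one model edit in the spirit of ROME: nudge a weight
# matrix W so that key vector k (e.g. the subject representation) maps
# to a new value vector v_new (encoding the edited fact).
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d))   # stands in for one MLP projection matrix
k = rng.normal(size=d)        # key: activation for a subject token
v_new = rng.normal(size=d)    # desired value encoding the edited fact

# Rank-one update: afterwards W_edited @ k == v_new exactly, while
# directions orthogonal to k are left untouched.
W_edited = W + np.outer(v_new - W @ k, k) / (k @ k)

print(np.allclose(W_edited @ k, v_new))              # True: fact rewritten
k_other = rng.normal(size=d)
k_other -= k * (k_other @ k) / (k @ k)               # orthogonal to k
print(np.allclose(W_edited @ k_other, W @ k_other))  # True: others preserved
```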
Oct 23, 2022 • 32min

Neural Networks are Decision Trees (w/ Alexander Mattick)

#neuralnetworks #machinelearning #ai

Alexander Mattick joins me to discuss the paper "Neural Networks are Decision Trees", which has generated a lot of hype on social media. We ask the question: has this paper solved one of the large mysteries of deep learning and opened up black-box neural networks to interpretability?

OUTLINE:
0:00 - Introduction
2:20 - Aren't Neural Networks non-linear?
5:20 - What does it all mean?
8:00 - How large do these trees get?
11:50 - Decision Trees vs Neural Networks
17:15 - Is this paper new?
22:20 - Experimental results
27:30 - Can Trees and Networks work together?

Paper: https://arxiv.org/abs/2210.05189

Abstract: In this manuscript, we show that any feedforward neural network having piece-wise linear activation functions can be represented as a decision tree. The representation is equivalence and not an approximation, thus keeping the accuracy of the neural network exactly as is. We believe that this work paves the way to tackle the black-box nature of neural networks. We share equivalent trees of some neural networks and show that besides providing interpretability, tree representation can also achieve some computational advantages. The analysis holds both for fully connected and convolutional networks, which may or may not also include skip connections and/or normalizations.

Author: Caglar Aytekin
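The paper's core observation is easy to verify numerically: a ReLU network is piecewise affine, and the on/off pattern of its ReLUs is a path through a decision tree whose internal nodes test the sign of each pre-activation. Here is a toy sketch with a random 2-4-1 network; the weights are arbitrary, just to show the mechanics.

```python
# A 2-4-1 ReLU net: each hidden unit contributes one binary split
# (pre-activation > 0?), and the tuple of answers identifies a leaf,
# i.e. a linear region where the net is a fixed affine function.
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
w2, b2 = rng.normal(size=4), rng.normal()

def net(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def tree_path(x):
    # The decision-tree path: one sign test per hidden unit.
    return tuple((W1 @ x + b1) > 0)

def leaf_affine(path):
    # Within a region, dead units contribute nothing, so the net
    # reduces to a @ x + c determined by the path alone.
    mask = np.array(path, dtype=float)
    a = (w2 * mask) @ W1
    c = (w2 * mask) @ b1 + b2
    return a, c

x = rng.normal(size=2)
a, c = leaf_affine(tree_path(x))
print(np.isclose(net(x), a @ x + c))  # True: the tree leaf reproduces the net
```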
