The Inside View

Latest episodes

Jul 6, 2023 • 43min

Jesse Hoogland on Developmental Interpretability and Singular Learning Theory

Jesse Hoogland is a research assistant at David Krueger's lab in Cambridge, studying AI Safety. More recently, Jesse has been thinking about Singular Learning Theory and Developmental Interpretability, which we discuss in this episode. Before he came to grips with existential risk from AI, he co-founded a health-tech startup automating bariatric surgery patient journeys.

(00:00) Intro
(03:57) Jesse's Story And Probability Of Doom
(06:21) How Jesse Got Into Singular Learning Theory
(08:50) Intuition behind SLT: the loss landscape
(12:23) Does SLT actually predict anything? Phase Transitions
(14:37) Why care about phase transitions, grokking, etc.
(15:56) Detecting dangerous capabilities like deception during development
(17:24) A concrete example: magnets
(20:06) Why Jesse Is Bullish On Interpretability
(23:57) Developmental Interpretability
(28:06) What Happens Next? Jesse's Vision
(31:56) Toy Models of Superposition
(32:47) Singular Learning Theory Part 2
(36:22) Are Current Models Creative? Reasoning?
(38:19) Building Bridges Between Alignment And Other Disciplines
(41:08) Where To Learn More About Singular Learning Theory

Make sure I upload regularly: https://patreon.com/theinsideview
Youtube: https://youtu.be/713KyknwShA
Transcript: https://theinsideview.ai/jesse
Jesse: https://twitter.com/jesse_hoogland
Host: https://twitter.com/MichaelTrazzi

Patreon supporters:
- Vincent Weisser
- Gunnar Höglund
- Ryan Coppolo
- Edward Huff
- Emil Wallner
- Jesse Hoogland
- William Freire
- Cameron Holmes
- Jacques Thibodeau
- Max Chiswick
- Jack Seroy
- JJ Hepburn
May 9, 2023 • 5min

Clarifying and predicting AGI by Richard Ngo

Explainer podcast for Richard Ngo's "Clarifying and predicting AGI" post on LessWrong, which introduces the t-AGI framework for evaluating AI progress. A system counts as t-AGI if, on most cognitive tasks, it outperforms most human experts who are given time t to complete the task. This is a new format, quite different from the interviews and podcasts I have been recording in the past. If you enjoyed this, let me know in the YouTube comments or on Twitter, @MichaelTrazzi.

Youtube: https://youtu.be/JXYcLQItZsk
Clarifying and predicting AGI: https://www.alignmentforum.org/posts/BoA3agdkAzL6HQtQP/clarifying-and-predicting-agi
May 6, 2023 • 1h 13min

Alan Chan And Max Kaufmann on Model Evaluations, Coordination and AI Safety

Max Kaufmann and Alan Chan discuss the evaluation of large language models, AI Governance, and, more generally, the impact of the deployment of foundation models. Max is currently a Research Assistant to Owain Evans, mainly thinking about (and fixing) issues that might arise as we scale up our current ML systems, but he is also interested in issues arising from multi-agent failures and situational awareness. Alan is a PhD student at Mila advised by Nicolas Le Roux, with a strong interest in AI Safety, AI Governance and coordination. He has also recently been working with David Krueger and helped me with some of the interviews that have been published recently (ML Street Talk and Christoph Schuhmann).

Disclaimer: this discussion is much more casual than the rest of the conversations in this podcast. This was completely impromptu: I just thought it would be interesting to have Max and Alan discuss model evaluations (also called "evals" for short), since they are both interested in the topic.

Transcript: https://theinsideview.ai/alan_and_max
Youtube: https://youtu.be/BOLxeR_culU

Outline
(0:00:00) Introduction
(0:01:16) LLMs Translating To Systems In The Future Is Confusing
(0:03:23) Evaluations Should Measure Actions Instead of Asking Yes or No Questions
(0:04:17) Identify Key Contexts for Dangerous Behavior to Write Concrete Evals
(0:07:29) Implicit Optimization Process Affects Evals and Benchmarks
(0:08:45) Passing Evals Doesn't Guarantee Safety
(0:09:41) Balancing Technical Evals With Social Governance
(0:11:00) Evaluations Must Be Convincing To Influence AI Development
(0:12:04) Evals Might Convince The AI Safety Community But Not People in FAccT
(0:13:21) Difficulty In Explaining AI Risk To Other Communities
(0:14:19) Both Existential Safety And Fairness Are Important
(0:15:14) Reasons Why People Don't Care About AI Existential Risk
(0:16:10) The Association Between Silicon Valley And People in FAccT
(0:17:39) Timelines And RL Understanding Might Impact The Perception Of Existential Risk From AI
(0:19:01) Agentic Models And Longtermism Hinder AI Safety Awareness
(0:20:17) The Focus On Immediate AI Harms Might Be A Rejection Of Speculative Claims
(0:21:50) Is AI Safety A Pascal's Mugging?
(0:23:15) Believing In The Deployment Of Large Foundation Models Should Be Enough To Start Worrying
(0:25:38) AI Capabilities Becoming More Evident to the Public Might Not Be Enough
(0:27:27) Addressing Generalization and Reward Specification in AI
(0:27:59) Evals as an Additional Layer of Security in AI Safety
(0:28:41) A Portfolio Approach to AI Alignment and Safety
(0:29:03) Imagine Alignment Is Solved In 2040, What Made It Happen?
(0:33:04) AGI Timelines Are Uncertain And Anchored By Vibes
(0:35:24) What Matters Is Agency, Strategic Awareness And Planning
(0:37:15) Alignment Is A Public Good, Coordination Is Difficult
(0:06:48) Dignity As A Useful Heuristic In The Face Of Doom
(0:42:28) What Will Society Look Like If We Actually Get Superintelligent Gods
(0:45:41) Uncertainty About Societal Dynamics Affecting Long-Term Future With AGI
(0:47:42) Biggest Frustration With The AI Safety Community
(0:48:34) AI Safety Includes Addressing Negative Consequences of AI
(0:50:41) Frustration: Lack of Bridge Building Between AI Safety and Fairness Communities
(0:53:07) Building Bridges by Attending Conferences and Understanding Different Perspectives
(0:56:02) AI Systems with Weird Instrumental Goals Pose Risks to Society
(0:58:43) Advanced AI Systems Controlling Resources Could Magnify Suffering
(1:00:24) Cooperation Is Crucial to Achieve Pareto Optimal Outcomes and Avoid Global Catastrophes
(1:01:54) Alan's Origin Story
(1:02:47) Alan's AI Safety Research Is Driven By Desire To Reduce Suffering And Improve Lives
(1:04:52) Diverse Interests And Concern For Global Problems Led To AI Safety Research
(1:08:46) The Realization Of The Potential Dangers Of AGI Motivated AI Safety Work
(1:10:39) What Is Alan Chan Working On At The Moment
May 4, 2023 • 1h 45min

Breandan Considine on Neuro Symbolic AI, Coding AIs and AI Timelines

Breandan Considine is a PhD student at the School of Computer Science at McGill University, under the supervision of Jin Guo and Xujie Si. There, he is building tools to help developers locate and reason about software artifacts, by learning to read and write code. I met Breandan while doing my "scale is all you need" series of interviews at Mila, where he surprised me by sitting down for two hours to discuss AGI timelines, augmenting developers with AI, and neuro-symbolic AI. A fun fact that many noticed while watching the "Scale Is All You Need change my mind" video is that he kept his biking hat on for most of the interview, since he was about to leave when we started talking. All of the conversation below is real, but note that since I was not prepared to talk for so long, my camera ran out of battery and some of the video footage on Youtube is actually AI-generated (Breandan consented to this).

Disclaimer: on this podcast I sometimes invite guests who hold different inside views about existential risk from AI, so that everyone in the AI community can talk to each other more and coordinate more effectively. Breandan is overall much more optimistic about the potential risks from AI than a lot of people working in AI Alignment research, but I think he is quite articulate in his position, even though I disagree with many of his assumptions. I believe his point of view is important for understanding what software engineers and symbolic reasoning researchers think of deep learning progress.

Transcript: https://theinsideview.ai/breandan
Youtube: https://youtu.be/Bo6jO7MIsIU
Host: https://twitter.com/MichaelTrazzi
Breandan: https://twitter.com/breandan

OUTLINE
(00:00) Introduction
(01:16) Do We Need Symbolic Reasoning to Get To AGI?
(05:41) Merging Symbolic Reasoning & Deep Learning for Powerful AI Systems
(10:57) Blending Symbolic Reasoning & Machine Learning Elegantly
(15:15) Enhancing Abstractions & Safety in Machine Learning
(21:28) AlphaTensor's Applicability May Be Overstated
(24:31) AI Safety, Alignment & Encoding Human Values in Code
(29:56) Code Research: Moral, Information & Software Aspects
(34:17) Automating Programming & Self-Improving AI
(36:25) Debunking AI "Monsters" & World Domination Complexities
(43:22) Neural Networks: Limits, Scaling Laws & Computation Challenges
(59:54) Real-world Software Development vs. Competitive Programming
(1:02:59) Measuring Programmer Productivity & Evaluating AI-generated Code
(1:06:09) Unintended Consequences, Reward Misspecification & AI-Human Symbiosis
(1:16:59) AI's Superior Intelligence: Impact, Self-Improvement & Turing Test Predictions
(1:23:52) AI Scaling, Optimization Trade-offs & Economic Viability
(1:29:02) Metrics, Misspecifications & AI's Rich Task Diversity
(1:30:48) Federated Learning & AI Agent Speed Comparisons
(1:32:56) AI Timelines, Regulation & Self-Regulating Systems
May 1, 2023 • 32min

Christoph Schuhmann on Open Source AI, Misuse and Existential risk

Christoph Schuhmann is the co-founder and organizational lead at LAION, the non-profit that released LAION-5B, a dataset of 5.85 billion CLIP-filtered image-text pairs, 14x bigger than LAION-400M, previously the biggest openly accessible image-text dataset in the world. Christoph is interviewed by Alan Chan, a PhD student in machine learning at Mila and friend of the podcast, in the context of the NeurIPS "existential risk from AI greater than 10%, change my mind" interviews.

Youtube: https://youtu.be/-Mzfru1r_5s
Transcript: https://theinsideview.ai/christoph

OUTLINE
(00:00) Intro
(01:13) How LAION Collected Billions Of Image-Text Pairs
(05:08) On Misuse: "Most People Use Technology To Do Good Things"
(09:32) Regulating Generative Models Won't Lead Anywhere
(14:36) Instead of Slowing Down, Deploy Carefully, Always Double Check
(18:23) The Solution To Societal Changes Is To Be Open And Flexible To Change
(22:16) We Should Be Honest And Face The Tsunami
(24:14) What Attitude Should We Have After Education Is Done
(30:05) Existential Risk From AI
Apr 29, 2023 • 2h 4min

Simeon Campos on Short Timelines, AI Governance and AI Alignment Field Building

Siméon Campos is the founder of EffiSciences and SaferAI, mostly focusing on alignment field building and AI Governance. More recently, he started Navigating AI Risk, a newsletter on AI Governance, with a first post on slowing down AI.

Note: this episode was recorded in October 2022, so a lot of the content being discussed references what was known at the time, in particular when discussing GPT-3 (instead of GPT-4) or ACT-1 (instead of more recent things like AutoGPT).

Transcript: https://theinsideview.ai/simeon
Host: https://twitter.com/MichaelTrazzi
Simeon: https://twitter.com/Simeon_Cps

OUTLINE
(00:00) Introduction
(01:12) EffiSciences, SaferAI
(02:31) Concrete AI Auditing Proposals
(04:56) We Need 10K People Working On Alignment
(11:08) What's AI Alignment
(13:07) GPT-3 Is Already Decent At Reasoning
(17:11) AI Regulation Is Easier In Short Timelines
(24:33) Why Is Awareness About Alignment Not Widespread?
(32:02) Coding AIs Enable Feedback Loops In AI Research
(36:08) Technical Talent Is The Bottleneck In AI Research
(37:58) 'Fast Takeoff' Is Asymptotic Improvement In AI Capabilities
(43:52) Bear Market Can Somewhat Delay The Arrival Of AGI
(45:55) AGI Need Not Require Much Intelligence To Do Damage
(49:38) Putting Numbers On Confidence
(54:36) RL On Top Of Coding AIs
(58:21) Betting On Arrival Of AGI
(01:01:47) Power-Seeking AIs Are The Objects Of Concern
(01:06:43) Scenarios & Probability Of Longer Timelines
(01:12:43) Coordination
(01:22:49) Compute Governance Seems Relatively Feasible
(01:32:32) The Recent Ban On Chip Exports To China
(01:38:20) AI Governance & Fieldbuilding Were Very Neglected
(01:44:42) Students Are More Likely To Change Their Minds About Things
(01:53:04) Bootcamps Are A Better Medium Of Outreach
(02:01:33) Concluding Thoughts
Jan 17, 2023 • 2h 35min

Collin Burns On Discovering Latent Knowledge In Language Models Without Supervision

Collin Burns is a second-year ML PhD student at Berkeley, working with Jacob Steinhardt on making language models honest, interpretable, and aligned. In 2015 he broke the Rubik's Cube world record, and he's now back with "Discovering Latent Knowledge in Language Models Without Supervision", a paper on how you can recover diverse knowledge represented in large language models without supervision.

Transcript: https://theinsideview.ai/collin
Paper: https://arxiv.org/abs/2212.03827
Lesswrong post: https://bit.ly/3kbyZML
Host: https://twitter.com/MichaelTrazzi
Collin: https://twitter.com/collinburns4

OUTLINE
(00:22) Intro
(01:33) Breaking The Rubik's Cube World Record
(03:03) A Permutation That Happens Maybe 2% Of The Time
(05:01) How Collin Became Convinced Of AI Alignment
(07:55) Was Minerva Just Low Hanging Fruit On MATH From Scaling?
(12:47) IMO Gold Medal By 2026? How To Update From AI Progress
(17:03) Plausibly Automating AI Research In The Next Five Years
(24:23) Making LLMs Say The Truth
(28:11) Lying Is Already Incentivized As We Have Seen With Diplomacy
(32:29) Mind Reading On 'Brain Scans' Through Logical Consistency
(35:18) Misalignment, Or Why One Does Not Simply Prompt A Model Into Being Truthful
(38:43) Classifying Hidden States, Maybe Using Truth Features Represented Linearly
(44:48) Building A Dataset For Using Logical Consistency
(50:16) Building A Confident And Consistent Classifier That Outputs Probabilities
(53:25) Discovering Representations Of The Truth From Just Being Confident And Consistent
(57:18) Making Models Truthful As A Sufficient Condition For Alignment
(59:02) Classification From Hidden States Outperforms Zero-Shot Prompting Accuracy
(01:02:27) Recovering Latent Knowledge From Hidden States Is Robust To Incorrect Answers In Few-Shot Prompts
(01:09:04) Would A Superhuman GPT-N Predict Future News Articles
(01:13:09) Asking Models To Optimize Money Without Breaking The Law
(01:20:31) Training Competitive Models From Human Feedback That We Can Evaluate
(01:27:26) Alignment Problems On Current Models Are Already Hard
(01:29:19) We Should Have More People Working On New Agendas From First Principles
(01:37:16) Towards Grounded Theoretical Work And Empirical Work Targeting Future Systems
(01:41:52) There Is No True Unsupervised: Autoregressive Models Depend On What A Human Would Say
(01:46:04) Simulating Aligned Systems And Recovering The Persona Of A Language Model
(01:51:38) The Truth Is Somewhere Inside The Model, Differentiating Between Truth And Persona Bit By Bit Through Constraints
(02:01:08) A Misaligned Model Would Have Activations Correlated With Lying
(02:05:16) Exploiting Similar Structure To Logical Consistency With Unaligned Models
(02:07:07) Aiming For Honesty, Not Truthfulness
(02:11:15) Limitations Of Collin's Paper
(02:14:12) The Paper Does Not Show The Complete Final Robust Method For This Problem
(02:17:26) Humans Will Be 50/50 On Superhuman Questions
(02:23:40) Asking Yourself "Why Am I Optimistic?" And How Collin Approaches Research
(02:29:16) Message To The ML And Cubing Audience
Jan 12, 2023 • 1h 52min

Victoria Krakovna–AGI Ruin, Sharp Left Turn, Paradigms of AI Alignment

Victoria Krakovna, a Research Scientist at DeepMind and co-founder of the Future of Life Institute, dives into the critical realm of AGI safety. She discusses the dangers of unaligned AGI and the necessity of robust alignment strategies to prevent catastrophic outcomes. The conversation explores the 'sharp left turn' threat model, outlining how sudden advances in AI could undermine humanity's control. Krakovna emphasizes the importance of collaboration in AI research and the need for clear goal definitions to navigate the complex landscape of artificial intelligence.
Jan 7, 2023 • 2h 45min

David Krueger–Coordination, Alignment, Academia

David Krueger is an assistant professor at the University of Cambridge and got his PhD from Mila. His research group focuses on aligning deep learning systems, but he is also interested in governance and global coordination. He is famous in Cambridge for not having an AI alignment research agenda per se; instead he tries to enable his seven PhD students to drive their own research. In this episode we discuss AI takeoff scenarios, research going on at David's lab, coordination, governance, causality, the public perception of AI alignment research, and how to change it.

Youtube: https://youtu.be/bDMqo7BpNbk
Transcript: https://theinsideview.ai/david

OUTLINE
(00:00) Highlights
(01:06) Incentivized Behaviors and Takeoff Speeds
(17:53) Building Models That Understand Causality
(31:04) Agency, Acausal Trade And Causality in LLMs
(40:44) Recursive Self Improvement, Bitter Lesson And Alignment
(01:03:17) AI Governance And Coordination
(01:13:26) David's AI Alignment Research Lab and the Existential Safety Community
(01:24:13) On The Public Perception of AI Alignment
(01:35:58) How To Get People In Academia To Work on Alignment
(02:00:19) Decomposing Learning Curves, Latest Research From David Krueger's Lab
(02:20:06) Safety-Performance Trade-Offs
(02:30:20) Defining And Characterizing Reward Hacking
(02:40:51) Playing Poker With Ethan Caballero, Timelines
Nov 3, 2022 • 24min

Ethan Caballero–Broken Neural Scaling Laws

Ethan Caballero is a PhD student at Mila interested in how to best scale deep learning models according to all downstream evaluations that matter. He is known as the fearless leader of the "Scale Is All You Need" movement and the edgiest person at Mila. His first interview is the second most popular interview on the channel, and today he's back to talk about Broken Neural Scaling Laws and how to use them to superforecast AGI.

Youtube: https://youtu.be/SV87S38M1J4
Transcript: https://theinsideview.ai/ethan2

OUTLINE
(00:00) The Albert Einstein Of Scaling
(00:50) The Fearless Leader Of The Scale Is All You Need Movement
(01:07) A Functional Form Predicting Every Scaling Behavior
(01:40) A Break Between Two Straight Lines On A Log-Log Plot
(02:32) The Broken Neural Scaling Laws Equation
(04:04) Extrapolating A Ton Of Large Scale Vision And Language Tasks
(04:49) Upstream And Downstream Have Different Breaks
(05:22) Extrapolating Four Digit Addition Performance
(06:11) On The Feasibility Of Running Enough Training Runs
(06:31) Predicting Sharp Left Turns
(07:51) Modeling Double Descent
(08:41) Forecasting Interpretability And Controllability
(09:33) How Deception Might Happen In Practice
(10:24) Sinister Stumbles And Treacherous Turns
(11:18) Recursive Self Improvement Precedes Sinister Stumbles
(11:51) Humans In The Loop For The Very First Deception
(12:32) The Hardware Stuff Is Going To Come After The Software Stuff
(12:57) Distributing Your Training By Copy-Pasting Yourself Into Different Servers
(13:42) Automating The Entire Hardware Pipeline
(14:47) Having Text AGI Spit Out New Robotics Designs
(16:33) The Case For Existential Risk From AI
(18:32) Git Re-basin
(18:54) Is Chain-Of-Thought Enough For Complex Reasoning In LMs?
(19:52) Why Diffusion Models Outperform Other Generative Models
(21:13) Using Whisper To Train GPT-4
(22:33) Text To Video Was Only Slightly Impressive
(23:29) Last Message
