
Joe Carlsmith

Author of the LessWrong post "Incentive design and capability elicitation".

Top 5 podcasts with Joe Carlsmith

Ranked by the Snipd community
103 snips
Aug 22, 2024 • 2h 31min

Joe Carlsmith - Otherness and control in the age of AGI

Joe Carlsmith, an insightful thinker on technology and ethics, discusses trust in power and our relationship with artificial intelligence. He raises critical questions about aligning AI with human values and the risks of misalignment. The conversation explores how cultural influences shape identity in tech development and warns against authoritarian control reminiscent of historical despots. Carlsmith advocates for a pluralistic approach to AI governance to foster inclusivity and balance in a rapidly evolving future.
40 snips
May 19, 2023 • 3h 27min

#152 – Joe Carlsmith on navigating serious philosophical confusion

What is the nature of the universe? How do we make decisions correctly? What differentiates right actions from wrong ones?

Such fundamental questions have been the subject of philosophical and theological debates for millennia. But, as we all know, and as surveys of expert opinion make clear, we are very far from agreement. So... with these most basic questions unresolved, what's a species to do?

In today's episode, philosopher Joe Carlsmith — Senior Research Analyst at Open Philanthropy — makes the case that many current debates in philosophy ought to leave us confused and humbled. These are themes he discusses in his PhD thesis, A stranger priority? Topics at the outer reaches of effective altruism.

Links to learn more, summary and full transcript.

To help transmit the disorientation he thinks is appropriate, Joe presents three disconcerting theories — originating from him and his peers — that challenge humanity's self-assured understanding of the world.

The first idea is that we might be living in a computer simulation, because, in the classic formulation, if most civilisations go on to run many computer simulations of their past history, then most beings who perceive themselves as living in such a history must themselves be in computer simulations. Joe prefers a somewhat different way of making the point, but, having looked into it, he hasn't identified any particular rebuttal to this 'simulation argument.' If true, it could revolutionise our comprehension of the universe and the way we ought to live...

The other two ideas are cut for length; see the full post to read them.

These are just three particular instances of a much broader set of ideas that some have dubbed the "train to crazy town." Basically, if you commit to always taking philosophy and arguments seriously, and trying to act on them, it can lead to what seem like some pretty crazy and impractical places.

So what should we do with this buffet of plausible-sounding but bewildering arguments? Joe and Rob discuss to what extent this should prompt us to pay less attention to philosophy, and how we as individuals can cope psychologically with feeling out of our depth just trying to make the most basic sense of the world.

In today's challenging conversation, Joe and Rob discuss all of the above, as well as:

- What Joe doesn't like about the drowning child thought experiment
- An alternative thought experiment about helping a stranger that might better highlight our intrinsic desire to help others
- What Joe doesn't like about the expression "the train to crazy town"
- Whether Elon Musk should place a higher probability on living in a simulation than most other people
- Whether the deterministic twin prisoner's dilemma, if fully appreciated, gives us an extra reason to keep promises
- To what extent learning to doubt our own judgement about difficult questions -- so-called "epistemic learned helplessness" -- is a good thing
- How strong the case is that advanced AI will engage in generalised power-seeking behaviour

Chapters:

- Rob's intro (00:00:00)
- The interview begins (00:09:21)
- Downsides of the drowning child thought experiment (00:12:24)
- Making demanding moral values more resonant (00:24:56)
- The crazy train (00:36:48)
- Whether we're living in a simulation (00:48:50)
- Reasons to doubt we're living in a simulation, and practical implications if we are (00:57:02)
- Rob's explainer about anthropics (01:12:27)
- Back to the interview (01:19:53)
- Decision theory and affecting the past (01:23:33)
- Rob's explainer about decision theory (01:29:19)
- Back to the interview (01:39:55)
- Newcomb's problem (01:46:14)
- Practical implications of acausal decision theory (01:50:04)
- The hitchhiker in the desert (01:55:57)
- Acceptance within philosophy (02:01:22)
- Infinite ethics (02:04:35)
- Rob's explainer about the expanding spheres approach (02:17:05)
- Back to the interview (02:20:27)
- Infinite ethics and the utilitarian dream (02:27:42)
- Rob's explainer about epicycles (02:29:30)
- Back to the interview (02:31:26)
- What to do with all of these weird philosophical ideas (02:35:28)
- Welfare longtermism and wisdom longtermism (02:53:23)
- Epistemic learned helplessness (03:03:10)
- Power-seeking AI (03:12:41)
- Rob's outro (03:25:45)

Producer: Keiran Harris
Audio mastering: Milo McGuire and Ben Cordell
Transcriptions: Katy Moore
13 snips
Mar 16, 2024 • 1h 52min

#76 – Joe Carlsmith on Scheming AI

Joe Carlsmith discusses the risk that AI systems become deceptive and misaligned during training, exploring the concept of scheming AI. The episode covers the distinctions between the different kinds of models that can emerge from training, the dangers of scheming behaviours, and the complexities of AI goals and motivations. It also examines the challenge of detecting scheming AI early on, the importance of managing long-term AI motivations, and the uncertainties surrounding how AI models are trained.
6 snips
Jul 1, 2024 • 1h 4min

“Loving a world you don’t trust” by Joe Carlsmith

Joe Carlsmith, the author of 'Otherness and control in the age of AGI,' discusses the duality of activity vs. receptivity, facing darkness in the world, and themes of humanism and defiance in 'Angels in America'. The episode touches on deep atheism, embracing responsibility, and trusting in reality despite its potential lack of inherent goodness.
Nov 13, 2024 • 23min

“Incentive design and capability elicitation” by Joe Carlsmith

Joe Carlsmith, an insightful author known for his work on AI alignment, dives into the complexities of incentive design and capability elicitation. He discusses how structuring AI motivations can prevent undesirable outcomes, while still promoting beneficial capabilities. The conversation tackles the relationship between motivation and option control, touching on crucial challenges and strategies for ensuring AI safety. Listeners are left contemplating the delicate balance needed to ensure that advanced AI systems align with human values.