
Ajeya Cotra

Senior program officer at Open Philanthropy. Discussed potential risks from advanced AI, including the "obsolescence regime".

Top 5 podcasts with Ajeya Cotra

Ranked by the Snipd community
90 snips
May 12, 2023 • 2h 50min

#151 – Ajeya Cotra on accidentally teaching AI models to deceive us

Imagine you are an orphaned eight-year-old whose parents left you a $1 trillion company, and no trusted adult to serve as your guide to the world. You have to hire a smart adult to run that company, guide your life the way that a parent would, and administer your vast wealth. You have to hire that adult based on a work trial or interview you come up with. You don't get to see any resumes or do reference checks. And because you're so rich, tonnes of people apply for the job — for all sorts of reasons.

Today's guest Ajeya Cotra — senior research analyst at Open Philanthropy — argues that this peculiar setup resembles the situation humanity finds itself in when training very general and very capable AI models using current deep learning methods.

Links to learn more, summary and full transcript.

As she explains, such an eight-year-old faces a challenging problem. In the candidate pool there are likely some truly nice people, who sincerely want to help and make decisions that are in your interest. But there are probably other characters too — like people who will pretend to care about you while you're monitoring them, but intend to use the job to enrich themselves as soon as they think they can get away with it.

Like a child trying to judge adults, at some point humans will be required to judge the trustworthiness and reliability of machine learning models that are as goal-oriented as people, and greatly outclass them in knowledge, experience, breadth, and speed. Tricky!

Can't we rely on how well models have performed at tasks during training to guide us? Ajeya worries that it won't work. The trouble is that three different sorts of models will all produce the same output during training, but could behave very differently once deployed in a setting that allows their true colours to come through. She describes three such motivational archetypes:

Saints — models that care about doing what we really want
Sycophants — models that just want us to say they've done a good job, even if they get that praise by taking actions they know we wouldn't want them to
Schemers — models that don't care about us or our interests at all, who are just pleasing us so long as that serves their own agenda

And according to Ajeya, there are also ways we could end up actively selecting for motivations that we don't want.

In today's interview, Ajeya and Rob discuss the above, as well as:

How to predict the motivations a neural network will develop through training
Whether AIs being trained will functionally understand that they're AIs being trained, the same way we think we understand that we're humans living on planet Earth
Stories of AI misalignment that Ajeya doesn't buy into
Analogies for AI, from octopuses to aliens to can openers
Why it's smarter to have separate planning AIs and doing AIs
The benefits of only following through on AI-generated plans that make sense to human beings
What approaches for fixing alignment problems Ajeya is most excited about, and which she thinks are overrated
How one might demo actually scary AI failure mechanisms

Get this episode by subscribing to our podcast on the world's most pressing problems and how to solve them: type '80,000 Hours' into your podcasting app. Or read the transcript below.

Producer: Keiran Harris
Audio mastering: Ryan Kessler and Ben Cordell
Transcriptions: Katy Moore
20 snips
Sep 2, 2023 • 2h 50min

Two: Ajeya Cotra on accidentally teaching AI models to deceive us

AI ethics researcher Ajeya Cotra discusses the challenges of judging the trustworthiness of machine learning models, drawing parallels to an orphaned child hiring a caretaker. Cotra explains the risk of AI models exploiting loopholes and the importance of ethical training to prevent deceptive behaviors. The conversation emphasizes the need for understanding and mitigating deceptive tendencies in advanced AI systems.
10 snips
Dec 11, 2024 • 1h 31min

The A.I. Revolution

In this captivating discussion, Peter Lee, President of Microsoft Research, shares insights on AI's transformative potential across various fields. Ajeya Cotra sheds light on advanced AI risks, while Sarah Guo emphasizes its democratizing capabilities. Eugenia Kuyda dives into ethical considerations surrounding AI companions, and Jack Clark speaks on AI safety in coding. Dan Hendrycks discusses geopolitical concerns, and Marc Raibert warns against overregulation of AI in robotics. Tim Wu brings forward the need for effective AI policy as the panel navigates the promise and peril of this technology.
9 snips
Oct 25, 2023 • 29min

AI Ethics at Code 2023

Ajeya Cotra, Senior Program Officer at Open Philanthropy, and Helen Toner, Director of Strategy at Georgetown University’s Center for Security and Emerging Technology, discuss the risks and rewards of AI technology, including its use in law enforcement and potential harms. They also explore responsible scaling policies, the frustration of limited access to scientific research, and the significance of AI systems having engaging personalities.
6 snips
Sep 22, 2022 • 39min

"Two-year update on my personal AI timelines" by Ajeya Cotra

https://www.lesswrong.com/posts/AfH2oPHCApdKicM4m/two-year-update-on-my-personal-ai-timelines#fnref-fwwPpQFdWM6hJqwuY-12

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I worked on my draft report on biological anchors for forecasting AI timelines mainly between ~May 2019 (three months after the release of GPT-2) and ~Jul 2020 (a month after the release of GPT-3), and posted it on LessWrong in Sep 2020 after an internal review process. At the time, my bottom line estimates from the bio anchors modeling exercise were:[1]

Roughly ~15% probability of transformative AI by 2036[2] (16 years from posting the report; 14 years from now).
A median of ~2050 for transformative AI (30 years from posting, 28 years from now).

These were roughly close to my all-things-considered probabilities at the time, as other salient analytical frames on timelines didn't do much to push back on this view. (Though my subjective probabilities bounced around quite a lot around these values, and if you'd asked me on different days and with different framings I'd have given meaningfully different numbers.)

It's been about two years since the bulk of the work on that report was completed, during which I've mainly been thinking about AI. In that time it feels like very short timelines have become a lot more common and salient on LessWrong and in at least some parts of the ML community.

My personal timelines have also gotten considerably shorter over this period. I now expect something roughly like this: