We speak with Ryan Kidd, Co-Director at ML Alignment & Theory Scholars (MATS) program, previously "SERI MATS".
MATS (https://www.matsprogram.org/) provides research mentorship, technical seminars, and connections to help new AI researchers get established and start producing impactful research towards AI safety & alignment.
Prior to MATS, Ryan completed a PhD in Physics at the University of Queensland (UQ) in Australia.
We talk about:
* What the MATS program is
* Who should apply to MATS (next *deadline*: Nov 17 midnight PT)
* Research directions being explored by MATS mentors, now and in the past
* Promising alignment research directions & ecosystem gaps , in Ryan's view
Hosted by Soroush Pour. Follow me for more AGI content:
* Twitter: https://twitter.com/soroushjp
* LinkedIn: https://www.linkedin.com/in/soroushjp/
== Show links ==
-- About Ryan --
* Twitter: https://twitter.com/ryan_kidd44
* LinkedIn: https://www.linkedin.com/in/ryan-kidd-1b0574a3/
* MATS: https://www.matsprogram.org/
* LISA: https://www.safeai.org.uk/
* Manifold: https://manifold.markets/
-- Further resources --
* Book: “The Precipice” - https://theprecipice.com/
* Ikigai - https://en.wikipedia.org/wiki/Ikigai
* Fermi paradox - https://en.wikipedia.org/wiki/Fermi_p...
* Ajeya Contra - Bioanchors - https://www.cold-takes.com/forecastin...
* Chomsky hierarchy & LLM transformers paper + external memory - https://en.wikipedia.org/wiki/Chomsky...
* AutoGPT - https://en.wikipedia.org/wiki/Auto-GPT
* BabyAGI - https://github.com/yoheinakajima/babyagi
* Unilateralist's curse - https://forum.effectivealtruism.org/t...
* Jeffrey Ladish & team - fine tuning to remove LLM safeguards - https://www.alignmentforum.org/posts/...
* Epoch AI trends - https://epochai.org/trends
* The demon "Moloch" - https://slatestarcodex.com/2014/07/30...
* AI safety fundamentals course - https://aisafetyfundamentals.com/
* Anthropic sycophancy paper - https://www.anthropic.com/index/towar...
* Promising technical alignment research directions
* Scalable oversight
* Recursive reward modelling - https://deepmindsafetyresearch.medium...
* RLHF - could work for a while, but unlikely forever as we scale
* Interpretability
* Mechanistic interpretability
* Paper: GPT4 labelling GPT2 - https://openai.com/research/language-...
* Concept based interpretability
* Rome paper - https://rome.baulab.info/
* Developmental interpretability
* devinterp.com - http://devinterp.com
* Timaeus - https://timaeus.co/
* Internal consistency
* Colin Burns research - https://arxiv.org/abs/2212.03827
* Threat modelling / capabilities evaluation & demos
* Paper: Can large language models democratize access to dual-use biotechnology? - https://arxiv.org/abs/2306.03809
* ARC Evals - https://evals.alignment.org/
* Palisade Research - https://palisaderesearch.org/
* Paper: Situational awareness with Owain Evans - https://arxiv.org/abs/2309.00667
* Gradient hacking - https://www.lesswrong.com/posts/uXH4r6MmKPedk8rMA/gradient-hacking
* Past scholar's work
* Apollo Research - https://www.apolloresearch.ai/
* Leap Labs - https://www.leap-labs.com/
* Timaeus - https://timaeus.co/
* Other orgs mentioned
* Redwood Research - https://redwoodresearch.org/
Recorded Oct 25, 2023