I think one thing that I'm interested in is trying to build up decision procedures. By decision procedure, I mean some way of taking a class of questions and formally specifying the set of questions that I claim I'm going to know how to answer. How do you think about what a simple question is? In math, there are some questions that seem simple but that we know, because math teachers have tried to prove them, are secretly ridiculously complicated. It might make sense to aspire to never being confused by any question about basic mechanics that involves only the following four concepts. For instance, I think that objects moving around in a vacuum exerting forces on each other
How hard is it to arrive at true beliefs about the world? How can you find enjoyment in being wrong? When presenting claims that will be scrutinized by others, is it better to hedge and pad the claims with lots of caveats and uncertainty, or to strive for a tone that matches (or perhaps even exaggerates) the intensity with which you hold your beliefs? Why might you want to focus on drilling small skills when learning a new skill set? What counts as a "simple" question? How can you tell when you actually understand something and when you don't? What is "cargo culting"? Which features of AI are likely to become existential threats in the future? What are the hardest parts of AI research? What skills will we probably really wish we had on the eve of deploying superintelligent AIs?
Buck Shlegeris is the CTO of Redwood Research, an independent AI alignment research organization. He currently leads their interpretability research. He previously worked on research and outreach at the Machine Intelligence Research Institute. His website is shlegeris.com.