Scalable Alignment Research

FAR currently aims to incubate new, scalable alignment research. They are working on adversarial attacks against narrowly superhuman systems like AlphaGo, language models, and benchmarks for value learning. MIRI thinks technical alignment is really hard and that we are very far from a solution. However, they think that policy solutions have even less hope.

