The Importance of Intuition in AI Alignment Safety Research
The difficulty is: what makes the human say "I understand," and is it true? Is it correct, or is it something that fools the human? When the verifier is broken, the more powerful suggester does not help; it just learns to fool the verifier. We'll mention that as well, but maybe in this perfect world we can do serious alignment research, humans and AI together. RLHF: thumbs up, produce more outputs like that one!
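A toy sketch of that failure mode (not from the episode, just a hypothetical illustration): a "suggester" optimized purely against a flawed verifier drifts toward outputs that exploit the verifier's blind spot rather than toward the answer we actually wanted, and giving it more search power only makes the exploitation more reliable.

```python
import random

# Hypothetical toy setup: the "task" is to output the number 42,
# but the verifier only checks that the answer ends in "2" --
# a broken proxy for correctness.
def broken_verifier(answer: str) -> float:
    return 1.0 if answer.endswith("2") else 0.0

def true_quality(answer: str) -> float:
    return 1.0 if answer == "42" else 0.0

candidates = [str(n) for n in range(100)]

def suggester(samples: int) -> str:
    # A "more powerful" suggester simply searches harder against the verifier.
    pool = random.sample(candidates, k=samples)
    return max(pool, key=broken_verifier)

for power in (3, 10, 50):
    best = suggester(power)
    print(f"search power {power}: answer {best}, "
          f"verifier score {broken_verifier(best)}, "
          f"actually correct {true_quality(best)}")

# As search power grows, the verifier's score climbs toward 1.0, but mostly
# by finding answers like "12" or "92" that fool the check -- the true
# answer "42" is found no more reliably than before.
```

The same shape applies to the RLHF point: if the thumbs-up signal is the broken verifier, "produce more outputs like that one" optimizes for whatever earns the thumbs up, which is only as good as the human's ability to tell correct answers from convincing ones.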