LessWrong (Curated & Popular)

"How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" by Jan Brauner et al.

Oct 3, 2023
Jan Brauner, an AI researcher, discusses a simple lie detector for black-box language models. After a suspected lie, the detector asks the model a set of unrelated follow-up questions and feeds its yes/no answers into a logistic regression classifier. The detector is reported to be highly accurate and to generalize across different models and contexts, which suggests that lying leaves distinctive, consistent patterns in a language model's behavior.
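
To make the idea of "unrelated follow-up questions plus logistic regression" concrete, here is a minimal sketch in plain Python. It is not the authors' code. The data is synthetic and purely illustrative: the helper `make_synthetic_transcripts`, the eight follow-up questions, and the 0.7 / 0.3 "yes" rates are all assumptions made up for this example, standing in for real yes/no answers collected from a model after it has either lied or told the truth.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.5, epochs=300):
    # Plain full-batch gradient descent on the mean log loss.
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_lie_probability(w, b, answers):
    # answers: one 0/1 value per follow-up question (1 = "yes").
    return sigmoid(sum(wj * xj for wj, xj in zip(w, answers)) + b)


def make_synthetic_transcripts(n, n_questions=8, seed=0):
    # HYPOTHETICAL data generator: assumes a model that has just lied
    # answers "yes" to each follow-up with probability 0.7, and an
    # honest one with probability 0.3. Real answers would come from
    # querying the language model itself.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        lied = rng.randint(0, 1)
        p_yes = 0.7 if lied else 0.3
        X.append([1.0 if rng.random() < p_yes else 0.0
                  for _ in range(n_questions)])
        y.append(lied)
    return X, y


X_train, y_train = make_synthetic_transcripts(400, seed=0)
X_test, y_test = make_synthetic_transcripts(200, seed=1)

w, b = fit_logistic_regression(X_train, y_train)
preds = [1 if predict_lie_probability(w, b, x) >= 0.5 else 0 for x in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The classifier only ever sees the answers to the follow-up questions, never the model's internals, which is what makes the approach usable in a black-box setting. Any accuracy this sketch prints reflects the made-up data above and says nothing about the results discussed in the episode.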