EAG Talks

Deceptive Tendencies of Language Models | Olli Järviniemi | EAGxNordics 2024

Oct 24, 2024
16:50

AI systems that deceive humans, particularly about their own alignment, pose significant challenges for ensuring AI safety. Olli Järviniemi talks about his recent research on the deceptive tendencies of language models: will LLMs take deceptive actions without external instruction or pressure to do so? The basic approach is to create a realistic simulation environment in which opportunities for deception arise naturally. The talk focuses on the experimental setup and results, with some discussion of future research directions.

Watch on YouTube: https://www.youtube.com/watch?v=ynF8QuyO_9Q
