Automating Reinforcement Learning from Human Feedback

This chapter explores automating RLHF (Reinforcement Learning from Human Feedback) with AI, covering the limitations of relying solely on human feedback and the lab's work on AI safety through debate. The speaker also discusses alternative architectures and other avenues toward higher machine intelligence.

Episode notes

This episode is sponsored by Celonis, the global leader in process mining. AI has landed and enterprises are adapting. To give customers slick experiences and teams the technology to deliver. The road is long, but you're closer than you think. Your business processes run through systems. Creating data at every step. Celonis reconstructs this data to generate Process Intelligence. A common business language. So AI knows how your business flows. Across every department, every system and every process. With AI solutions powered by Celonis, enterprises get faster, more accurate insights. A new level of automation potential. And a step change in productivity, performance and customer satisfaction. Process Intelligence is the missing piece in the AI-enabled tech stack.

Go to https://celonis.com/eyeonai to find out more.

Welcome to episode 151 of the 'Eye on AI' podcast. In this episode, host Craig Smith sits down with Asa Cooper, a postdoctoral researcher at NYU, who is at the forefront of language model safety.

This episode takes us on a journey through the complexities of AI situational awareness, the potential for consciousness in language models, and the future of AI safety research.

Craig and Asa delve into the nuances of AI situational awareness and its distinction from sentience. Asa, with his rich background in NLP and AI safety from Edinburgh University, shares insights from his post-doc work at NYU, discussing collaborative efforts on a paper that has garnered attention for its take on situational awareness in large language models (LLMs).

We explore the economic drivers behind creating AI with such capabilities and the role of scaling versus algorithmic innovation in achieving this milestone. We also delve into the concept of agency in LLMs, the challenges of post-deployment monitoring, and the effectiveness of current measures in detecting situational awareness.

To wrap things up, we break down the importance of source trustworthiness and the model's ability to discern reliable information, a critical aspect of AI safety and functionality, so make sure to watch till the end.

Craig Smith Twitter: https://twitter.com/craigss

Eye on A.I. Twitter: https://twitter.com/EyeOn_AI

(00:00) Preview and Introduction

(02:30) Asa's NLP Expertise and the Safety of Language Models

(06:05) Breaking Down AI's Situational Awareness

(13:44) Evolution of AI: Predictive Models to AI Coworkers

(20:29) New Frontier in AI Development?

(27:14) Measuring AI's Awareness

(33:49) Innovative Experiments with LLMs

(40:51) The Consequences of Detecting Situational Awareness in AI

(44:07) How To Train AI On Trusted Sources

(49:52) What Is The Future of AI Training?