
"The Waluigi Effect (mega-post)" by Cleo Nardo
LessWrong (Curated & Popular)
Asymmetry of the Kullback-Leibler Divergence
The longer you interact with the LLM, the more likely it is to have collapsed into a waluigi. This is formally connected to the asymmetry of the Kullback-Leibler divergence. RLHF is the method used by OpenAI to coerce GPT-3, GPT-3.5, and GPT-4 into a smart, honest, helpful, harmless assistant. If we can't naively prompt an LLM into alignment, maybe RLHF would work instead.
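(For reference, the asymmetry invoked here is just the standard property that the Kullback-Leibler divergence is not symmetric in its arguments; the definition below is the textbook one, not quoted from the post.)

```latex
D_{\mathrm{KL}}(P \parallel Q) \;=\; \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)},
\qquad
D_{\mathrm{KL}}(P \parallel Q) \;\neq\; D_{\mathrm{KL}}(Q \parallel P) \ \text{in general}.
```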


