“Optimizing The Final Output Can Obfuscate CoT (Research Note)” by lukemarks, jacob_drori, cloud, TurnTrout

LessWrong (Curated & Popular)

Challenging Reasoning Through Modified Methods

This chapter discusses a reasoning-gym problem involving shell commands, modified from its usual form to make it harder for the model. It then uses experiments scored by an LLM judge to examine how penalizing references to the ground truth affects model performance.
