23 - Mechanistic Anomaly Detection with Mark Xu

AXRP - the AI X-risk Research Podcast

Redwood's Experimental Work on Mechanistic Anomalies

I'm wondering if there's been any experimental work on trying out mechanistic anomaly detection. I think Redwood is currently working on what they're calling ELK benchmarks, where they're trying to do this sort of mechanism distinction on toy problems like function evaluation. But you probably don't want to call that experimental work, because we're just checking how accurate heuristic estimators for permanents of matrices are, or whatever.
