23 - Mechanistic Anomaly Detection with Mark Xu

AXRP - the AI X-risk Research Podcast

CHAPTER

Redwood's Experimental Work on Mechanistic Anomalies

I'm wondering if there's been any experimental work on trying out mechanistic anomaly detection. I think Redwood is currently working on what they're calling ELK benchmarks, where they're trying to do this sort of mechanism distinction on toy problems like function evaluation. But you probably don't want to call that experimental work, because we're just checking how accurate heuristic estimators for permanents of matrices are, or whatever.
