
Do the Biorisk Evaluations of AI Labs Actually Measure the Risk of Developing Bioweapons?
BlueDot Narrated
Benchmarks Are Common but Saturate
Perrin Walker describes how labs rely on auto-graded benchmarks, on which models quickly reach high scores.


