
802: In Case You Missed It in June 2024
Super Data Science: ML & AI Podcast with Jon Krohn
00:00
Implications of Model Fragility and Safety in AI Systems
This chapter dives into the delicate nature of RLHF models and the risks associated with removing safety layers from models during fine-tuning. It highlights how adjusting models for specific tasks can compromise safety features initially developed during pre-training, stressing the need to view safety as integral to the entire AI system.
Transcript
Play full episode