AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
Detecting and Steering Model Behavior
Recent approaches are challenging the idea that we have no insight into model functioning. Through visualizations and feature manipulation, such as setting drug-related features to zero, it is possible to detect and control model behavior at runtime, making it harder to manipulate models for undesired outcomes.