AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
The Robustness of Chatbots
The researchers used a technique called gradient descent to attack open source models. They were able to get harmful behaviors out of the kunya, which was one of the models they explicitly trained this for. Quad two seems to be the only model that really is quite robust to this strategy.