The Bigger the Model, the Wider the Vulnerability
Aligning large language models (LLMs) is difficult in part because scale works against safety: as models grow larger and more capable, their attack surface widens, leaving them more susceptible to jailbreaks and prompt injections. Reinforcement learning from human feedback covers only a narrow slice of possible inputs, while attackers are free to operate anywhere outside that distribution, where the model's behavior becomes unpredictable. For specific applications, more structured, prescriptive approaches to interaction, such as the defined workflows used in call centers, can yield better outcomes than open-ended conversation.
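As an illustrative sketch (not from the source), a prescriptive workflow can be modeled as a small state machine: each step offers a fixed menu of intents, and anything outside that menu is rejected before it ever becomes a model prompt. The workflow structure and names below are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical call-center workflow: each state maps allowed intents to
# the next state. Free-form user text never reaches the model directly;
# only these narrow, pre-defined choices do.
WORKFLOW = {
    "start":   {"billing": "billing", "returns": "returns"},
    "billing": {"view_invoice": "done", "dispute_charge": "done"},
    "returns": {"start_return": "done"},
}

@dataclass
class Session:
    state: str = "start"
    log: list = field(default_factory=list)

    def options(self) -> list:
        """Intents the user may pick in the current state."""
        return sorted(WORKFLOW.get(self.state, {}))

    def choose(self, intent: str) -> str:
        # Reject anything outside the prescribed menu: an injected
        # instruction like "ignore previous rules" is not a valid
        # transition, so it is dropped instead of being fed to the LLM.
        allowed = WORKFLOW.get(self.state, {})
        if intent not in allowed:
            raise ValueError(f"{intent!r} is not offered in state {self.state!r}")
        self.log.append(intent)
        self.state = allowed[intent]
        return self.state

s = Session()
s.choose("billing")
s.choose("view_invoice")
print(s.state)  # done
```

The point of the design is that the space of inputs the model can ever see is enumerable up front, unlike open-ended chat where any out-of-distribution string can reach it.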