Exploring Vulnerabilities in Language Models
This chapter discusses how attacks transfer across open and black-box models, emphasizing how architecture and training data influence their efficacy. It covers several types of adversarial attack, including jailbreak and misdirection attacks, and the inherent limitations of current defenses for language models. The conversation also highlights the practical implications and complexities of manipulating large language models, focusing on their susceptibility to structured inputs and role hacking.