Advancements in Language Model Safeguards

This chapter explores recent advancements in language model safeguards, particularly the constitutional classifiers used with Claude to help prevent harmful content generation. It highlights the interception mechanisms involved and how variations in model weights affect probabilistic outputs.

Episode notes

Welcome back to The FAIK Files!

In this week's episode:

An Australian radio station created a fake Asian female host using AI
The BBC uses AI to resurrect Agatha Christie, while a family member uses AI to bring a murder victim to court
We break down "Strategic Text Strings" - sequences of gibberish that can jailbreak AI systems
AI recruitment tools might be making hiring worse, not better

Check out The Deception Project to learn about our upcoming Offensive Cyber Deception Masterclass and more.

Also check out Perry's new newsletter, Deceptive Minds: a newsletter about how we are fooled, how we fool ourselves, and what we can do about it. Subscribe on LinkedIn: https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7319922626200510464

Want to leave us a voicemail? Here's the magic link to do just that: https://sayhi.chat/FAIK

You can also join our Discord server here: https://faik.to/discord

***** NOTES AND REFERENCES *****

Faking Diversity - AI Radio Host Controversy:

Mediaweek article: https://www.mediaweek.com.au/backlash-over-arn-use-of-ai-radio-host-modelled-on-asian-woman/
Original Blog Source: https://www.thecarpet.com.au/p/meet-thy-the-radio-host-i-dont-think
Follow-Up with audio clips: https://www.thecarpet.com.au/p/radios-most-cynical-ai-experiment
Mason's insights on radio industry cost-cutting and the vanishing of part-time jobs post-COVID

AI Resurrections:

BBC resurrects Agatha Christie to teach writing: https://www.thetimes.com/culture/books/article/agatha-christie-ai-bbc-maestro-jeremy-vine-9t5nqvz69
Family member resurrects murder victim to speak in court: https://www.404media.co/i-loved-that-ai-judge-moved-by-ai-generated-avatar-of-man-killed-in-road-rage-incident/

Strategic Text Strings & Jailbreaking AI:

Previous episode on "AI SEO Optimization": https://www.youtube.com/watch?v=wm1yAfTjKzY
Harvard Business School article: https://www.library.hbs.edu/working-knowledge/gen-ai-marketing-how-some-gibberish-code-can-give-products-an-edge
Research paper: "Manipulating Large Language Models to Increase Product Visibility": https://arxiv.org/pdf/2404.07981
Broken Hill: https://bishopfox.com/blog/brokenhill-attack-tool-largelanguagemodels-llm
Broken Hill Github: https://github.com/BishopFox/BrokenHill
Universal and Transferable Adversarial Attacks on Aligned Language Models: https://arxiv.org/pdf/2307.15043

AI Dumpster Fire of the Week - AI in Recruitment:

404 Media article on AI recruiter: https://www.404media.co/ai-recruiter-apriora-tiktok/
Testing "AI Interviewers" with HeyGen (https://HeyGen.com/) and Tavus (https://www.tavus.io/)

Want to connect with us? Here's how:

Connect with Perry:

Perry on LinkedIn: https://www.linkedin.com/in/perrycarpenter
Perry on X: https://x.com/perrycarpenter
Perry on BlueSky: https://bsky.app/profile/perrycarpenter.bsky.social

Connect with Mason:

Mason on LinkedIn: https://www.linkedin.com/in/mason-amadeus-a853a7242/
Mason on BlueSky: https://bsky.app/profile/wickedinterest.ing
