AXRP - the AI X-risk Research Podcast

30 - AI Security with Jeffrey Ladish

25 snips

Apr 30, 2024

AI security expert Jeffrey Ladish discusses the robustness of safety training in AI models, dangers of open LLMs, securing against attackers, and the state of computer security. They explore undoing safety filters, AI phishing, and making AI more legible. Topics include securing model weights, defending against AI exfiltration, and red lines in AI development.

Ask episode

Chapters

Transcript

Episode notes

Exploring AI Safety Fine-Tuning and Model Development

01:58 • 14min

Navigating AI Security Risks

15:40 • 12min

The Dark Side of AI: Exploiting Vulnerabilities for Malicious Activities

27:36 • 19min

The Battle Between AI Security and Potential Threats

46:09 • 22min

Navigating the Complexities of Cybersecurity and AI

01:08:11 • 22min

Enhancing AI Security Practices

01:29:43 • 8min

Exploring AI Models' Capabilities in Task Decomposition and Deception

01:38:03 • 3min

Exploring the Potential of AI Models with Enhanced Capabilities

01:41:00 • 4min

The Rise of Language Models in Phishing Attacks

01:44:38 • 7min

Navigating Identity Verification in the Era of AI Deception

01:52:05 • 15min

Exploring Causal Scrubbing and Network Evaluation in AI Research

02:07:16 • 2min

Exploring AI Model Capabilities and the Need for Transparency

02:09:04 • 6min