Contrasting Rigor in Security Disciplines
This chapter explores the contrasts and connections between traditional security and machine learning security, emphasizing the rigorous evaluation standards of the traditional discipline. Carlini shares personal insights on how his foundational training in classical security enriches his approach to machine learning research.
Nicholas Carlini from Google DeepMind offers his view of AI security, emergent LLM capabilities, and his groundbreaking model-stealing research. He reveals how LLMs can unexpectedly excel at tasks like chess and discusses the security pitfalls of LLM-generated code.
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of workloads, from small models to large-scale deployments.
https://centml.ai/pricing/
Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or in getting involved in their events?
Go to https://tufalabs.ai/
***
Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0
TOC:
1. ML Security Fundamentals
[00:00:00] 1.1 ML Model Reasoning and Security Fundamentals
[00:03:04] 1.2 ML Security Vulnerabilities and System Design
[00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior
[00:13:20] 1.4 Model Training, RLHF, and Calibration Effects
2. Model Evaluation and Research Methods
[00:19:40] 2.1 Model Reasoning and Evaluation Metrics
[00:24:37] 2.2 Security Research Philosophy and Methodology
[00:27:50] 2.3 Security Disclosure Norms and Community Differences
3. LLM Applications and Best Practices
[00:44:29] 3.1 Practical LLM Applications and Productivity Gains
[00:49:51] 3.2 Effective LLM Usage and Prompting Strategies
[00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code
4. Advanced LLM Research and Architecture
[00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience
[01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges
[01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction
REFS:
[00:01:15] Nicholas Carlini’s personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/
[00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/
[00:04:30] Seminal paper on neural network robustness against adversarial examples (Carlini & Wagner, 2016) - https://arxiv.org/abs/1608.04644
[00:05:20] Computer Fraud and Abuse Act (CFAA) – primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud
[00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html
[00:16:10] Paper: “Self-Play Preference Optimization for Language Model Alignment” (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675
[00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774
[00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation
[00:23:55] Analysis of distribution shift in ML (Hendrycks et al.) - https://arxiv.org/abs/2006.16241
[00:27:40] Nicholas Carlini’s essay “Why I Attack” (June 2024) – motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html
[00:34:05] Google Project Zero’s 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html
[00:51:15] Evolution of Google search syntax & user behavior (Daniel M. Russell) - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878
[01:04:05] Rust’s ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html
[01:10:05] Paper: “Stealing Part of a Production Language Model” (Carlini et al., March 2024) – extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634
[00:10:55] First model-stealing paper (Tramèr et al., 2016) – stealing ML models via prediction APIs - https://arxiv.org/abs/1609.02943