
SANS Internet Stormcenter Daily Cyber Security Podcast (Stormcast)
SANS Stormcast Monday Mar 3rd: AI Training Data Leaks; MITRE Caldera Vuln; modsecurity bypass
Mar 3, 2025
The episode covers AI training data leaks, including the finding that the Common Crawl dataset contains exposed API keys and other secrets. It also discusses how GitHub Copilot can surface sensitive data from repositories that were once public and have since been made private. A vulnerability in the MITRE Caldera framework that allows unauthorized code execution is covered as well. Lastly, the episode addresses a modsecurity rule bypass and stresses the importance of regular software updates.
07:08
Podcast summary created with Snipd AI
Quick takeaways
- The exposure of approximately 12,000 active API keys in the Common Crawl dataset emphasizes the critical need for secure coding practices to prevent leaks.
- Concerns regarding GitHub Copilot highlight the risk that data from previously public repositories remains accessible to AI tools, potentially exposing sensitive information unintentionally.
Deep dives
AI Training Data and API Key Leakage
One issue discussed is the security risk posed by AI training data, specifically leaked API keys. Truffle Security analyzed a large dataset from Common Crawl, which aggregates web data collected over several years, and found around 12,000 active API keys on various sites. The findings highlight the importance of secure coding practices, since old keys that may still be valid can be found in publicly available data. The potential for misuse is considerable, especially because many credentials are reused across multiple sites, so web developers need to ensure that sensitive information is not inadvertently exposed.
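To make the idea of finding leaked keys in crawled pages concrete, here is a minimal Python sketch of pattern-based secret scanning. It is not Truffle Security's method: the function name `find_candidate_secrets`, the two patterns, and the sample page are assumptions chosen for illustration. The AWS-style pattern follows the well-known `AKIA` access key ID prefix, and the sample value is AWS's documented example key, not a real credential. A scanner like this only produces candidates; deciding whether a key is actually active would need a separate verification step that is not shown here.

```python
import re

# Illustrative patterns only (assumptions for this sketch). Real scanners use
# many more detectors and verify candidates against the issuing service.
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\b(AKIA[0-9A-Z]{16})\b"),
    # Generic assignment such as api_key: "..." or API-KEY = '...'.
    "generic_api_key": re.compile(
        r"api[_-]?key\s*[:=]\s*[\"']([A-Za-z0-9_\-]{20,})[\"']",
        re.IGNORECASE,
    ),
}


def find_candidate_secrets(text):
    """Return a sorted list of (pattern_name, matched_value) pairs found in text."""
    hits = set()
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.add((name, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical page content with two hard-coded placeholder secrets.
    sample_page = """
    <script>
      const config = { api_key: "abcdEFGH1234ijklMNOP5678" };
      const awsId = "AKIAIOSFODNN7EXAMPLE";
    </script>
    """
    for name, value in find_candidate_secrets(sample_page):
        # Print only a prefix so the full value never lands in logs.
        print(name, value[:4] + "...")
```

Running the same kind of check over your own front-end code before deployment is one way to catch hard-coded secrets before they end up in a public crawl.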