

“Recent AI model progress feels mostly like bullshit” by lc
Mar 25, 2025
The discussion dives into a skeptical view of recent AI advancements, particularly in cybersecurity. There’s a compelling exploration of whether AI benchmarks genuinely reflect practical performance or are merely a facade. Concerns about AI’s real-world utility and alignment challenges are addressed, and the conversation critiques traditional evaluation metrics, pushing for assessments grounded in actual applications. Finally, the pitfalls of integrating AI take center stage, with an emphasis on models’ tendency to over-report security issues.
AI Snips
AI Security Startup Experience
- lc and friends started an AI security company in June 2024, aiming to automate codebase monitoring (see the sketch below).
- Claude 3.5 quickly saturated their internal benchmarks, producing better vulnerability descriptions and severity estimates than earlier models.
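The "codebase monitoring" described here amounts to prompting a model over source files and collecting its findings. Below is a minimal, hypothetical sketch of what such a scanning loop might look like using the Anthropic Python SDK; the prompt wording, severity scale, model ID, and `src/` directory are illustrative assumptions, not details from the episode.

```python
# Hypothetical sketch: ask a Claude model to review one source file for
# vulnerabilities and return a severity-ranked list. The prompt, severity
# scale, and file-walking logic are assumptions, not lc's actual pipeline.
import pathlib
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

PROMPT = (
    "You are a security reviewer. List any vulnerabilities in the code below.\n"
    "For each finding give a one-line description, a severity "
    "(low/medium/high/critical), and the relevant line numbers. "
    "If there are no findings, say so explicitly rather than inventing issues.\n\n"
    "{code}"
)

def scan_file(path: pathlib.Path) -> str:
    """Send a single file to the model and return its raw findings."""
    source = path.read_text(errors="replace")
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model ID for "Claude 3.5"
        max_tokens=1024,
        messages=[{"role": "user", "content": PROMPT.format(code=source)}],
    )
    return message.content[0].text

if __name__ == "__main__":
    # Scan a hypothetical src/ tree and print findings per file.
    for path in pathlib.Path("src").rglob("*.py"):
        print(f"--- {path} ---")
        print(scan_file(path))
```

Saturating an internal benchmark with a setup like this just means the model's findings match or exceed the reference answers the team wrote for it, which is part of why lc argues such scores can diverge from real-world usefulness.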
Stagnant Model Improvement
- Despite new model releases, lc's team has seen no significant improvement in bug-finding ability since August 2024.
- Other AI startup founders reported similar experiences, despite working in different industries.
Disconnect Between Scores and Usefulness
- Improvements in LLMs haven't significantly increased the models' ability to handle new kinds of tasks or take on a larger share of intellectual labor.
- This disconnect between benchmark scores and real-world experience raises concerns about how AI progress is actually measured.