LessWrong (Curated & Popular)

“Recent AI model progress feels mostly like bullshit” by lc

Mar 25, 2025
The discussion takes a skeptical look at recent advances in AI, particularly as applied to cybersecurity. It explores whether AI benchmarks genuinely reflect practical performance or merely present a facade, and addresses concerns about AI's real-world utility and alignment challenges. The conversation critiques traditional evaluation metrics, pushing for assessments grounded in actual applications, and examines the pitfalls of integrating AI into security work, especially models' tendency to over-report security issues.
AI Snips
ANECDOTE

AI Security Startup Experience

  • lc and friends started an AI security company in June 2024, aiming to automate codebase monitoring for vulnerabilities.
  • Early on, Claude 3.5 saturated their internal benchmarks, producing noticeably better vulnerability descriptions and severity estimates.
INSIGHT

Stagnant Model Improvement

  • Despite new model releases, lc's team has seen no significant improvement in bug-finding capability since August.
  • Other AI startup founders reported similar experiences despite working in different industries.
INSIGHT

Disconnect Between Scores and Usefulness

  • Recent LLM improvements haven't meaningfully expanded the models' ability to handle new kinds of tasks or take on a larger share of intellectual labor.
  • This disconnect between benchmark scores and real-world experience raises concerns about how AI's impact is measured.