LessWrong (Curated & Popular)

“Recent AI model progress feels mostly like bullshit” by lc

Mar 25, 2025
The discussion takes a skeptical look at recent advances in AI, particularly as applied to cybersecurity. It explores whether AI benchmarks genuinely reflect practical performance or merely present a facade, and addresses concerns about AI's real-world utility and alignment challenges. The conversation critiques traditional evaluation metrics, pushing for assessments grounded in actual applications, and examines the pitfalls of integrating AI into security work, especially models' tendency to over-report security issues.
AI Snips
ANECDOTE

AI Security Startup Experience

  • lc and friends started an AI security company in June 2024, aiming to automate codebase monitoring for vulnerabilities.
  • Early on, Claude 3.5 saturated their internal benchmarks, producing noticeably better vulnerability descriptions and severity estimates.
INSIGHT

Stagnant Model Improvement

  • Despite new model releases, lc's team has seen no significant improvement in bug-finding capability since August.
  • Other AI startup founders reported similar experiences despite working in different industries.
INSIGHT

Disconnect Between Scores and Usefulness

  • Recent LLM improvements haven't meaningfully expanded the models' ability to handle new kinds of tasks or take on a larger share of intellectual labor.
  • This disconnect between benchmark scores and real-world experience raises concerns about how AI's impact is measured.