ThursdAI - The top AI news from the past week

📆 ThursdAI - Feb 20 - Live from AI Eng in NY - Grok 3, Unified Reasoners, Anthropic's Bombshell, and Robot Handoffs!

Feb 20, 2025
Leonard Tang, co-founder of Haize Labs, joins the conversation to discuss Verdict, the company's open source evaluation library aimed at improving the reliability of AI judgment. They dive into the capabilities of Grok 3, comparing its performance with competitors and addressing censorship challenges. Tang also highlights the impact of nepotism bias in AI models and how Verdict seeks to make evaluation more efficient. The episode also covers early advances in robotics, including robots learning to hand objects to one another.
INSIGHT

xAI's Weekend Blitz

  • xAI worked tirelessly through the weekend to release Grok 3.
  • This dedication highlights their commitment to staying competitive in the rapidly evolving AI landscape.
INSIGHT

Grok 3's Benchmark Performance

  • Grok 3 boasts impressive benchmark results, especially in reasoning and deep search capabilities.
  • It demonstrates a significant improvement over its predecessor, Grok 2.
ANECDOTE

Grok 3 and the Ice Cube Riddle

  • Alex Volkov tested Grok 3 with a tricky ice cube riddle, which it failed, as other LLMs have.
  • This highlights the ongoing challenge of real-world logic understanding in AI models.