ThursdAI - The top AI news from the past week

ThursdAI - May 2nd - New GPT2? Copilot Workspace, Evals and Vibes from Reka, LLama3 1M context (+ Nous finetune) & more AI news

May 3, 2024
A guest from Reka AI discusses evals, vibe checks, and a new paper on overfitting. Idan Gazit from GitHub Next talks about Copilot Workspace. Other topics include the Llama 3 rankings, GPT2-chat, and data contamination in AI models.
INSIGHT

Beware Overfitting Leaderboards

  • Repeatedly tuning models against the same evaluation leaderboard can overfit them to it, raising scores while reducing real-world usefulness.
  • Testing on multiple, diverse benchmarks is essential for evaluating and improving models comprehensively.
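To make the leaderboard-overfitting point concrete, here is a toy simulation (an illustration only, not the method from the Reka paper). It assumes a binary task with a public leaderboard split and a held-out private split, and submits many models that guess at random, so every model's true accuracy is 50%. Keeping whichever submission scores best on the public split inflates its leaderboard score, while its private-split score stays near chance. All function and parameter names are hypothetical.

```python
import random


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def simulate_leaderboard_overfitting(n_public=500, n_private=2000,
                                     n_submissions=200, seed=0):
    """Toy model of selecting submissions by their public leaderboard score.

    Returns (public score, private score) of the submission that looked
    best on the public split.
    """
    rng = random.Random(seed)
    # Ground-truth labels for the public (leaderboard) and private (held-out) splits.
    public_labels = [rng.randint(0, 1) for _ in range(n_public)]
    private_labels = [rng.randint(0, 1) for _ in range(n_private)]

    best_public, best_private = -1.0, None
    for _ in range(n_submissions):
        # Each "model" guesses at random, so its true accuracy is 50% everywhere.
        public_preds = [rng.randint(0, 1) for _ in range(n_public)]
        private_preds = [rng.randint(0, 1) for _ in range(n_private)]
        score = accuracy(public_preds, public_labels)
        # Keep whichever submission looks best on the public leaderboard.
        if score > best_public:
            best_public = score
            best_private = accuracy(private_preds, private_labels)
    return best_public, best_private


if __name__ == "__main__":
    public, private = simulate_leaderboard_overfitting()
    print(f"public leaderboard: {public:.3f}, held-out: {private:.3f}")
```

The gap between the two numbers comes purely from selection on the public split. Scoring the same models on several independent benchmarks makes this kind of selection effect easier to spot.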
INSIGHT

Insights from Vibes-Eval Dataset

  • Expert human evaluators produce more consistent scores on hard multimodal tasks than non-experts.
  • Small models sometimes outperform larger ones on specific tasks due to biases like text dominance in multimodal models.
ADVICE

Iterate Specs and Plans Effectively

  • Review and refine the natural-language spec before any code is generated, so that it steers the AI effectively.
  • Approve a detailed plan listing the files and changes before generating the actual code diffs; this gives better steering.
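The spec, then plan, then diff workflow described above can be sketched as a small data structure with an approval gate. This is a minimal, hypothetical sketch: `Plan`, `PlannedChange`, and `generate_diffs` are illustrative names and are not GitHub Copilot Workspace's API. The diff generation is a placeholder where a real system would call a code model.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedChange:
    """One entry in the plan: which file to touch and what to change (hypothetical)."""
    path: str
    description: str


@dataclass
class Plan:
    """A natural-language spec plus the list of planned file changes (hypothetical)."""
    spec: str
    changes: list = field(default_factory=list)
    approved: bool = False


def generate_diffs(plan: Plan) -> list:
    """Produce placeholder diffs, but only after a human has approved the plan."""
    if not plan.approved:
        raise PermissionError("review and approve the plan before generating diffs")
    # Placeholder: a real system would ask a code model for each diff here.
    return [
        f"--- a/{c.path}\n+++ b/{c.path}\n# TODO: {c.description}"
        for c in plan.changes
    ]


if __name__ == "__main__":
    plan = Plan(spec="Add a --verbose flag to the CLI")
    plan.changes.append(PlannedChange("cli.py", "parse a --verbose flag"))
    plan.changes.append(PlannedChange("README.md", "document --verbose"))
    # The reviewer edits the spec or plan as needed, then approves it.
    plan.approved = True
    for diff in generate_diffs(plan):
        print(diff)
```

Keeping approval as an explicit step means the cheap artifacts (the spec and the file list) get reviewed before the expensive one (the code diffs) is produced.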