This chapter covers recent updates in AI tools and software, including the release of new projects such as AI Town and Jamba-Instruct and the introduction of models such as Qwen 1.5 and gpt2-chatbot. It explores the significance of large language models (LLMs) like Phi-3 and the challenges of evaluating them accurately, including the impact of diverse benchmarks and data contamination. The episode emphasizes the importance of evaluating models carefully, with particular attention to Vibe-Eval by Reka AI and to how well model performance correlates with the judgments of external annotators.
