
Interconnects

Nous Hermes 3 and exploiting underspecified evaluations

Aug 16, 2024
The discussion opens with the launch of a new model and asks what defines a "frontier model." It draws notable comparisons with Llama 3.1, and the importance of transparent evaluation metrics emerges as a central theme. The conversation then covers lessons learned from training Hermes 3. It closes on the broader implications for technology policy, emphasizing the need for integrity in AI evaluations.
08:32

Podcast summary created with Snipd AI

Quick takeaways

  • Uncertainty about the criteria for identifying frontier models has sparked ongoing debates about transparency and credibility in the tech ecosystem.
  • Discrepancies between Hermes 3's reported performance and its actual results underscore the need for stringent evaluation standards and clear documentation in model assessments.

Deep dives

Defining Frontier Models

The criteria for calling a model a frontier model are currently unclear, which has led to debates within the tech ecosystem. Traditionally, a strong showing on the LMSYS Chatbot Arena has served as the benchmark, but trust in this measure is waning. With the release of an open-weight frontier model, Llama 3.1 405B, there is speculation about whether this will lower the barrier for others to join the "frontier model club." As more organizations work to extend the capabilities of modern language models, a solid framework for evaluating these models becomes increasingly critical.
