Denys Linkov on Micro Metrics for LLM System Evaluation

Dec 16, 2024

Denys Linkov, Head of Machine Learning at Voiceflow, discusses the vital role of micro metrics in evaluating large language models (LLMs). He highlights how granular assessment enhances user experience and business value. The conversation touches on the challenges of measuring relevant aspects like user engagement and emotional responses from AI. Linkov also delves into prompt engineering complexities and the importance of automated evaluation frameworks. Lastly, he shares insights on AI orchestration for better customer support, focusing on customizable workflows.

INSIGHT

Micro Metrics for User Experience

Micro metrics measure specific user experience issues.
These metrics tie to business value, unlike broader metrics like accuracy.

ANECDOTE

Language Switching Issue

Voiceflow encountered an issue with LLMs switching languages mid-conversation.
Implementing a retry mechanism solved 99% of these issues.
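A retry of this kind can be sketched in a few lines. This is only an illustration, not Voiceflow's implementation: `reply_in_language`, `generate`, `detect_language`, and `max_attempts` are hypothetical names, and the toy detector stands in for whatever language-identification step a real system would use. The function also returns how many attempts were needed, so the language-switch rate can be tracked as a micro metric.

```python
from typing import Callable, Tuple


def reply_in_language(
    generate: Callable[[str], str],
    detect_language: Callable[[str], str],
    prompt: str,
    expected_lang: str,
    max_attempts: int = 3,
) -> Tuple[str, int, bool]:
    """Call the model until the reply is in expected_lang or attempts run out.

    Returns (reply, attempts_used, matched). Both callables are supplied by
    the caller; nothing here depends on a specific LLM or detection library.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    reply = ""
    for attempt in range(1, max_attempts + 1):
        reply = generate(prompt)
        if detect_language(reply) == expected_lang:
            return reply, attempt, True
    # Every attempt came back in the wrong language; hand back the last reply
    # and let the caller decide on a fallback.
    return reply, max_attempts, False


def make_flaky_model() -> Callable[[str], str]:
    """Hypothetical stand-in model: answers in French once, then in English."""
    replies = iter(["Bonjour, comment puis-je vous aider ?", "Hello, how can I help?"])
    return lambda prompt: next(replies)


def toy_detect(text: str) -> str:
    """Toy detector for the demo only; a real system would use a proper classifier."""
    return "fr" if "bonjour" in text.lower() else "en"


reply, attempts, matched = reply_in_language(
    make_flaky_model(), toy_detect, "Greet the user", "en"
)
print(reply, attempts, matched)
```

Capping the number of attempts keeps latency and cost bounded, and the `matched` flag lets the caller fall back (for example, to a fixed message) when the retries do not help.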

ADVICE

Practical LLM Development

Don't aim for LLM perfection; focus on nuanced trade-offs in production.
Leverage domain expertise to define relevant metrics.

Live from the QCon San Francisco Conference, we are talking with Denys Linkov, Head of Machine Learning at Voiceflow. Linkov shares insights on using micro metrics to refine large language models (LLMs), highlighting the importance of granular evaluation, continuous iteration, and rigorous prompt engineering to create reliable and user-focused AI systems.

Read a transcript of this interview: https://bit.ly/49tOvt8

Subscribe to the Software Architects’ Newsletter for your monthly guide to the essential news and experience from industry peers on emerging patterns and technologies: https://www.infoq.com/software-architects-newsletter

Upcoming Events:

QCon London (April 7-9, 2025)
Discover new ideas and insights from senior practitioners driving change and innovation in software development.
https://qconlondon.com/

InfoQ Dev Summit Boston (June 9-10, 2025)
Actionable insights on today’s critical dev priorities.
devsummit.infoq.com/conference/boston2025

InfoQ Dev Summit Munich (Save the date - October 2025)

QCon San Francisco 2025 (17-21, 2025)
Get practical inspiration and best practices on emerging software trends directly from senior software developers at early adopter companies.
https://qconsf.com/

InfoQ Dev Summit New York (Save the date - December 2025)

The InfoQ Podcasts:
Weekly inspiration to drive innovation and build great teams from senior software leaders. Listen to all our podcasts and read interview transcripts:
- The InfoQ Podcast https://www.infoq.com/podcasts/
- Engineering Culture Podcast by InfoQ https://www.infoq.com/podcasts/#engineering_culture
- Generally AI: https://www.infoq.com/generally-ai-podcast/

Follow InfoQ:
- Mastodon: https://techhub.social/@infoq
- Twitter: twitter.com/InfoQ
- LinkedIn: www.linkedin.com/company/infoq
- Facebook: bit.ly/2jmlyG8
- Instagram: @infoqdotcom
- Youtube: www.youtube.com/infoq

Write for InfoQ:
Learn and share the changes and innovations in professional software development.
- Join a community of experts.
- Increase your visibility.
- Grow your career.
https://www.infoq.com/write-for-infoq