Apoorva Joshi on LLM Application Evaluation and Performance Improvements
Jan 30, 2025
In this conversation, Apoorva Joshi, a Senior AI Developer Advocate at MongoDB with a background in cybersecurity and machine learning, digs into the practicalities of evaluating LLM applications: strategies for improving performance, the role of observability and monitoring, and the evolution of LLMs from text generation to complex multimedia tasks. Joshi also highlights the importance of tailoring evaluations to specific industries and makes a case for democratizing these models for broader accessibility.
Evaluating LLM applications necessitates nuanced metrics like coherence and relevance as opposed to traditional performance indicators.
The integration of sophisticated data retrieval techniques is vital for enhancing the contextual relevance of information delivered by LLMs.
Deep dives
The Role and Evolution of Large Language Models
Large Language Models (LLMs) have become foundational in generative AI applications, contributing significantly to a wide range of business and technology use cases. They are used not only in direct user-facing applications but also to improve software development processes, such as automating code generation and system upgrades. For instance, AI agents can now autonomously handle software updates and generate Jira tickets, reducing the time required for patching from days to hours. LLMs are also evolving beyond text generation toward producing diverse content types such as images, audio, and video, broadening the range of applications they can address.
Key Steps in Developing LLM-Based Applications
Developing applications powered by large language models involves several critical stages, with a primary emphasis on data integration specific to the application's domain. Data retrieval techniques are pivotal, advancing from simple vector searches to sophisticated methods like hybrid search and parent document retrieval, which help deliver contextually relevant information to LLMs. Moreover, effective monitoring post-deployment is crucial for ensuring the application performs as expected and for identifying any regression or degradation in quality. Many organizations may overlook the importance of monitoring, yet it is essential in maintaining functionality and performance once the applications are operational.
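To make the retrieval step concrete, below is a minimal, hedged sketch of the hybrid-search idea mentioned above: results from a vector (semantic) ranking and a keyword (lexical) ranking are merged with reciprocal rank fusion. The document ids and rankings are placeholders for illustration, not MongoDB's actual API or the speaker's implementation.

# Sketch: hybrid search via reciprocal rank fusion (RRF).
# In a real application the two rankings would come from a vector index
# and a full-text index; here they are hard-coded placeholders.

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for the query "how do I evaluate an LLM app?"
vector_hits = ["doc_42", "doc_07", "doc_13", "doc_99"]    # semantic neighbours
keyword_hits = ["doc_07", "doc_55", "doc_42", "doc_01"]   # exact-term matches

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused[:3])  # top documents to pass to the LLM as context

The fusion step is what gives hybrid search its value: documents that rank well under either semantic similarity or exact keyword match surface near the top, improving the contextual relevance of what the LLM receives.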
Evaluating Performance of LLM Applications
Evaluating LLM applications requires a shift from traditional metrics to more nuanced indicators that reflect the complexities of natural language output. Metrics such as coherence, factual accuracy, and relevance become critical, although they are challenging to quantify in comparison to traditional machine learning models. Techniques like using LLMs as evaluators or fine-tuning models are emerging as ways to assess quality, allowing for a more tailored evaluation specific to the application's context. Additionally, determining what kind of performance metrics are appropriate should begin with identifying the business's core objectives and aligning them with the evaluation strategy for optimal results.
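As one concrete illustration of the "LLM as evaluator" technique mentioned above, here is a hedged sketch in which a second model grades an application's answer for relevance and coherence. It assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment; the judge model name and the 1-5 rubric are illustrative choices, not a standard or the approach discussed in the episode.

# Sketch: LLM-as-judge evaluation of a generated answer.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI assistant's answer.
Question: {question}
Answer: {answer}
Rate the answer's relevance and coherence from 1 (poor) to 5 (excellent).
Respond with only the two integers, comma separated, e.g. "4,5"."""

def judge(question: str, answer: str) -> tuple[int, int]:
    # Ask a separate "judge" model to score the answer against the rubric.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder judge model
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    relevance, coherence = response.choices[0].message.content.strip().split(",")
    return int(relevance), int(coherence)

print(judge("What does hybrid search combine?",
            "It combines vector-based and keyword-based retrieval."))

Scores like these can be logged per release and tracked over time, which ties the evaluation back to monitoring: a drop in average relevance or coherence signals regression in the deployed application.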
In this podcast, Apoorva Joshi, Senior AI Developer Advocate at MongoDB, discusses how to evaluate software applications that use Large Language Models (LLMs) and how to improve the performance of LLM-based applications.
Read a transcript of this interview: https://bit.ly/3WEppT6
Subscribe to the Software Architects’ Newsletter for your monthly guide to the essential news and experience from industry peers on emerging patterns and technologies:
https://www.infoq.com/software-architects-newsletter
Upcoming Events:
QCon London (April 7-9, 2025)
Discover new ideas and insights from senior practitioners driving change and innovation in software development.
https://qconlondon.com/
InfoQ Dev Summit Boston (June 9-10, 2025)
Actionable insights on today’s critical dev priorities.
devsummit.infoq.com/conference/boston2025
InfoQ Dev Summit Munich (Save the date - October 2025)
QCon San Francisco 2025 (November 17-21, 2025)
Get practical inspiration and best practices on emerging software trends directly from senior software developers at early adopter companies.
https://qconsf.com/
InfoQ Dev Summit New York (Save the date - December 2025)
The InfoQ Podcasts:
Weekly inspiration to drive innovation and build great teams from senior software leaders. Listen to all our podcasts and read interview transcripts:
- The InfoQ Podcast https://www.infoq.com/podcasts/
- Engineering Culture Podcast by InfoQ https://www.infoq.com/podcasts/#engineering_culture
- Generally AI: https://www.infoq.com/generally-ai-podcast/
Follow InfoQ:
- Mastodon: https://techhub.social/@infoq
- Twitter: twitter.com/InfoQ
- LinkedIn: www.linkedin.com/company/infoq
- Facebook: bit.ly/2jmlyG8
- Instagram: @infoqdotcom
- Youtube: www.youtube.com/infoq
Write for InfoQ: Learn and share the changes and innovations in professional software development.
- Join a community of experts.
- Increase your visibility.
- Grow your career.
https://www.infoq.com/write-for-infoq