Gideon Mendels, co-founder and CEO of Comet, dives into the fascinating world of AI with a focus on Opik, his open-source model evaluation platform. He shares how Opik's rise exceeded expectations and emphasizes the critical role of CI/CD in AI development. Gideon also discusses the alarming decline in dedicated machine learning engineers and differentiates between genuine and 'fake' open-source solutions. The conversation wraps up with insights on the evolving AI landscape and the need for organizations to adapt to new evaluation methodologies.
Opik emerged as a vital tool for evaluating LLM applications, addressing new developer needs in AI through open-source contributions.
The podcast emphasizes the shift towards community-driven innovation in AI, prioritizing evaluation and testing to ensure model performance and reliability.
Deep dives
The Dominance of Open Source in DevTools
Open source software has become the predominant choice in many categories of development tools, particularly in machine learning and artificial intelligence. The podcast emphasizes that while some proprietary software companies do well, the vast majority of valuable tools derive from open source contributions. This trend is evident in the rise of platforms such as Opik, which was developed after recognizing the unique needs of developers as they transitioned from training models to using large language models (LLMs) through APIs. The continued effectiveness of open source suggests an ongoing shift towards community-driven innovation in the tech industry.
Transitioning to Opik: A New Product for Evolving Needs
The discussion reveals how Opik emerged in response to evolving customer needs in machine learning and artificial intelligence applications. Comet originally focused on experiment tracking, and its functionality expanded to address new requirements brought about by generative AI applications. Customers sought help using APIs from companies like OpenAI and Anthropic, which led to features tailored to these use cases, such as prompt versioning and tracking. This shift highlights the agile nature of modern tech companies and shows how larger trends in AI shape product development.
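To make the idea of prompt versioning concrete, here is a minimal sketch in Python. It is not Opik's API: the `PromptRegistry` class and its methods are hypothetical names invented for illustration, and the sketch assumes a version is simply a new entry recorded whenever a prompt template's content changes.

```python
import hashlib


class PromptRegistry:
    """Hypothetical in-memory store that versions prompt templates by content hash."""

    def __init__(self):
        # Maps a prompt name to its history: a list of (digest, template) pairs.
        self._history = {}

    def register(self, name: str, template: str) -> int:
        """Record a template and return its 1-based version number.

        Re-registering the same text as the latest version does not create a new one.
        """
        digest = hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]
        versions = self._history.setdefault(name, [])
        if not versions or versions[-1][0] != digest:
            versions.append((digest, template))
        return len(versions)

    def latest(self, name: str) -> str:
        """Return the most recently registered template for a prompt name."""
        return self._history[name][-1][1]


registry = PromptRegistry()
v1 = registry.register("summarize", "Summarize the text: {text}")
v1_again = registry.register("summarize", "Summarize the text: {text}")
v2 = registry.register("summarize", "Summarize the text in one sentence: {text}")
```

Logging the version number alongside each LLM call is one way to tell which prompt produced which output when comparing runs.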
The Importance of Evaluation in Production
A significant point discussed is the critical role of evaluation and testing in deploying generative AI models. Unlike the unit tests familiar to software engineers, evaluating LLM applications requires a different approach because their outputs can vary in wording while remaining correct. Opik addresses this challenge by allowing developers to perform semantic-level tests rather than relying solely on string matching. This capability helps users maintain confidence in their models' performance and iterate effectively through observation and feedback mechanisms throughout the software development life cycle.
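The difference between string matching and a semantic-level check can be sketched in a few lines of Python. This is an illustration only, not how Opik implements its metrics: the function names and the 0.6 threshold are assumptions, and word-count cosine similarity is a crude stand-in for a real semantic scorer such as embedding similarity or an LLM judge.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def passes_exact(output: str, expected: str) -> bool:
    """Classic unit-test style check: the strings must be identical."""
    return output == expected


def passes_similarity(output: str, expected: str, threshold: float = 0.6) -> bool:
    """Looser check: pass if the output is similar enough to the reference.

    The threshold is an arbitrary value chosen for this example.
    """
    return cosine_similarity(output, expected) >= threshold


expected = "The capital of France is Paris."
reworded = "Paris is the capital of France."
refusal = "I cannot answer that question."

exact_result = passes_exact(reworded, expected)
similar_result = passes_similarity(reworded, expected)
refusal_result = passes_similarity(refusal, expected)
```

The reworded answer fails the exact-match check but passes the similarity check, while the refusal fails both. A check of this kind can run in a CI pipeline the same way an ordinary unit test does.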
The Future Landscape of AI Development
The conversation highlights the expectation that many foundation models will become commoditized, allowing developers to focus on leveraging existing models rather than creating new ones. AI-as-a-service offerings enable more teams to deploy generative AI solutions without the extensive resources traditionally required for machine learning model development. This shift encourages a more collaborative and accessible environment for developers interested in AI, while also prompting the industry to evolve its practices, benchmarks, and evaluation metrics. As the technology advances rapidly, adaptability and continuous improvement in AI applications become increasingly important.
Gideon Mendels (GitHub: @gidim) is the co-founder and CEO of Comet, the end-to-end model evaluation platform for AI developers. Among the tools in the Comet ecosystem is Opik, an open-source solution for evaluating, testing and monitoring LLM applications. Opik allows users to log traces and spans, define and compute evaluation metrics, score LLM outputs, compare performance across app versions, and more. As a true open-source project, its full feature set is available for use by anyone, completely free.
Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.