Adam Wenchel, Arthur founder and CEO, discusses evaluating AI model performance
Jan 15, 2024
30:25
Adam Wenchel, CEO of Arthur, discusses evaluating AI model performance. He shares insights on Capital One's early AI strategy and the value of monitoring AI model performance in production. The episode also explores the debate between closed foundation models and open-source models, managing accuracy when deploying LLMs in the enterprise, the importance of explainability in AI models, and the intense process of building enduring startups.
Podcast summary created with Snipd AI
Quick takeaways
The importance of evaluating AI model performance and monitoring models in production.
The challenges faced by enterprises in operationalizing and scaling generative AI models.
Deep dives
Importance of AI Ethics and Interdisciplinary Approach
The podcast episode highlights the importance of AI ethics and proposes an interdisciplinary approach to address AI's biggest questions. The suggested approach involves forming an AI ethics team consisting of a technologist, data ethicist, philosopher, and neuroscientist. This team would help make decisions about algorithm inclusion, ethical outputs, and human involvement. Transparency in AI usage and involving the public in key decisions are also emphasized to build trust.
Challenges and Maturity of Generative AI Deployments
The episode discusses the challenges and immaturity of the tooling available to support enterprise-scale generative AI deployments. The guest, Adam Wenchel, CEO of Arthur, acknowledges that while generative AI has immense potential, it requires extensive tuning and monitoring to ensure safety, reliability, and trustworthiness. Notable obstacles include system performance, wrong answers or hallucinations, prompt injection, security vulnerabilities, and toxicity. Arthur aims to address these challenges through its AI performance evaluation and observability platform.
The Journey and Growth of AI in the Enterprise
Adam Wenchel shares his journey in the AI field, particularly scaling AI teams and mapping out AI strategies for enterprises. His experience at Capital One highlighted the significance of deploying AI models at enterprise scale, where millions of people's financial livelihoods can be affected. The conversation sheds light on the increasing demand and urgency for enterprises to leverage AI to unlock the value in their proprietary data. It is emphasized that AI technology has matured significantly over the past 20 years, thanks to advances in GPU computing, algorithms, and cloud computing.
Building Infrastructure and Tooling for Generative AI
The episode explores the need for specialized tooling and infrastructure to support generative AI. The guest, Adam Wenchel, highlights the challenges enterprises face when trying to operationalize generative AI models, such as limited tooling, system performance, and scalability issues. Arthur's AI delivery engine focuses on providing firewall capabilities, chat interfaces, and prompt injection controls for enterprises. The guest envisions the emergence of a company or companies that will act as the foundational infrastructure behind generative AI, akin to what Cisco did for the internet.
Episode notes
Adam Wenchel is the CEO of Arthur, a company whose platform provides a comprehensive AI performance solution across LLMs, computer vision, tabular data, and NLP.
Adam and the team have been making AI observable for almost five years at Arthur. Arthur has raised over $60 million from a legendary group of investors including Index Ventures, Greycroft Partners, Work-Bench, and others, and serves an equally impressive list of customers including Humana and Plaid.
Listen and learn
How Adam started his career in AI
How he helped map out Capital One’s early AI strategy
The value of evaluating AI model performance
How Arthur launched its first LLM-specific product and what the team learned
How to monitor the performance of an LLM in production and which questions to ask when evaluating it
The lessons from growing a startup that nobody talks about