Evaluating LLMs with Leva

Aug 26, 2025

Kieran Klaassen, a Ruby developer and the mind behind the AI tools Cora and Leva, shares his passion for AI and Rails. He dives into the creation of the Leva gem, designed for evaluating large language models, and discusses his journey in AI product development. The conversation covers best practices in AI tool creation, the importance of continuous evaluation, and effective workflow management. Kieran highlights the collaborative spirit of the Ruby community and the joy found in integrating AI into Ruby on Rails projects.

ANECDOTE

Built Leva From A Real Need

Kieran built Leva because he needed evaluations for his email product and found nothing suitable in Ruby.
He often creates gems to solve his own needs quickly and then shares them publicly.

ADVICE

Design Gems Starting With The README

Start gem design by writing your ideal README to clarify the API and use cases before coding.
Choose abstractions you like and build for yourself so the gem stays modular and practical.

INSIGHT

Keep Evals Close To Production Data

Running evaluations inside your own stack satisfies privacy and audit constraints that external services can't meet.
Proximity to production data reduces friction for debugging and prompt iteration.

In this episode of the Ruby AI Podcast, host Valentino Stoll talks with special guest Kieran Klaassen, a prominent figure in the Ruby AI space. Kieran recently gave a talk at the San Francisco Ruby Meetup about his new gem, Leva, which focuses on LLM evaluations in Ruby. He discusses his background, his passion for AI and Ruby, and his journey building AI products, including Cora, a tool that helps manage email inboxes by categorizing and summarizing emails with AI. Together, Valentino and Kieran explore the process, challenges, and best practices of creating AI-driven gems and tools in Ruby, the importance of evaluations, and the fun, creative side of integrating AI into Ruby on Rails projects.

Mentioned in the show:

Kieran Klaassen – Ruby developer, creator of Cora and Leva.
Leva gem – Kieran's LLM evaluation framework for Rails.
Jumpstart Pro – a Ruby on Rails SaaS template, billed as “the best Ruby on Rails SaaS template out there”.
Stepper / Stepper Motor (workflow engine) – a “journey” with steps for background jobs.
Jaccard Index – A metric for set similarity (|A∩B|/|A∪B|).
LangSmith – a platform for building production-grade LLM applications.
Morph LLM – The Fastest Way to Apply AI Edits (4500+ tokens/sec).
Friday AI Agent – An AI-powered coding agent that handles PRs from start to finish.
DSPy.rb – Framework for building AI agents and optimizing prompts.
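
For readers unfamiliar with the Jaccard index listed above, here is a minimal Ruby sketch of the formula |A∩B|/|A∪B|. The method name `jaccard_index` and the sample label lists are hypothetical, chosen only for illustration; this is not Leva's API, and the episode notes do not say how Leva computes its metrics.

```ruby
require "set"

# Jaccard index of two collections treated as sets: |A ∩ B| / |A ∪ B|.
# Returns a Float between 0.0 (no overlap) and 1.0 (identical sets).
# NOTE: `jaccard_index` is a hypothetical helper, not part of any gem.
def jaccard_index(a, b)
  set_a = a.to_set
  set_b = b.to_set
  union = set_a | set_b

  # Assumed convention: two empty sets are treated as identical.
  return 1.0 if union.empty?

  (set_a & set_b).size.to_f / union.size
end

# Illustrative data: labels a human expected vs. labels a model produced.
expected  = %w[billing urgent customer]
predicted = %w[billing customer newsletter]

puts jaccard_index(expected, predicted)
```

In this example two of the four distinct labels are shared, so the score is 0.5. A metric like this suits evaluations where the output is an unordered set of labels and partial overlap should earn partial credit.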

Highlights:

00:00 Introduction and Guest Welcome

00:53 Kieran's Background and AI Journey

01:20 Building AI Tools and the Leva Gem

03:47 Challenges and Best Practices in AI Development

07:16 Evaluations and Real-World Applications

07:36 Community Recognition and Adoption

12:37 Prompt Engineering and Model Testing

22:06 Leveraging AI for Workflow Optimization

28:35 Visualizing Workflows and Tools

31:44 Exploring Hybrid Orchestration Layers

33:15 Debating Deterministic Workflows vs. Agent Flows

34:28 The Fun of Experimenting with AI and Ruby

34:55 Building Gems and Learning Through Creation

40:03 The Value of Rails in AI Development

46:28 Evaluating AI Outputs and Metrics

50:40 Annotation and Continuous Improvement

53:50 Future of AI and Rails Integration

54:54 Closing Thoughts and Recommendations