#5 Shahul Es and Jithin James on Building Reliable LLM Applications, Production-Ready RAG, Data-Driven Evals

How AI Is Built

Approach to Synthetic Data Generation and Model Training (study)

The synthetic data generation process uses two LLMs. One generates question-answer pairs. The other acts as a critic, reviewing the generated questions and giving feedback to improve their quality. Using GPT-4 as the critic model proved expensive, so the focus shifted to developing a smaller, cheaper model for feedback. A lower cost per critique makes it affordable to generate more data points. To improve the distribution and diversity of the synthetic data, an algorithm was developed that evolves questions into different types, each requiring a different level of reasoning; the approach was inspired by published research. The next goal is to connect this infrastructure to production data and draw on it to generate questions that resemble real user queries, which makes for a more realistic evaluation.
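The generator/critic loop and the question evolution described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speakers' implementation: both models are assumed to be plain Python callables, the evolution names and templates (`simple`, `reasoning`, `multi_context`) are hypothetical, and `toy_generator`/`toy_critic` are deterministic stand-ins so the sketch runs without any LLM. Diversity is controlled by sampling an evolution type per context from a weighted distribution, and the critic's feedback is passed back to the generator on the next attempt.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class QAPair:
    question: str
    answer: str
    evolution: str


# Hypothetical evolution templates: each rewrites a seed question so that
# answering it calls for a different kind of reasoning.
EVOLUTIONS = {
    "simple": "{q}",
    "reasoning": "Step by step, {q}",
    "multi_context": "Combining several passages, {q}",
}

# Generator: (context, evolution, critic_feedback) -> QAPair
Generator = Callable[[str, str, str], QAPair]
# Critic: (pair, context) -> (accepted, feedback)
Critic = Callable[[QAPair, str], tuple[bool, str]]


def generate_dataset(contexts: list[str], generator: Generator, critic: Critic,
                     distribution: dict[str, float], max_attempts: int = 3,
                     seed: int = 0) -> list[QAPair]:
    """Sample an evolution per context, then generate and critique until accepted."""
    rng = random.Random(seed)
    kinds = list(distribution)
    weights = [distribution[k] for k in kinds]
    accepted: list[QAPair] = []
    for context in contexts:
        evolution = rng.choices(kinds, weights=weights, k=1)[0]
        feedback = ""
        for _ in range(max_attempts):
            pair = generator(context, evolution, feedback)
            ok, feedback = critic(pair, context)  # feedback feeds the next attempt
            if ok:
                accepted.append(pair)
                break
        # A context whose pairs are never accepted is simply dropped.
    return accepted


# --- Toy stand-ins for the two models (hypothetical, no LLM calls) ---
def toy_generator(context: str, evolution: str, feedback: str) -> QAPair:
    topic = context.split()[0]
    question = EVOLUTIONS[evolution].format(q=f"what is said about {topic}?")
    # The first draft is ungrounded; once the critic complains, quote the context.
    answer = context if feedback else "I am not sure."
    return QAPair(question, answer, evolution)


def toy_critic(pair: QAPair, context: str) -> tuple[bool, str]:
    # Accept only answers that appear verbatim in the source context.
    if pair.answer not in context:
        return False, "Answer is not grounded in the context."
    return True, ""
```

Usage: `generate_dataset(contexts, toy_generator, toy_critic, {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25})`. In a real setup the critic would be the smaller, cheaper feedback model mentioned above; capping `max_attempts` bounds how many critic calls each context can cost.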

