MLOps.community

Tricks to Fine Tuning // Prithviraj Ammanabrolu // #318

Jun 11, 2025
In a captivating discussion, Prithviraj Ammanabrolu, an Assistant Professor at UC San Diego and Research Scientist at Databricks, dives deep into the TAO fine-tuning method. This technique allows training models without labeled data, using reinforcement learning and synthetic inputs. The conversation explores how TAO can enhance small models, make the most of limited datasets, and fine-tune outputs effectively. Prithviraj highlights strategies to balance performance, adaptability, and efficiency in machine learning, positioning these advancements as game-changers for model training.
AI Snips
INSIGHT

TAO Fine Tuning Without Labels

  • TAO fine-tuning customizes models without labeled data by leveraging prompts and synthetic data generated by the model itself.
  • It uses reinforcement learning, in which the model learns from its own mistakes, rather than traditional supervised learning that relies on human labels.
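A minimal sketch of what such a label-free, REINFORCE-style update could look like, assuming a small Hugging Face causal LM ("gpt2" here) and a toy stand-in reward function; this illustrates the idea of learning from the model's own generations, not Databricks' actual TAO implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")       # assumption: any small causal LM works for the sketch
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def toy_reward(response: str) -> float:
    """Stand-in for a learned reward model: one scalar per response, no human label."""
    return 1.0 if len(response.split()) >= 5 else 0.0

prompt = "Explain what fine-tuning without labeled data means:"
enc = tokenizer(prompt, return_tensors="pt")
prompt_len = enc["input_ids"].shape[1]

# Sample several candidate responses from the model itself (synthetic data from prompts only).
sequences = model.generate(
    **enc, do_sample=True, max_new_tokens=32, num_return_sequences=4,
    pad_token_id=tokenizer.eos_token_id,
)

# REINFORCE-style step: raise the log-probability of responses in proportion to their
# reward, so the model learns from its own good and bad attempts.
optimizer.zero_grad()
loss = torch.zeros(())
for seq in sequences:
    response = tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
    logits = model(seq.unsqueeze(0)).logits[0, :-1]       # logits predicting token t+1 from token t
    logprobs = torch.log_softmax(logits, dim=-1)
    token_lp = logprobs.gather(-1, seq[1:].unsqueeze(-1)).squeeze(-1)
    response_lp = token_lp[prompt_len - 1:].sum()         # only the generated (response) tokens
    loss = loss - toy_reward(response) * response_lp
(loss / len(sequences)).backward()
optimizer.step()
```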
INSIGHT

Reward Model Eases Fine-Tuning

  • TAO uses a reward model that scores outputs to guide reinforcement learning, exploiting the fact that judging an output is easier than generating one.
  • The reward model acts like a sparse form of annotation, providing feedback as scalar scores instead of densely human-labeled data.
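A hedged sketch of the kind of scalar reward head such a setup could use: it only has to map a (prompt, response) representation to a single score, a far easier learning problem than generating text. The architecture and dimensions below are assumptions for illustration, not the reward model Databricks describes:

```python
import torch
import torch.nn as nn

class ScalarRewardHead(nn.Module):
    """Judges rather than generates: maps pooled (prompt, response) features to one score."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),   # a single scalar is the only "label" the pipeline needs
        )

    def forward(self, pair_features: torch.Tensor) -> torch.Tensor:
        # pair_features: [batch, hidden_dim] embeddings of the "prompt + response" text
        # (e.g., from a frozen encoder); returns [batch] scalar rewards.
        return self.head(pair_features).squeeze(-1)

# Usage sketch: score a batch of candidate responses and pick the highest-rated one.
reward_head = ScalarRewardHead()
scores = reward_head(torch.randn(8, 768))       # stand-in features for 8 candidates
best_candidate = int(scores.argmax())
```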
INSIGHT

Using Test Time Compute in Training

  • TAO spends extra inference compute during training by generating and scoring multiple responses per prompt to improve model quality.
  • This "test-time" compute is spent at training time, so deployed models run efficiently with no additional overhead.
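One concrete way to read this insight is a best-of-n loop at training time: sample several responses per prompt, keep the reward model's favorites, and tune on those, while inference stays single-pass. The helper names below (sample_responses, reward) are hypothetical stand-ins, not a real TAO API:

```python
from typing import Callable, List, Tuple

def best_of_n_training_pairs(
    prompts: List[str],
    sample_responses: Callable[[str, int], List[str]],   # model samples n candidate responses
    reward: Callable[[str, str], float],                 # reward model scores a (prompt, response) pair
    n: int = 8,
) -> List[Tuple[str, str]]:
    """For each prompt, keep only the highest-scoring of n sampled responses."""
    pairs = []
    for prompt in prompts:
        candidates = sample_responses(prompt, n)          # the extra compute happens here, offline
        best = max(candidates, key=lambda resp: reward(prompt, resp))
        pairs.append((prompt, best))
    return pairs

# At deployment the tuned model generates one response per prompt as usual,
# so none of this per-prompt sampling cost carries over to inference.
```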