

Generating Training Data with Large Language Models w/ Special Guest Marzieh Fadaee
Dec 13, 2022
Marzieh Fadaee, NLP Research Lead at Zeta Alpha, discusses her work on using large language models like GPT-3 to generate domain-specific training data. The conversation covers two papers, 'InPars' and 'Promptagator,' and their methods for high-quality data augmentation with minimal human intervention. Fadaee explores the challenges of leveraging LLMs in information retrieval, the intricacies of prompt engineering, and the potential pitfalls of synthetic data, and points to directions for future research on optimizing neural retrieval systems.
Chapters
Intro
00:00 • 5min
Enhancing Machine Learning with Synthetic Data
04:55 • 9min
Optimizing Data Generation for Re-Ranking Models
13:46 • 29min
Exploring Query Intent and Synthetic Data Generation
42:57 • 12min
Evaluating Retrieval Methodologies
55:02 • 5min
Exploring Query Intent and Task Specialization in Information Retrieval
01:00:15 • 2min
Enhancing Retrieval through Consistency Filtering
01:02:30 • 14min