Generating Training Data with Large Language Models w/ Special Guest Marzieh Fadaee

Neural Search Talks — Zeta Alpha

Optimizing Data Generation for Re-Ranking Models

This chapter explores the complexities of re-ranking setups, contrasting negative selection based on dense retrieval with negatives drawn from BM25. It discusses how different prompting strategies affect synthetic query generation and examines the challenges of training models on the MS MARCO dataset. Prompt engineering, domain-specific data, and strategic dataset usage are emphasized as key to optimizing model performance.
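As a rough illustration of the BM25-based negative selection the chapter mentions, the sketch below mines "hard negatives" for re-ranker training: documents that score highly under BM25 for a query but are not the labeled positive. This is not code from the episode; the corpus, function names, and parameters are all illustrative, and the BM25 implementation is a minimal from-scratch version for clarity.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with a basic BM25."""
    tokenized = [doc.lower().split() for doc in corpus]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    # Document frequency of each term across the corpus.
    df = Counter()
    for doc in tokenized:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

def mine_hard_negatives(query, positive_idx, corpus, n=2):
    """Return indices of the top-n BM25-ranked docs that are not the positive."""
    scores = bm25_scores(query, corpus)
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if i != positive_idx][:n]

# Toy example: document 1 is the labeled positive for this query.
corpus = [
    "neural rerankers score query document pairs",
    "bm25 is a strong lexical baseline for retrieval",
    "dense retrieval encodes queries and documents into vectors",
    "cooking pasta requires boiling salted water",
]
negs = mine_hard_negatives("lexical retrieval baseline bm25", positive_idx=1, corpus=corpus)
print(negs)
```

The key design point is that BM25 negatives are lexically similar to the query (here, the dense-retrieval document shares the term "retrieval"), which makes them harder, and therefore more informative, training signal than random negatives.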
