
David Berenstein

Hugging Face team member pioneering synthetic data generation, contributing to projects like Distilabel and Persona Hub.

Top 3 podcasts with David Berenstein

Ranked by the Snipd community
13 snips
Mar 25, 2025 • 1h 2min

Synthetic Data with David Berenstein and Ben Burtenshaw - Weaviate Podcast #118!

David Berenstein and Ben Burtenshaw from Hugging Face discuss synthetic data generation. They cover methods such as persona-driven data generation, along with techniques for improving the quality and diversity of generated data. They highlight tools like Distilabel and Argilla for streamlining data augmentation and model fine-tuning. They also explore synthetic image data and its potential impact on AI education, stressing the importance of accessible, user-friendly tooling as the field develops.
5 snips
Oct 9, 2024 • 57min

Towards high-quality (maybe synthetic) datasets

David Berenstein, a developer advocate engineer at Hugging Face, and Ben Burtenshaw, a machine learning engineer at Argilla, discuss data quality in AI. They explain how collaboration between domain experts and data scientists improves model performance. The conversation covers strategies for generating synthetic datasets, using AI for labeling, and preserving privacy. They also discuss the value of effective feedback loops and of integrating multimodal data to refine AI training.
Oct 9, 2024 • 57min

Towards high-quality (maybe synthetic) datasets (Practical AI #290)

Ben Burtenshaw is a machine learning engineer at Argilla who focuses on data collaboration tools, and David Berenstein is a developer advocate engineer at Hugging Face who works on improving data quality for AI. They discuss the critical role of data collaboration in AI, the iterative process of dataset curation, and the partnership between AI engineers and domain experts. The conversation also covers synthetic data generation, AI feedback mechanisms, and the use of multimodal datasets, including practical healthcare applications that improve model training.