
Is ChatGPT Getting Worse? with James Zou - #645
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Using Social Media to Curate Large Data Sets for Medical Language Models
In the quest for good annotated data sets for language models, researchers discovered that Twitter can be a valuable resource. Many doctors and clinicians use the platform to seek opinions and analysis from their colleagues by sharing medical images. By curating and filtering through hashtags and employing NLP techniques, several hundreds of thousands of high quality Twitter threads with images and text descriptions were obtained. This novel approach proved useful in obtaining larger data sets for building language models in specific domains like medicine.
00:00
Transcript
Play full episode
Remember Everything You Learn from Podcasts
Save insights instantly, chat with episodes, and build lasting knowledge - all powered by AI.