AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
Using Social Media to Curate Large Data Sets for Medical Language Models
In the quest for good annotated data sets for language models, researchers discovered that Twitter can be a valuable resource. Many doctors and clinicians use the platform to seek opinions and analysis from their colleagues by sharing medical images. By curating and filtering through hashtags and employing NLP techniques, several hundreds of thousands of high quality Twitter threads with images and text descriptions were obtained. This novel approach proved useful in obtaining larger data sets for building language models in specific domains like medicine.