Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Author

Vasil Khalidov

This paper presents a method for automating data curation for self-supervised learning by using successive and hierarchical k-means clustering.

The approach aims to create datasets that are large, diverse, and balanced, so that models trained on them can outperform models trained on uncurated data and match or surpass models trained on manually curated data.

The technique has been tested across various data domains, including web-based images, satellite images, and text.
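
The clustering idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that "hierarchical" means re-running k-means on the centroids of the previous level, and that "balanced" can be approximated by drawing the same number of points from each top-level cluster. The function names (kmeans, hierarchical_kmeans, balanced_sample) and all parameters are hypothetical, chosen only for this example.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on rows of x; returns (centroids, labels)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (requires k <= len(x)).
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    # Final assignment so labels agree with the returned centroids.
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dist.argmin(axis=1)


def hierarchical_kmeans(x, ks, seed=0):
    """Assumed hierarchy: cluster the data, then cluster the resulting centroids.

    ks lists the number of clusters per level, finest first (e.g. [20, 2]).
    Returns one label array per level, each mapping every original point
    to its cluster at that level.
    """
    current = np.asarray(x, dtype=float)
    mapping = np.arange(len(current))  # position of each point in `current`
    levels = []
    for level, k in enumerate(ks):
        centroids, labels = kmeans(current, k, seed=seed + level)
        mapping = labels[mapping]  # propagate assignments down to the points
        levels.append(mapping)
        current = centroids
    return levels


def balanced_sample(labels, per_cluster, seed=0):
    """Draw up to per_cluster point indices from every cluster (an assumption
    standing in for whatever balancing rule the paper actually uses)."""
    rng = np.random.default_rng(seed)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        picked.append(rng.choice(idx, size=min(per_cluster, len(idx)), replace=False))
    return np.concatenate(picked)


# Toy usage: an imbalanced 2-D dataset with one large and one small group.
demo_rng = np.random.default_rng(0)
data = np.concatenate([demo_rng.normal(0, 1, (900, 2)), demo_rng.normal(10, 1, (100, 2))])
demo_levels = hierarchical_kmeans(data, ks=[20, 2])
demo_subset = balanced_sample(demo_levels[-1], per_cluster=50)
```

Sampling from the coarsest level caps how many points any single cluster can contribute, which is one simple way to keep a dominant concept from crowding out rarer ones.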

Mentioned by

Mentioned in 1 episode

Mentioned in relation to research on automatic data curation for self-supervised learning.

#170 - new Sora rival, OpenAI robotics, understanding GPT4, AGI by 2027?
