The 404 Media Podcast

Your Bluesky Posts Are Probably Training AI

Dec 4, 2024
Concerns about data privacy heat up as users react to the revelation of a dataset containing their Bluesky posts. The ethical implications of data scraping in social media are scrutinized, raising questions about consent and user protection. Meanwhile, a nostalgic look at Redbox explores its rise and fall amid the streaming era, along with efforts to recycle and repurpose old kiosks. The conversation also highlights new government actions against data brokers, aiming to curb privacy violations and enhance digital rights.
ANECDOTE

Dataset Outrage

  • A machine learning librarian published a dataset of 1 million Bluesky posts, including user IDs and content.
  • This caused outrage among users, leading to comparisons to serious offenses and demands for consequences.
INSIGHT

Dataset Proliferation

  • The dataset's virality led to others creating larger datasets, some with up to 300 million posts, as a form of trolling.
  • This highlights a tension between those who oppose scraping and those who see it as inevitable.
INSIGHT

Openness and Scraping

  • Bluesky's open, decentralized nature makes it vulnerable to data scraping, unlike platforms with paid APIs.
  • This openness is a double-edged sword, offering user ownership but also facilitating unwanted data collection.