The Surprising Websites Included in ChatGPT's Training Data

AI Chat: ChatGPT, AI News, Artificial Intelligence, OpenAI, Machine Learning

Google's C4 Dataset: A Massive Snapshot of Content From 15 Million Different Websites

ChatGPT is based on data from 15 million different websites, the same kind of data that was used to build some of the most high-profile English-language AIs. OpenAI doesn't necessarily disclose which dataset it uses to train the models behind ChatGPT, but we can assume it's something pretty similar, and you'll see why in a little bit. One of the biggest content categories in this entire dataset is business in general, which makes up about 16% of the whole dataset.
