How Much Data Did You Use?
Why mess with tokens at all? Yeah, it's a good question. I mean, there have been efforts in this direction; back in the day there were character RNNs. Thomas mentioned 800 gigabytes. What does that actually translate to in terms of how much of the internet you grabbed for this? "I think 45... Wow, so this is a huge collection of languages, and it includes low-resource African languages and things like that."