Chronos: Learning the Language of Time Series with Abdul Fatir Ansari - #685

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Quality and Diversity of Tokens in Language Models

Language models are trained on high-quality, diverse tokens, with around 84 billion tokens used. This diversity matters because each piece of data contributes unique information. Time series forecasting data, by contrast, may over-represent certain types of series, so its characteristics are less diverse. Augmentation schemes are used to address this limitation and improve performance.
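
The summary does not say which augmentation schemes are meant. As a rough illustration of one common family, the sketch below blends several existing series into a new one using random convex weights (a mixup-style augmentation). The function name `mix_series`, the `alpha` parameter, and the mean-absolute-value scaling are assumptions made for this example, not details taken from the episode.

```python
import numpy as np

def mix_series(series_list, rng, alpha=1.5):
    """Blend equal-length series into one new series (hypothetical helper)."""
    scaled = []
    for s in series_list:
        s = np.asarray(s, dtype=float)
        # Divide by the mean absolute value so no series dominates by magnitude.
        scale = np.mean(np.abs(s))
        scaled.append(s / scale if scale > 0 else s)
    # Dirichlet weights are non-negative and sum to 1 (a convex combination).
    weights = rng.dirichlet(alpha * np.ones(len(scaled)))
    return weights @ np.stack(scaled), weights

rng = np.random.default_rng(0)
t = np.arange(64)
seasonal = np.sin(2 * np.pi * t / 16)   # a periodic pattern
trend = 0.1 * t + 1.0                   # a linear trend
mixed, weights = mix_series([seasonal, trend], rng)
```

Each call draws fresh weights, so repeated calls on different subsets of a dataset yield new series that combine patterns (here, seasonality and trend) that may be rare together in the original data.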

Play episode from 18:11