Training large-scale models often follows a phased approach to context length: the model first learns at a smaller context window of 8,000 tokens before being extended to 128,000 tokens later in training. This incremental strategy likely reflects lessons learned by leading labs in the field, and highlights the value of a phased training process.
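The phased context-window idea can be sketched as a simple training schedule. This is a minimal illustration, not any lab's actual recipe: the phase boundary (90% of training) and the two context lengths are assumptions for the sake of the example.

```python
# Hypothetical phased context-window schedule: train at a short context
# first, then extend near the end. Phase cutoffs are illustrative only.
PHASES = [
    (0.90, 8_000),    # first 90% of training at an 8K context window
    (1.00, 128_000),  # final 10% extended to 128K
]

def context_window(progress: float) -> int:
    """Return the context length for a training progress value in [0, 1]."""
    for cutoff, length in PHASES:
        if progress <= cutoff:
            return length
    return PHASES[-1][1]
```

For example, `context_window(0.5)` returns the short 8,000-token window, while `context_window(0.95)` returns the extended 128,000-token window.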
Our 176th episode with a summary and discussion of last week's big AI news!
NOTE: apologies for this episode coming out about a week late — things got in the way of editing it!
With hosts Andrey Kurenkov (https://twitter.com/andrey_kurenkov) and Jeremie Harris (https://twitter.com/jeremiecharris)
Check out our text newsletter and comment on the podcast at https://lastweekin.ai/
If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form.
Email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
- (00:00:00) Intro Song
- (00:00:34) Intro Banter
- Tools & Apps
- Projects & Open Source
- Applications & Business
- Research & Advancements
- Policy & Safety
- Synthetic Media & Art
- (01:23:03) Outro
- (01:23:58) AI Song