
Information Theory for Language Models: Jack Morris
Latent Space: The AI Engineer Podcast
00:00
Exploring Gemma 3n and AI Model Dynamics
This chapter discusses the launch of the Gemma 3n language model, focusing on its ability to combine multiple modalities and the role of modular adapters in extending its functionality. It examines the potential of smaller, parameter-efficient models to reach high performance through innovative training and architecture, along with the complexities of how language models store and use information. The chapter also highlights recent research on model architecture, the impact of the Morris constant on performance, and the challenges of recovering training data from model weights.
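As a rough illustration of the capacity idea behind the Morris constant: a minimal sketch, assuming the constant refers to the roughly 3.6 bits of memorized information per parameter reported in Jack Morris's research on language-model memorization. The model sizes and the bits-to-megabytes conversion below are hypothetical examples for intuition, not figures quoted in the episode.

```python
# Assumed "Morris constant": ~3.6 bits of memorization capacity per parameter.
# This value and the model sizes below are illustrative assumptions.
BITS_PER_PARAM = 3.6

def memorization_capacity_bits(num_params: float) -> float:
    """Estimated total memorization capacity of a model, in bits."""
    return BITS_PER_PARAM * num_params

for name, params in [("270M model", 270e6), ("1B model", 1e9), ("8B model", 8e9)]:
    bits = memorization_capacity_bits(params)
    # Convert bits to megabytes (8 bits per byte, 1e6 bytes per MB) for intuition.
    print(f"{name}: ~{bits / 8 / 1e6:.0f} MB of raw memorized information")
```

Under this reading, the constant gives a back-of-the-envelope bound on how much training data a model of a given size could memorize verbatim, which is what makes weight-to-training-data recovery a capacity question as well as an algorithmic one.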