
Long Context Language Models and their Biological Applications with Eric Nguyen - #690
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Training HyenaDNA on the Human Genome and Evo on a Wider Set of Genomes
HyenaDNA is initially trained on the human genome, roughly 3 billion nucleotides, using a simple vocabulary of just four bases. Training takes arbitrarily long sequences and performs next-token, that is, next-nucleotide, prediction; the challenge lies in scaling that process to the sheer volume of data involved. The focus is now shifting toward a larger DNA foundation model, Evo, which incorporates millions of genomes and species. DNA is a rich source of biological data, with trillions of tokens available for training across species.
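To make the setup concrete, here is a minimal sketch of next-nucleotide prediction with a four-base vocabulary. The shift-by-one target construction is the standard causal language-modeling recipe the episode describes; the embedding size and the toy linear head are placeholders, not HyenaDNA's actual architecture or configuration.

```python
# Sketch of next-nucleotide prediction over a 4-base DNA vocabulary.
# The toy model below stands in for the Hyena architecture (assumption:
# dimensions and layers here are illustrative only).
import torch

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}  # single-nucleotide tokens

def encode(seq: str) -> torch.Tensor:
    """Map a DNA string to a tensor of token ids."""
    return torch.tensor([VOCAB[base] for base in seq], dtype=torch.long)

def make_lm_pair(seq: str):
    """Build (input, target) for causal LM: predict each next nucleotide."""
    ids = encode(seq)
    return ids[:-1], ids[1:]  # targets are inputs shifted by one position

x, y = make_lm_pair("ACGTTGCA")
embed = torch.nn.Embedding(len(VOCAB), 16)
head = torch.nn.Linear(16, len(VOCAB))
logits = head(embed(x))  # shape: (seq_len, vocab_size)
loss = torch.nn.functional.cross_entropy(logits, y)
print(f"next-nucleotide cross-entropy: {loss.item():.3f}")
```

The same recipe scales from one genome to many: training on millions of genomes only changes the data pipeline, not the four-token vocabulary or the prediction objective.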