

Simon Karasik
A proactive and curious ML Engineer with 5 years of experience who has developed and deployed ML models at web and big scale for Ads and Tax.
Best podcasts with Simon Karasik
Ranked by the Snipd community

Apr 30, 2024 • 56min
Handling Multi-Terabyte LLM Checkpoints // Simon Karasik // #228
Simon Karasik, an experienced ML Engineer, discusses handling multi-terabyte LLM checkpoints. Topics include managing massive models, cloud storage options, comparing Slurm and Kubernetes, navigating data processing challenges, monitoring Kubernetes nodes with faulty GPUs, and simplifying model training processes.