Software Misadventures

Todd Underwood - On lessons from running ML systems at Google for a decade, what it takes to be a ML SRE, challenges with generalized ML platforms and much more - #10

May 7, 2021
Todd Underwood, Sr Director of Engineering at Google, shares his extensive experience in Site Reliability Engineering for Machine Learning. He discusses how ML systems often fail due to issues unrelated to ML itself, the unique challenges of engineering reliable ML systems, and the crucial skills needed for hiring ML SREs. Todd also emphasizes the importance of empathy in tech during high-pressure scenarios and reflects on the balance between traditional software practices and the demands of ML pipelines, making the case for robust collaboration among teams.
01:07:34

Podcast summary created with Snipd AI

Quick takeaways

  • Collaboration between ML developers and SRE teams is essential to effectively address challenges in maintaining reliable machine learning systems.
  • Feature engineering plays a critical role in model performance, requiring attention to detail to prevent future data-related issues in production.

Deep dives

The Distinction Between ML and Distributed Computing

The discussion emphasizes that while machine learning (ML) is often the primary focus, many tasks are fundamentally about modern distributed computing and effectively managing software on medium-sized collections of computers. Many professionals in the field, including software engineers, systems engineers, and site reliability engineers (SREs), will find ample opportunities in ensuring that ML systems operate smoothly. The speaker encourages newcomers to the data science field to pursue their interest in model building but highlights that there will be significant demand for foundational work around making ML systems function effectively. This indicates a broader scope of responsibilities within the ML ecosystem beyond just model development.
