Todd Underwood

Sr Director of Engineering at Google, leading Site Reliability Engineering teams for Machine Learning. His expertise lies in building reliable ML systems and hiring ML SREs.

Best podcasts with Todd Underwood

Ranked by the Snipd community
16 snips
Oct 7, 2022 • 1h 2min

Reliable ML // Niall Murphy & Todd Underwood // Coffee Sessions #127

MLOps Coffee Sessions #127 with Niall Murphy & Todd Underwood, Reliable ML, co-hosted by David Aponte. // Abstract By applying an SRE mindset to machine learning, authors and engineering professionals Cathy Chen, Kranti Parisa, Niall Richard Murphy, D. Sculley, Todd Underwood, and featured guest authors show you how to run an efficient and reliable ML system. Whether you want to increase revenue, optimize decision-making, solve problems, or understand and influence customer behavior, you'll learn how to perform day-to-day ML tasks while keeping the bigger picture in mind. (Book description from O'Reilly) It was great that they wrote this book in the first place: the space is new, many people are entering it with a lot of questions, and this book answers those questions. It was also great to have all of their experiences documented in one book, since there is a lot of value in putting them in one place where people can benefit from them. // Bio Niall Murphy Niall has been interested in Internet infrastructure since the mid-1990s. He has worked with all of the major cloud providers from their Dublin, Ireland offices - most recently at Microsoft, where he was the global head of Azure Site Reliability Engineering (SRE).
His books have sold approximately a quarter of a million copies worldwide, most notably the award-winning Site Reliability Engineering, and he is probably one of the few people in the world to hold degrees in Computer Science, Mathematics, and Poetry Studies. He lives in Dublin, Ireland, with his wife and two children. Todd Underwood Todd is a Director at Google and leads Site Reliability Engineering for Machine Learning. He is also the Site Lead for Google’s Pittsburgh office. ML SRE teams build and scale internal and external ML services and are critical to almost every Product Area at Google. Before working at Google, Todd held a variety of roles at Renesys. He was in charge of operations, security, and peering for Renesys’s Internet intelligence services, which are now part of Oracle's Cloud service. He also did product work for some early social products that Renesys worked on. Before that, Todd was the Chief Technology Officer of Oso Grande, an independent Internet service provider (AS2901) in New Mexico. // MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Reliable Machine Learning book by Cathy Chen, Niall Richard Murphy, Kranti Parisa, D. Sculley, Todd Underwood: https://www.oreilly.com/library/view/reliable-machine-learning/9781098106218/ --------------- ✌️Connect With Us ✌️ ------------- Join our Slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Niall on LinkedIn: https://www.linkedin.com/in/niallm/ Connect with Todd on LinkedIn: https://www.linkedin.com/in/toddunder/
11 snips
May 7, 2021 • 1h 8min

Todd Underwood - On lessons from running ML systems at Google for a decade, what it takes to be a ML SRE, challenges with generalized ML platforms and much more - #10

Todd Underwood, Sr Director of Engineering at Google, shares his extensive experience in Site Reliability Engineering for Machine Learning. He discusses how ML systems often fail due to issues unrelated to ML itself, the unique challenges of engineering reliable ML systems, and the crucial skills to look for when hiring ML SREs. Todd also emphasizes the importance of empathy in tech during high-pressure scenarios and reflects on the balance between traditional software practices and the demands of ML pipelines, making the case for robust collaboration among teams.