Scaling TensorFlow at LinkedIn with Jonathan Hung - #314

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Navigating TensorFlow on Clusters

This chapter evaluates options for training TensorFlow on clusters, focusing on TensorFlow on Spark and its benefits and limitations with respect to fault tolerance and GPU support. It covers LinkedIn's strategic motivations for building on the Hadoop ecosystem to improve machine learning workflows, and the challenges AI engineers face when managing shared resources. It closes with the development of an internal library, TonY, aimed at improving distributed TensorFlow training and job management on LinkedIn's infrastructure.