The podcast discusses using Kubernetes and Amazon EKS to manage data and ML workloads. The guests introduce the Data on EKS project and explore how Kubernetes has evolved, how organizations adopt and migrate to Kubernetes on AWS, and how the project's blueprints help teams run these workloads on Amazon EKS.
Kubernetes is becoming a popular choice for modernizing data applications, such as batch processing and ML, driven by factors such as open-source support or specific requirements.
The Data on EKS project lets organizations use Kubernetes for data and ML workloads. It supports a range of infrastructure-as-code tools and provides blueprints and examples for analytics and AI frameworks.
Deep dives
Data on EKS Project: Building Data and ML Workloads on Kubernetes
The podcast episode discusses the Data on EKS project, which focuses on running data and machine learning workloads on Kubernetes. It features two experts, Vara and Alex, who explain the project and its benefits. A key point is that although AWS offers many managed services for data and ML workloads, some customers choose Kubernetes because of factors such as open-source support or specific requirements. AWS offers Amazon EKS, a managed Kubernetes service, to help customers build and run Kubernetes applications, and the Data on EKS project provides blueprints and examples for running key data and ML workloads on EKS. The episode stresses that the project aims to address the scalability and storage challenges customers commonly encounter, and that it provides best practices and benchmarks to ease migration to EKS.
Adopting Kubernetes and Utilizing the Data on EKS Project
The podcast episode explores the growing popularity of Kubernetes for data and ML workloads. Initially designed for stateless microservices, Kubernetes has evolved to support stateful applications through features such as custom resource definitions and GPU support. The Data on EKS project, an open-source project, lets organizations use Kubernetes for data and ML workloads. The episode highlights the project's flexibility: it accommodates various infrastructure-as-code tools, including Terraform and AWS CDK. The project also offers blueprints and examples for analytics and AI workloads, covering frameworks such as Spark, Flink, and PyTorch. The episode emphasizes the increasing adoption of Kubernetes for data and ML workloads and mentions support from AWS's SageMaker team.
Considerations for Adopting Data on EKS and Migrating Workloads
The podcast episode addresses common challenges and considerations when adopting the Data on EKS project and migrating workloads to Kubernetes. Scalability and networking are identified as major challenges, along with storage for data and ML workloads. The project's blueprints address these challenges by providing solutions for IP address exhaustion and guidance on storage volumes such as Amazon EFS and Amazon EBS. For organizations considering Kubernetes, the guests emphasize the ability to mix and match compute options according to each use case. They also suggest building on existing Kubernetes platforms and using multi-tenancy to securely accommodate multiple teams. Finally, the episode encourages listeners to explore the Data on EKS blueprints and to seek help from AWS solutions architects for a successful migration.
Organizations use their data to make better decisions and build innovative experiences for their customers. With the exponential growth in data and the rapid pace of innovation in machine learning (ML), there is a growing need for modern data applications that are agile and scalable. In this episode, Jillian is joined by Vara Bonthu, Principal Solutions Architect, and Alex Lines, Sr. Containers Specialist, to discuss why Kubernetes is becoming a popular choice for modernizing data applications, such as batch processing and ML. They also discuss how AWS's open-source project, Data on EKS, helps customers build and test common use cases, such as batch processing with Apache Spark or training a language model, to shorten the time it takes to reach production.
Data on EKS website: https://awslabs.github.io/data-on-eks/
Data on EKS GitHub repository: https://github.com/awslabs/data-on-eks
Data on EKS blog: https://go.aws/46bD1b8