Kubernetes at LinkedIn, with Ahmet Alp Balkan and Ronak Nathani
Mar 25, 2025
Ahmet Alp Balkan and Ronak Nathani are software engineers at LinkedIn, experts in Kubernetes at scale. They share their journey transitioning from custom solutions to adopting Kubernetes for workload management. Key discussions cover flexible capacity management and the importance of user experience in deployment workflows. They also emphasize the role of ‘golden paths’ in development and integrating end-user feedback to avoid pitfalls, including data loss incidents. Their insights reflect both challenges and advances in managing Kubernetes infrastructure effectively.
LinkedIn is actively transitioning various workloads to Kubernetes, utilizing a tailored approach to manage both stateless and stateful applications effectively.
The team emphasizes the importance of a smooth developer experience by abstracting Kubernetes complexities while ensuring continuous delivery through automated processes.
Deep dives
Running Kubernetes at LinkedIn
LinkedIn has moved much of its workload to Kubernetes. Before that, it relied on a custom container runtime and scheduler built in-house over a decade ago. The shift was prompted by the maturity and scalability of Kubernetes and by the growing complexity of maintaining the legacy systems. Ahmet and Ronak note that not all workloads run on Kubernetes yet. The company is actively migrating the majority of them, including stateless, stateful, and batch workloads. Their goal is an environment that avoids the problems they ran into with their own custom solutions and makes future transitions smoother.
Database Management on Kubernetes
Running databases on Kubernetes is often met with skepticism, but Ahmet and Ronak describe how they made it work. Because they control the entire stack, they can use local storage instead of network-attached storage for performance reasons. The team also built a generic stateful workload operator. Each database implements a common protocol that the operator uses to coordinate operations, which keeps the platform flexible and minimizes disruption during updates and maintenance. The approach shows that Kubernetes can support stateful applications well when the platform is understood and customized accordingly.
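The episode does not spell out the operator's protocol. The following is a minimal sketch of the general idea, assuming a contract in which the generic operator asks the database before disrupting a pod and notifies it afterward. The names `StatefulApp`, `can_disrupt`, `prepare_for_disruption`, `finish_disruption`, `ReplicatedStore`, and `rolling_maintenance` are all hypothetical illustrations and are not LinkedIn's API.

```python
from dataclasses import dataclass, field
from typing import Protocol


class StatefulApp(Protocol):
    """Hypothetical contract a database implements for a generic operator."""

    def can_disrupt(self, pod: str) -> bool:
        """Return True if taking `pod` offline now keeps the data safe."""
        ...

    def prepare_for_disruption(self, pod: str) -> None:
        """Move leadership or traffic off `pod` before it is restarted."""
        ...

    def finish_disruption(self, pod: str) -> None:
        """Called once `pod` is healthy again."""
        ...


@dataclass
class ReplicatedStore:
    """Toy database: each shard lives on several pods.

    At least `min_replicas` replicas of every shard must stay up.
    """
    shards: dict[str, set[str]]
    min_replicas: int = 2
    down: set[str] = field(default_factory=set)

    def can_disrupt(self, pod: str) -> bool:
        for replicas in self.shards.values():
            if pod in replicas:
                alive = replicas - self.down - {pod}
                if len(alive) < self.min_replicas:
                    return False
        return True

    def prepare_for_disruption(self, pod: str) -> None:
        self.down.add(pod)

    def finish_disruption(self, pod: str) -> None:
        self.down.discard(pod)


def rolling_maintenance(app: StatefulApp, pods: list[str]) -> list[str]:
    """Generic operator loop: disrupt one pod at a time, only with the app's consent."""
    done = []
    for pod in pods:
        if not app.can_disrupt(pod):
            continue  # a real operator would requeue this pod and retry later
        app.prepare_for_disruption(pod)
        # ... restart or update the pod here (omitted in this sketch) ...
        app.finish_disruption(pod)
        done.append(pod)
    return done
```

With `ReplicatedStore({"s1": {"a", "b", "c"}, "s2": {"c", "d"}})`, the loop restarts `a` and `b` but skips `c` and `d`, because shard `s2` would drop below two live replicas. The design choice being illustrated is that the operator stays database-agnostic. Only the database knows when a disruption is safe.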
Challenges of Scaling Kubernetes Clusters
Ahmet and Ronak discuss the challenges of scaling Kubernetes clusters, especially dependency issues around etcd, the key-value store that backs the Kubernetes API server. Their current Kubernetes deployment itself runs on the legacy orchestration stack. The longer-term goal is a self-contained model in which Kubernetes runs on Kubernetes. They are also exploring open-source alternatives to etcd that could scale beyond what the standard setup offers. The conversation highlights how the open-source community is working together on scalability and resilience for high-demand environments.
Curating the Developer Experience on Kubernetes
The approach to creating a smooth developer experience at LinkedIn revolves around abstracting Kubernetes complexities while still providing flexibility. Developers at LinkedIn are guided through a user-friendly interface that allows them to specify compute resources, application identifiers, and additional configurations without diving deep into Kubernetes intricacies. The opinionated design also includes automated processes that ensure proper testing and deployments, enabling continuous delivery across different environments. By balancing ease of use with access to Kubernetes functionality, the team aims to empower developers while minimizing the likelihood of mistakes.
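The episode does not describe LinkedIn's actual interface. The following is a minimal sketch of the abstraction idea, assuming a small developer-facing spec (application identifier, image, compute resources, replica count) that the platform expands into a full Kubernetes `apps/v1` Deployment. `AppSpec`, `to_deployment`, and the default values are hypothetical. The generated fields follow the standard Deployment schema.

```python
from dataclasses import dataclass


@dataclass
class AppSpec:
    """Hypothetical developer-facing spec: only what an app owner must decide."""
    app_id: str
    image: str
    cpu: str = "1"          # hypothetical platform default
    memory: str = "2Gi"     # hypothetical platform default
    replicas: int = 3       # hypothetical platform default


def to_deployment(spec: AppSpec) -> dict:
    """Expand the small spec into a full apps/v1 Deployment manifest."""
    labels = {"app": spec.app_id}
    resources = {"cpu": spec.cpu, "memory": spec.memory}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": spec.app_id, "labels": dict(labels)},
        "spec": {
            "replicas": spec.replicas,
            # The selector must match the pod template's labels.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [{
                        "name": spec.app_id,
                        "image": spec.image,
                        # Requests equal to limits puts the pod in the Guaranteed QoS class.
                        "resources": {
                            "requests": dict(resources),
                            "limits": dict(resources),
                        },
                    }],
                },
            },
        },
    }
```

For example, `to_deployment(AppSpec("feed-api", "registry.example/feed-api:1.2.3"))` yields a complete manifest from two required fields. Keeping the user-facing spec small lets the platform own defaults and conventions, which is one way to make mistakes less likely while still exposing the underlying Kubernetes objects.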
Ahmet Alp Balkan and Ronak Nathani are software engineers on LinkedIn's compute infrastructure team, which runs the Kubernetes platform for LinkedIn. They joined us today to talk about how they run Kubernetes at scale and what they learned along the way.