Mario Žagar, Distinguished Engineer at Infobip, discusses the evolution of engineering at Infobip over the past 15 years and shares insights on architecting for scale. Topics include the early days of Infobip, progressive rollouts, scaling teams, open sourcing Kafka topic management, and engineering challenges in scalability and stability.
Infobip evolved from running a monolith on a single server to operating a hybrid cloud containerized infrastructure with thousands of databases, tackling challenges like manual deployments and system stability.
Infobip addressed scalability by extracting independent functionalities into separate services, implementing a custom RPC library and service registry, and focusing on automation and canary deployments for smooth rollouts.
Deep dives
Evolution of Engineering at Infobip
Infobip started with a small market in Croatia but realized they could do business worldwide, resulting in massive revenue growth. The company initially ran a monolith on a single server, but now operates a hybrid cloud containerized infrastructure with thousands of databases. Challenges included manual deployments, dependency management, and system stability. As they scaled, Infobip extracted independent functionalities into separate services, improving development speed, deployment, and isolation. They developed their own RPC library using JSON over HTTP for inter-service communication. Infobip also created their own service registry and implemented canary deployments to ensure reliability. Their biggest engineering challenge revolved around designing systems for multi-data center setups and architecting for failures at various layers of the infrastructure.
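The canary deployments mentioned above can be illustrated with a small decision function. This is a minimal sketch, not Infobip's actual rollout logic: the names `Metrics` and `canary_decision`, and the thresholds `min_requests` and `max_extra_error_rate`, are hypothetical and chosen only for illustration. The idea is that a new version receives a small share of traffic, and is promoted only if its error rate stays close to that of the version already running.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Request and error counts observed for one version (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when no traffic has been observed yet.
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: Metrics, canary: Metrics,
                    min_requests: int = 500,
                    max_extra_error_rate: float = 0.01) -> str:
    """Return 'wait', 'promote', or 'rollback' for a canary instance.

    Thresholds are illustrative assumptions, not values from the episode.
    """
    if canary.requests < min_requests:
        return "wait"  # not enough canary traffic to judge
    if canary.error_rate > baseline.error_rate + max_extra_error_rate:
        return "rollback"  # canary is noticeably worse than the baseline
    return "promote"


baseline = Metrics(requests=10_000, errors=50)
print(canary_decision(baseline, Metrics(requests=1_000, errors=40)))
```

A real rollout would typically look at more signals than error rate (latency, saturation, business metrics) and repeat the check at each traffic step.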
Infrastructure Scalability and Deployment
Infobip faced challenges in scaling their infrastructure and streamlining deployments. They dealt with issues related to manual deployments, dependency management, and infrastructure stability. To address scalability, they extracted independent functionalities from their monolithic application into separate services. They used JSON over HTTP for inter-service communication and developed their own RPC library. Infobip implemented a service registry and client-side load balancing to handle service discovery. They focused on automation and canary deployments to ensure smooth rollouts. Balancing on-premises and cloud deployments helped them scale efficiently and optimize costs.
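To make the combination of a service registry and client-side load balancing concrete, here is a minimal in-memory sketch. It is an assumption-laden illustration rather than Infobip's implementation: the class names `ServiceRegistry` and `RoundRobinClient`, the service name, and the addresses are all hypothetical, and round-robin is just one possible balancing strategy. The point is that the caller asks the registry for live instances and picks one itself, instead of going through a central load balancer.

```python
import itertools


class ServiceRegistry:
    """In-memory registry mapping a service name to its instance addresses."""

    def __init__(self):
        self._instances = {}

    def register(self, service, address):
        addresses = self._instances.setdefault(service, [])
        if address not in addresses:
            addresses.append(address)

    def deregister(self, service, address):
        addresses = self._instances.get(service, [])
        if address in addresses:
            addresses.remove(address)

    def lookup(self, service):
        # Return a copy so callers cannot mutate registry state.
        return list(self._instances.get(service, []))


class RoundRobinClient:
    """Client-side load balancer: picks the next instance on every call."""

    def __init__(self, registry, service):
        self._registry = registry
        self._service = service
        self._counter = itertools.count()

    def next_address(self):
        instances = self._registry.lookup(self._service)
        if not instances:
            raise LookupError(f"no instances registered for {self._service}")
        return instances[next(self._counter) % len(instances)]


registry = ServiceRegistry()
registry.register("sms-gateway", "10.0.0.1:8080")  # hypothetical addresses
registry.register("sms-gateway", "10.0.0.2:8080")
client = RoundRobinClient(registry, "sms-gateway")
print([client.next_address() for _ in range(3)])
```

Because the client re-reads the registry on every call, an instance that deregisters (for example during a deployment) stops receiving traffic from the next request onward.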
Managing Data and Database Scale
Infobip adopted various approaches to managing data and database scale. They started with a single, shared database in the early days but later moved towards dedicated databases for each service. Separate databases decreased the risk of one service affecting others. Infobip used ClickHouse for aggregated reporting and utilized custom solutions for data ingestion and processing in Kafka. They adopted a service registry to manage Kafka topics and built an application, available as open source, for managing Kafka topics at scale. Although they considered open-source solutions such as gRPC and Netflix's Eureka, they ultimately built their own RPC library and service registry to meet their specific needs.
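One common way to manage Kafka topics at scale is to describe the desired topics declaratively and compute the difference against what exists on the cluster. The sketch below shows only that comparison step and does not talk to Kafka at all. It is not the open-source tool mentioned above: `TopicSpec`, `plan_topic_changes`, the action names, and the topic names are hypothetical. It relies on one property of Kafka itself, namely that a topic's partition count can be increased but not decreased, so anything other than a pure partition increase is flagged for manual review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicSpec:
    """Minimal topic description (hypothetical shape for this sketch)."""
    partitions: int
    replication_factor: int


def plan_topic_changes(desired, actual):
    """Compare desired and actual topics and list the actions needed.

    Both arguments map topic name -> TopicSpec. Returns (action, name) tuples.
    """
    actions = []
    for name in sorted(desired):
        want, have = desired[name], actual.get(name)
        if have is None:
            actions.append(("create", name))
        elif want == have:
            continue  # already in the desired state
        elif (want.replication_factor == have.replication_factor
              and want.partitions > have.partitions):
            actions.append(("add_partitions", name))
        else:
            # Fewer partitions or a different replication factor cannot be
            # handled by a simple in-place update, so defer to a human.
            actions.append(("manual_review", name))
    for name in sorted(set(actual) - set(desired)):
        actions.append(("unmanaged", name))  # exists on cluster, not declared
    return actions


desired = {"sms.events": TopicSpec(12, 3), "billing": TopicSpec(6, 3)}
actual = {"sms.events": TopicSpec(6, 3), "legacy": TopicSpec(1, 1)}
print(plan_topic_changes(desired, actual))
```

Applying the resulting plan would require a Kafka admin client, which is deliberately left out here.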
Challenges of Architecture and Failover
Infobip faced architectural challenges in handling failures and designing for graceful degradation. They focused on keeping the system stable across multiple data centers and opted for an active-active approach to achieve high availability. Challenges included architecting for failures at different infrastructure layers and managing network connectivity issues. These challenges extended beyond the application and database levels to factors like router failures and high-speed connection issues. Infobip consistently emphasized system stability and adapted the architecture as requirements evolved. They implemented a multi-layered approach to handle failures and continue delivering at scale.
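At the application layer, one piece of an active-active setup is a caller that can send a request to any data center and move on to the next one when a connection fails. The following is a minimal sketch of that idea only, under stated assumptions: `call_with_failover`, the data center names, and the handler functions are hypothetical stand-ins for real network calls, and nothing here reflects how Infobip actually routes traffic.

```python
def call_with_failover(data_centers, request):
    """Try each data center in order and return (name, result) from the first success.

    `data_centers` is a list of (name, handler) pairs, where each handler is a
    callable standing in for a real network call (an assumption of this sketch).
    """
    failures = []
    for name, handler in data_centers:
        try:
            return name, handler(request)
        except ConnectionError as exc:
            # Record the failure and fall through to the next data center.
            failures.append((name, str(exc)))
    raise ConnectionError(f"all data centers failed: {failures}")


def unreachable(_request):
    # Simulates a data center cut off by a network or router failure.
    raise ConnectionError("router unreachable")


def healthy(request):
    return f"accepted:{request}"


print(call_with_failover([("dc-a", unreachable), ("dc-b", healthy)], "msg-1"))
```

Failures below the application layer, such as the router and link issues described above, still need to be handled by the network and infrastructure themselves; a retry loop like this only covers what the caller can observe.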
In this episode, we spoke with Mario Žagar, a Distinguished Engineer at Infobip. Infobip is a tech unicorn based out of Croatia that is a global leader in omnichannel communication, bootstrapping its way to a staggering $1B+ in revenue.
We discussed the super early days of engineering at Infobip, when they were running a monolith on a single server, through to today, when they run a hybrid cloud containerized infrastructure with thousands of databases serving billions of requests. It's a fascinating deep dive into the evolution of engineering over the past 15 years and the challenges of architecting for scale.