Dive into the rise of a new social media platform as it grows from a small base to nearly 30 million users. Hear insights from a software engineer on how the team managed explosive growth without breaking the bank. Discover the intricacies of the Personal Data Server and the strategies the team uses for incident management during challenging periods. The conversation is packed with anecdotes about tech misadventures, video streaming hurdles, and the playful side of building resilient infrastructure.
Bluesky's rapid growth to nearly 30 million users necessitated a transition from cloud-hosted infrastructure to on-prem solutions for enhanced performance.
The team kept optimizing and re-architecting the infrastructure behind the AT protocol throughout, which let it achieve exponential growth without exponential costs.
Infrastructure planning at Bluesky emphasizes building capacity ahead of anticipated user increases, so the platform can absorb load spikes and stay reliable.
Deep dives
The Hidden Costs of Cloud Infrastructure
Cloud infrastructure offers convenience but often comes with unexpected hidden costs. Users may not anticipate that the bulk of their expenses will emerge from egress fees and markup costs instead of straightforward service charges based on instance power. This hidden pricing structure can surprise those who anticipate simplicity and predictability in their cloud billing. Recognizing these costs allows organizations to better strategize their cloud usage and consider a balance between on-prem solutions and cloud services.
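The arithmetic behind this point can be sketched in a few lines. This is a minimal illustration, not a description of Bluesky's actual bill: the `MonthlyBill` type and every number in it (compute spend, egress volume, per-GB price) are hypothetical values chosen only to show how egress can outweigh instance charges.

```python
from dataclasses import dataclass


@dataclass
class MonthlyBill:
    """Hypothetical cloud bill split into compute and egress."""
    compute_usd: float        # instance charges (assumed figure)
    egress_gb: float          # data transferred out (assumed figure)
    egress_usd_per_gb: float  # per-GB egress price (assumed figure)

    @property
    def egress_usd(self) -> float:
        return self.egress_gb * self.egress_usd_per_gb

    @property
    def total_usd(self) -> float:
        return self.compute_usd + self.egress_usd

    @property
    def egress_share(self) -> float:
        # Fraction of the bill that comes from egress alone.
        return self.egress_usd / self.total_usd if self.total_usd else 0.0


# All values below are made up for illustration.
bill = MonthlyBill(compute_usd=10_000.0, egress_gb=500_000.0, egress_usd_per_gb=0.05)
print(f"egress: ${bill.egress_usd:,.0f} of ${bill.total_usd:,.0f} ({bill.egress_share:.0%})")
```

With these assumed inputs, egress makes up well over half of the total, even though the instance charges are the line item most people budget for.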
Transition to On-Prem Infrastructure
The company transitioned from cloud-hosted infrastructure to on-prem solutions in response to scalability challenges. The platform initially ran in the cloud, but rapid growth in user numbers quickly overwhelmed that setup. The migration involved a complete redesign of the backend to accommodate the anticipated user growth, significantly improving performance and stability. The organization carried out the transition with minimal disruption, ultimately improving reliability across the platform.
User Growth and Scaling Challenges
The platform has experienced rapid growth, with user numbers escalating from 100,000 to nearly 30 million in under two years. This hypergrowth puts significant pressure on infrastructure, leading to load spikes and system overloads. Events such as unexpected influxes of users from specific regions have strained existing capacity and forced quick adaptations. Managing scale during these turbulent periods shows the need for robust monitoring and a clear response to changing user demand.
Infrastructure Insights and Future Planning
Insights from managing a rapidly evolving infrastructure stress the importance of planning for future expansion based on projections rather than historical data. The company implemented a proactive approach to its infrastructure capabilities, constructing facilities designed for significant user increases before they occurred. This foresight allows for smoother operations, with less downtime and fewer disruptions as the user base continues to expand. Moving forward, the company will focus on enhancing service discovery and efficiency within its infrastructure.
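The difference between sizing for projections and sizing for historical load can be sketched as a small calculation. This is an illustrative sketch only: `servers_needed`, the per-server capacity, the headroom factor, and the user counts are all hypothetical and are not taken from the episode.

```python
import math


def servers_needed(users: int, users_per_server: int, headroom: float = 0.5) -> int:
    """Servers required to serve `users` with spare capacity.

    `users_per_server` and `headroom` are assumed planning inputs,
    not measured values.
    """
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    return math.ceil(users * (1 + headroom) / users_per_server)


# Hypothetical figures: last quarter's peak versus a growth projection.
historical_users = 10_000_000
projected_users = 30_000_000

sized_for_history = servers_needed(historical_users, users_per_server=2_000_000)
sized_for_projection = servers_needed(projected_users, users_per_server=2_000_000)
print(sized_for_history, sized_for_projection)
```

A fleet sized from the historical figure would be far too small if the projection came true, which is the gap that planning ahead is meant to close.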
Innovations in Data Management
The ongoing commitment to improving data management systems reflects the need for advanced infrastructure solutions. This includes the potential development of customized databases optimized for specific workloads, such as managing user timelines more efficiently. Additionally, caching strategies and request management processes are being explored to enhance performance and user experience. The architecture's evolution includes building a more distributed system to further mitigate issues and ensure continued reliability for users.
Bluesky has been on a roller coaster of growth for over a year. The team went from the early days of figuring out a new distributed social protocol, the AT protocol, to actually building it and inviting 30 million of their closest friends. Not only has the site gone through tremendous growth, the team has been optimizing, re-architecting, and adding features the entire time.
Jaz is a software engineer focused on the infrastructure at Bluesky, and they share how they achieved exponential growth without exponential costs. We cover some of the key components of the protocol and how that affects the architecture.
There’s some amazing advice from the trenches we know you’ll enjoy.
Show Highlights
(0:00) Intro
(5:00) Jaz’s background
(12:30) Bluesky Infrastructure
(17:00) Predicting the future
(20:00) What is a PDS?
(22:30) Relay and firehose
(26:00) Work queues
(30:00) Scaling physical servers
(37:00) How do you handle incidents?
(41:00) Where’s Kubernetes?
(43:30) How video changes
(45:00) Data locality
(46:30) Hardware decisions
(53:00) What bad decisions?
(57:00) Launching video
(1:00:00) What’s next?
About Jaz
Jaz is a software engineer who learned from on-the-job experience. They have a background in hardware, which makes them better with software. If they’re not drinking Monster, they’re building a single-purpose database, or maybe they’re doing both. Jaz went from building with AT protocol to building AT protocol in a matter of months. They also have an impressive collection of plushies and power tools.