Sammy Steele, Senior Staff Engineer at Figma, shares insights on scaling databases, including the challenges of infrastructure growth, horizontally sharding databases with minimal developer impact, implementing transactions in a sharded system, and managing large tables to avoid database failures.
Podcast summary created with Snipd AI
Quick takeaways
Transitioning from petabyte scale to infinite headroom at Figma required tackling fundamental scalability issues.
The tactical approach included classifying transactions by risk level, building shadow frameworks, and transitioning to sharding gradually.
Utilizing shadow setups, offline analysis, and strategic planning ensured successful scaling and sharding at Figma.
Deep dives
Challenges in Scaling and Sharding Databases
Scaling databases to accommodate exponential growth makes it hard to maintain reliability and consistency. Sammy Steele discusses the daunting task of transitioning from a petabyte-scale setup to one with infinite headroom at Figma. The team had to keep up with the business's exponential growth while also investing time in solving fundamental scalability issues. They worked through the complexities of cross-shard transactions, prioritized tasks by risk level, and planned the work to minimize developer impact while addressing critical database challenges.
Tactical Approaches & Sharding Strategies
Sammy Steele details the tactical approach adopted by the team in tackling database scaling at Figma. By classifying transactions based on risk levels and focusing on high-risk scenarios, they mitigated the impact of failed operations. The team leveraged shadow frameworks and query planners to assess the readiness of tables for horizontal sharding, emphasizing a gradual transition to sharded setups. Approaches included simulating sharded environments through shadow reads and writes and employing tactical solutions to balance short-term stability with long-term scalability.
The execution of queries at Figma traverses various layers, from application services to database proxies to the underlying databases. The team utilized shadow setups and conducted offline analysis to evaluate the implications of sharding specific tables. By establishing views to mimic sharded environments and testing real traffic against prepared shards, they ensured the viability of sharding strategies before full deployment. Strategic planning, risk assessment, and meticulous execution underpinned Figma's successful journey in scaling and sharding database systems.
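The shadow-read idea described above can be sketched roughly as follows. This is a minimal illustration, not Figma's implementation: the function name `shadow_read`, the callables, and the mismatch log are all hypothetical. The caller is always served from the existing (unsharded) path, while the same query runs against the prepared shards and any difference is recorded for offline analysis.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple  # a result row; assumed to contain sortable values


def shadow_read(
    primary: Callable[[], Iterable[Row]],
    shadow: Callable[[], Iterable[Row]],
    mismatches: List[str],
    label: str,
) -> List[Row]:
    """Serve from the primary path and compare against the shadow path.

    `primary` and `shadow` are hypothetical callables that run the same
    query against the unsharded database and the prepared shards.
    """
    primary_rows = list(primary())
    try:
        shadow_rows = list(shadow())
        # Compare order-insensitively: shards may return rows in any order.
        if sorted(primary_rows) != sorted(shadow_rows):
            mismatches.append(f"{label}: results differ")
    except Exception as exc:
        # A failing shadow path must never affect the real request.
        mismatches.append(f"{label}: shadow failed ({exc})")
    return primary_rows
```

Because the shadow result is only compared and never returned, a bug in the sharded path shows up in the mismatch log rather than in production responses.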
Implementing DB Proxy for Query Routing and Optimization
The episode discusses DB proxy, an intermediate layer that manages communication between the application layer and multiple databases. Application services talk to DB proxy through a gRPC client, and the service acts as a query engine: it parses each query, identifies the shard key, and routes the query to the appropriate database. This approach improves coordination and reduces manual configuration, which in turn improves query performance and routing efficiency.
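To make the routing step concrete, here is a small sketch of shard-key routing. It is an assumption-laden toy, not DB proxy itself: the table name `files`, the shard key `org_id`, the shard count, and the regex-based extraction are all hypothetical (a real query engine would use a proper SQL parser). A query that pins the shard key goes to one shard; anything else falls back to all shards.

```python
import hashlib
import re
from typing import List, Optional

# Hypothetical configuration: table name -> shard key column.
SHARD_KEYS = {"files": "org_id"}
NUM_SHARDS = 4  # assumed shard count for the example


def shard_for(value: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key value to a shard index with a stable hash."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def extract_shard_key(sql: str, table: str) -> Optional[str]:
    """Find `<shard_key> = <value>` in the query text, if present."""
    key = SHARD_KEYS.get(table)
    if key is None:
        return None
    match = re.search(rf"\b{re.escape(key)}\s*=\s*'?(\w+)'?", sql)
    return match.group(1) if match else None


def route(sql: str, table: str) -> List[int]:
    """Return the shard indexes that must receive this query."""
    value = extract_shard_key(sql, table)
    if value is None:
        # No shard key in the query: scatter to every shard.
        return list(range(NUM_SHARDS))
    return [shard_for(value)]
```

For example, `route("SELECT * FROM files WHERE org_id = 42", "files")` returns a single shard index, while a query without `org_id` returns all four.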
Challenges and Solutions in Horizontal Sharding Implementation
The episode highlights the challenges faced during horizontal sharding operations, such as maintaining shard key configurations and handling shard splits. To address these issues, a service named topology was developed to manage the physical and logical database topologies, including colos (groups of related tables that share a shard key and are sharded together). The episode also emphasizes read replicas, which share load and keep the service available when a primary database fails.
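One way to picture the split between logical and physical topology is the sketch below. The class and field names are hypothetical and chosen only for illustration; the episode does not describe the service's actual data model. Tables map to a colo, a colo has one shard key, and each logical shard maps to a physical host that can change without touching the logical layout.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Topology:
    """Hypothetical in-memory model of logical and physical sharding."""

    table_to_colo: Dict[str, str] = field(default_factory=dict)
    colo_shard_key: Dict[str, str] = field(default_factory=dict)
    # colo -> logical shard id -> physical host name
    logical_to_physical: Dict[str, Dict[int, str]] = field(default_factory=dict)

    def shard_key(self, table: str) -> str:
        """Every table in a colo shares that colo's shard key."""
        return self.colo_shard_key[self.table_to_colo[table]]

    def host_for(self, table: str, logical_shard: int) -> str:
        """Resolve a table's logical shard to its current physical host."""
        return self.logical_to_physical[self.table_to_colo[table]][logical_shard]

    def move_logical_shard(self, colo: str, logical_shard: int, new_host: str) -> None:
        """Model a shard split: repoint a logical shard at a new host."""
        self.logical_to_physical[colo][logical_shard] = new_host
```

Keeping the logical shard ids stable means a split only changes the physical mapping, so routing code that works in terms of logical shards does not need to change.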
Sammy Steele is a Senior Staff Engineer at Figma, and the tech lead for their databases team. She previously worked at Dropbox, where she built out their petabyte-scale metadata storage and search systems.
Sammy recently published a blog post called “How Figma’s databases team lived to tell the scale”. The post went viral and made it to the top of Hacker News. We invited Sammy on the podcast to learn more, and she is our guest today.
Sean’s been an academic, startup founder, and Googler. He has published works covering a wide range of topics from information visualization to quantum computing. Currently, Sean is Head of Marketing and Developer Relations at Skyflow and host of the podcast Partially Redacted, a podcast about privacy and security engineering. You can connect with Sean on Twitter @seanfalconer.