Kaushik Devarajaiah, Tech Lead for LedgerStore at Uber, dives into the complexities of managing vast amounts of data. He discusses the critical role of LedgerStore in ensuring data integrity and correct transaction records. The conversation highlights the integration of technologies like Spark and Kafka in Uber’s scalable architecture. Kaushik describes the challenges of performing dual database reads and writes while keeping latency low. He also shares future innovations planned for LedgerStore, with an emphasis on improving data visualization and consistency.
Podcast summary created with Snipd AI
Quick takeaways
Kaushik Devarajaiah emphasizes the critical importance of immutable storage solutions like LedgerStore in ensuring accuracy and integrity for Uber's financial transactions.
The challenges faced during the migration of 250 billion records highlight the intricate processes required to maintain data consistency and reliability at scale.
Deep dives
Scaling Data Infrastructure at Uber
Kaushik Devarajaiah discusses his journey at Uber, emphasizing the challenges of scaling the data infrastructure during the company's rapid growth. Initially focused on large-scale data ingestion, he encountered unique issues caused by the mutable nature of the data, such as real-time trip data that needed to be analyzed for decision-making. This required building systems from the ground up, and he also contributed to the development of Apache Hudi, which facilitates snapshotting the state of data. The transition into storage infrastructure presented new challenges, particularly around availability and latency, both crucial to Uber's operational success.
Challenges in Data Management and Scaling
As Uber's data infrastructure matured, maintaining schema compliance became a significant challenge because the data structures produced by various teams kept evolving. Devarajaiah describes how he addressed these inconsistencies by developing a formalized process for schema management, using tools like Avro schemas to ensure data integrity. The rapid growth of data further complicated decisions about batch processing versus real-time availability, requiring continuous optimization of Spark jobs and other technologies to handle the increased load. These challenges revealed the importance of a robust and consistent data strategy to support Uber’s business operations.
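To make the idea of schema compliance concrete, here is a minimal sketch of checking a record against an Avro-style record schema. It is hand-rolled with the Python standard library rather than the Avro library, it covers only a few primitive types, and the `Trip` schema, its field names, and the `schema_violations` helper are hypothetical illustrations, not anything described in the episode.

```python
# Minimal sketch: validate a record against an Avro-style record schema.
# Hand-rolled for illustration; this is NOT the Avro library.

# Mapping from a few Avro primitive type names to Python types.
AVRO_PRIMITIVES = {
    "string": str,
    "int": int,
    "long": int,
    "double": float,
    "boolean": bool,
}

# Hypothetical schema, written in the JSON shape Avro uses for records.
TRIP_SCHEMA = {
    "type": "record",
    "name": "Trip",
    "fields": [
        {"name": "trip_id", "type": "string"},
        {"name": "fare_cents", "type": "long"},
        {"name": "completed", "type": "boolean"},
    ],
}


def schema_violations(record, schema):
    """Return a list of human-readable problems; empty means compliant."""
    problems = []
    declared = {f["name"]: f["type"] for f in schema["fields"]}

    for name, type_name in declared.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        expected = AVRO_PRIMITIVES[type_name]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {type_name}")

    for name in record:
        if name not in declared:
            problems.append(f"unexpected field: {name}")

    return problems
```

A producer could run a check like this before publishing, so that a record with a string where a `long` is declared is rejected at the source rather than discovered downstream.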
Introduction of LedgerStore
The discussion introduces LedgerStore, a specialized database designed to keep time-sensitive transaction records immutable at Uber. This structure is vital for maintaining accurate financial records and audit trails, ensuring that each transaction can be verified against its history. Devarajaiah highlights that the introduction of LedgerStore was crucial because it supported billions of financial transactions, providing guarantees around data correctness and completeness. Furthermore, the system’s design associates a timestamp with every entry, reinforcing its integrity and reliability when handling high volumes of data.
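As a rough illustration of what "immutable, timestamped, and verifiable against its history" can mean, here is a minimal sketch of an append-only ledger. It uses a hash chain, which is a common technique for tamper detection but is an assumption here: the episode summary does not say how LedgerStore implements its guarantees. The `AppendOnlyLedger` class and its methods are hypothetical names.

```python
import hashlib
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    payload: dict
    timestamp: float
    prev_hash: str   # hash of the previous entry, linking the history
    entry_hash: str  # hash over this entry's content plus prev_hash


def _digest(payload, timestamp, prev_hash):
    """Deterministic SHA-256 over the entry content."""
    body = json.dumps(
        {"payload": payload, "timestamp": timestamp, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AppendOnlyLedger:
    """Hypothetical in-memory ledger: entries can be added, never updated."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    def append(self, payload, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        prev = self._entries[-1].entry_hash if self._entries else self.GENESIS
        entry = Entry(dict(payload), ts, prev, _digest(payload, ts, prev))
        self._entries.append(entry)
        return entry

    def entries(self):
        return tuple(self._entries)

    def verify(self):
        """Recompute every hash; any edit to past content breaks the chain."""
        prev = self.GENESIS
        for e in self._entries:
            expected = _digest(e.payload, e.timestamp, prev)
            if e.prev_hash != prev or e.entry_hash != expected:
                return False
            prev = e.entry_hash
        return True
```

A correction in this model is a new entry that offsets the old one, not an in-place edit, which is what keeps the audit trail intact.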
Migration to Enhanced Storage Solutions
Devarajaiah elaborates on the complexities involved in migrating 250 billion records to LedgerStore without downtime or data loss. The migration was designed so that the backfill of historical data preceded the start of online writes, keeping the two data sets in sync. Techniques like dual writes were used, in which data continued to be written to the old system while the new system caught up, highlighting the careful orchestration needed to manage the transition. The emphasis on strong consistency during this migration reinforced Uber's commitment to accuracy and reliability in its data handling.
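The general shape of a backfill followed by dual writes can be sketched as follows. This is a simplified illustration under stated assumptions, not Uber's migration tooling: the stores are in-memory dictionaries, the old store stays the source of truth for reads, and `DictStore`, `backfill`, `DualWriter`, and `find_mismatches` are hypothetical names.

```python
class DictStore:
    """Stand-in for a real database: an in-memory key-value store."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


def backfill(old, new):
    """Copy historical records that the new store does not have yet."""
    copied = 0
    for key, value in old.rows.items():
        if new.get(key) is None:
            new.put(key, value)
            copied += 1
    return copied


class DualWriter:
    """Send every online write to both stores; read from the old one."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def put(self, key, value):
        self.old.put(key, value)  # old store remains the source of truth
        self.new.put(key, value)  # new store receives the same write

    def get(self, key):
        return self.old.get(key)


def find_mismatches(old, new):
    """Keys whose values differ between stores; empty means in sync."""
    keys = set(old.rows) | set(new.rows)
    return sorted(k for k in keys if old.get(k) != new.get(k))
```

In this sketch a cutover would only be considered once `find_mismatches` stays empty. A production migration would also need to handle partial failures between the two writes, which this example deliberately leaves out.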
Uber handles billions of trips and deliveries, and tens of billions of financial transactions across drivers, couriers, users, and merchants every quarter.
LedgerStore is an immutable storage solution at Uber that provides verifiable data completeness and correctness guarantees to ensure data integrity for its transactions.
Kaushik Devarajaiah is the Tech Lead for LedgerStore at Uber. He joins the show to talk about scaling Uber’s data and storage infrastructure.
Sean’s been an academic, startup founder, and Googler. He has published works covering a wide range of topics from information visualization to quantum computing. Currently, Sean is Head of Marketing and Developer Relations at Skyflow and host of the podcast Partially Redacted, a podcast about privacy and security engineering. You can connect with Sean on Twitter @seanfalconer.