Learnings from building Open Source Distributed Systems with Kishore Gopalakrishna
Aug 27, 2024
Kishore Gopalakrishna, Co-founder and CEO of StarTree and co-author of Apache Pinot, shares his wealth of knowledge in real-time analytics and distributed systems. He reveals the challenges and innovations involved in building systems like Apache Pinot and discusses the pivotal role of community in open-source success. Kishore also delves into effective cost optimizations and the transition from local to cloud storage, emphasizing how real-time analytics can transform data-driven decision-making in businesses.
Kishore Gopalakrishna highlights the complexities of building distributed systems, emphasizing the distinct challenges between transactional and analytical frameworks.
The episode reveals innovative testing methodologies that enhance the reliability and robustness of systems like Apache Pinot and Espresso.
Kishore discusses the importance of fostering a collaborative community in open-source projects to drive innovation and improve user experience.
Deep dives
Kishore Gopalakrishna's Background and Role in Real-Time Analytics
Kishore Gopalakrishna is a prominent figure in the real-time analytics and streaming industry, serving as the co-founder and CEO of StarTree, where he focuses on delivering Apache Pinot as a service. He has contributed significantly to several projects, including Apache Pinot, Espresso, and ThirdEye, each aimed at a different challenge in data storage and processing. At LinkedIn, he gained hands-on experience building distributed systems for large-scale data analytics, drawing on lessons learned from projects at Yahoo. This background informs his approach to developing technologies that improve real-time data insights for users.
The Challenge of Building Distributed Systems
Kishore discusses the complexities of building distributed systems, emphasizing the distinction between transactional and analytical systems. He highlights the importance of understanding the unique requirements of each type, such as the high read throughput in transactional systems versus the complexity of queries in analytical systems. During his time at LinkedIn, he recognized the need for innovation to unify these disparate worlds into an efficient system. Apache Pinot emerged from the realization that there was significant business value in providing real-time analytical capabilities to a large user base, enabling timely decisions based on immediate data access.
The Art of System Design and Implementation
Kishore outlines the intricacies involved in system design, emphasizing that the transition from an idea to a fully functional system requires careful planning and consideration of both current and future needs. He stresses the importance of designing systems to be flexible and robust enough to adapt to evolving technological landscapes and new user requirements over time. Insight gained from building earlier systems like Hadoop and ZooKeeper informed Pinot’s development, guiding him to anticipate potential pitfalls. System architects must account for numerous unknowns and be prepared to pivot as technologies mature and new challenges arise.
Testing Methodologies for Distributed Systems
Throughout the conversation, Kishore describes the testing methodologies developed during the creation of systems like Espresso and Pinot. He advocates for a robust testing strategy that not only addresses known issues but also anticipates unknown challenges through dynamically generated test cases. By implementing a testing approach that autonomously generates new queries and tests each combination, the team ensured reliability and robustness across a wide range of systems and circumstances. Such rigorous testing practices can significantly improve the quality and resilience of distributed systems, reducing the chances of building a product that falters under pressure.
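The idea of generating queries automatically and checking every combination can be illustrated with a small differential test. The sketch below is not the harness used for Espresso or Pinot; it is a minimal illustration under stated assumptions. The `events` table, its columns, the value sets, and all function names are hypothetical, an in-memory SQLite database stands in for the system under test, and a plain Python loop acts as the trusted reference.

```python
import itertools
import random
import sqlite3

# Hypothetical schema and value sets, chosen only for illustration.
COLUMNS = ["country", "device"]
VALUES = {"country": ["US", "IN", "DE"], "device": ["ios", "android", "web"]}
AGGS = ["COUNT(*)", "SUM(clicks)", "MAX(clicks)"]


def make_rows(rng, n):
    """Generate n random (country, device, clicks) rows."""
    return [
        (rng.choice(VALUES["country"]), rng.choice(VALUES["device"]), rng.randint(0, 100))
        for _ in range(n)
    ]


def all_queries():
    """Enumerate every (aggregation, filter column, filter value) combination."""
    for agg, col in itertools.product(AGGS, COLUMNS):
        for val in VALUES[col]:
            yield agg, col, val


def reference_answer(rows, agg, col, val):
    """Trusted but slow answer computed directly over the raw rows."""
    idx = COLUMNS.index(col)
    clicks = [row[2] for row in rows if row[idx] == val]
    if agg == "COUNT(*)":
        return len(clicks)
    if not clicks:
        return None  # SQL SUM/MAX over zero rows yields NULL
    return sum(clicks) if agg == "SUM(clicks)" else max(clicks)


def system_answer(conn, agg, col, val):
    """Answer from the system under test (SQLite is a stand-in here)."""
    # agg and col come from the fixed lists above, never from user input.
    sql = f"SELECT {agg} FROM events WHERE {col} = ?"
    return conn.execute(sql, (val,)).fetchone()[0]


def run_differential_test(seed=0, n_rows=200):
    """Run every generated query against both sides and collect mismatches."""
    rng = random.Random(seed)
    rows = make_rows(rng, n_rows)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (country TEXT, device TEXT, clicks INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

    checked, mismatches = 0, []
    for agg, col, val in all_queries():
        expected = reference_answer(rows, agg, col, val)
        actual = system_answer(conn, agg, col, val)
        checked += 1
        if expected != actual:
            mismatches.append((agg, col, val, expected, actual))
    conn.close()
    return checked, mismatches


if __name__ == "__main__":
    checked, mismatches = run_differential_test(seed=42)
    print(f"checked {checked} queries, {len(mismatches)} mismatches")
```

Running the test with several seeds varies the data while the query combinations stay exhaustive. A real harness would likely also randomize the query shapes themselves and replay any failing seed to reproduce the bug.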
The Importance of Community in Open Source Projects
Kishore emphasizes that fostering a collaborative community is crucial for the success of open-source projects like Apache Pinot. He shares insights on building a respectful, engaging community where contributions are welcomed and varying opinions are valued. By creating a supportive environment, the community can evolve organically, driving innovation and improvement while ensuring that the software continues to meet diverse user needs. This approach not only enriches the ecosystem but also helps users transition into contributors, enhancing the capability of the project over time.
In this episode of The Geek Narrator podcast, hosted by Kaivalya Apte, we welcome a special guest, Kishore Gopalakrishna from StarTree, co-author of Apache Pinot and other notable projects. Kishore shares his extensive experience in building real-time analytics and streaming systems, including Apache Pinot, Espresso, Apache Helix, and ThirdEye. The episode delves into the motivations and challenges behind creating these systems, the innovations they brought to distributed systems, and the impact of community on open-source projects. Kishore also discusses the evolution of testing methodologies, cost optimizations in transactional and analytical systems, and key considerations for companies evaluating real-time analytics solutions.
Don't miss this in-depth conversation packed with valuable insights for both seasoned developers and tech enthusiasts!
Chapters:
00:00 Introduction
03:13 Building Distributed Systems at LinkedIn
08:57 Testing and Challenges in Distributed Systems
30:50 Advantages of Columnar Storage
33:04 The Importance of Upserts
34:24 Building a Strong Open Source Community
41:10 Challenges and Lessons in System Design
51:35 Real-Time Analytics: Do You Need It?
StarTree: https://startree.ai/
Apache Pinot: https://pinot.apache.org/
If you like this episode, please hit the like button and share it with your network.
Also please subscribe if you haven't yet.
Database internals series: https://youtu.be/yV_Zp0Mi3xs
Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN
Stay Curious! Keep Learning!
#distributedsystems #kafka #s3 #streaming #realtimeanalytics #database #pinot #startree