Streaming Data Into The Lakehouse With Iceberg And Trino At Going
Nov 18, 2024
Ken Pickering, VP of Engineering at Going, leads a data platform team focused on finding the best travel deals. He discusses the complexities of streaming data into a Trino and Iceberg lakehouse, sharing his experience in managing vast flight datasets. Ken elaborates on their dual approach to search strategies—passive and active—and the technologies like Confluent and Databricks that support their operations. He highlights collaboration within the engineering teams and the challenges of maintaining data quality and governance in a rapidly evolving landscape.
Ken emphasized that an open lakehouse architecture gives Going the flexibility and scalability needed to manage vast data volumes effectively.
Going's dual passive and active search strategies enhance its ability to deliver timely, relevant travel deal recommendations to consumers.
Deep dives
Data Monitoring for Integrity and Anomaly Detection
Catching data issues right at the source prevents them from escalating into larger problems downstream. Datafold's monitoring automatically detects discrepancies and anomalies across databases, helping maintain data integrity in near real time. This capability keeps operations efficient, reduces the risk of costly errors, and lets teams trust the data behind their decisions.
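The underlying idea can be sketched as a simple cross-database reconciliation check. This is an illustrative sketch, not Datafold's product or API; the connection details, catalogs, and table names are hypothetical, and it assumes both systems are reachable through a single Trino cluster via federated catalogs.

```python
# Illustrative reconciliation sketch: compare row counts for the same logical
# table in a source database and its lakehouse copy, flagging drift beyond a
# tolerance. Connection parameters, catalogs, and table names are hypothetical.
import trino

def row_count(conn, table: str) -> int:
    cur = conn.cursor()
    cur.execute(f"SELECT count(*) FROM {table}")
    return cur.fetchone()[0]

# Both connections go through the same Trino cluster, using different catalogs.
source = trino.dbapi.connect(host="localhost", port=8080, user="monitor",
                             catalog="postgresql", schema="public")
lakehouse = trino.dbapi.connect(host="localhost", port=8080, user="monitor",
                                catalog="iceberg", schema="flights")

src = row_count(source, "fare_events")
dst = row_count(lakehouse, "fare_events")

# Alert if the two copies have drifted apart by more than 1%.
drift = abs(src - dst) / max(src, 1)
if drift > 0.01:
    print(f"Row count drift {drift:.2%}: source={src}, lakehouse={dst}")
```

A real monitoring system would track many such metrics over time and detect anomalies statistically, but a reconciliation loop like this is the core of catching discrepancies at the source.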
Leveraging Data for Travel Deal Optimization
In the travel industry, data is increasingly recognized as a critical asset for discovering and presenting appealing deals to consumers. Companies like Going utilize large volumes of flight information, employing complex models to determine competitive pricing and personalized recommendations. The duality of passive and active data search allows them to innovate on pricing strategies while managing extensive data volumes. As a result, their services become more relevant and timely for users seeking optimal travel opportunities.
Real-Time Streaming Architecture for Data Flows
The architecture designed for managing data streams is essential for timely decision-making in fast-paced environments like travel. Using technologies like Kafka and Trino means data can be processed and analyzed as it arrives, allowing companies to act swiftly on pricing changes. This streaming architecture also ensures that consumer-facing applications can deliver updated information instantly, improving user experience. By prioritizing real-time data flows, organizations can capitalize on fleeting opportunities within the market.
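A minimal sketch of that pattern, assuming a hypothetical `fare-events` Kafka topic that carries JSON price updates (the topic name, payload fields, and deal threshold are illustrative, not Going's actual pipeline):

```python
# Illustrative sketch: consume fare updates from Kafka as they arrive and react
# to price drops immediately. Topic name and payload fields are hypothetical.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "deal-detector",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["fare-events"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        fare = json.loads(msg.value())
        # Act on each event as it lands instead of waiting for a nightly batch.
        if fare["price_usd"] < 0.6 * fare["typical_price_usd"]:
            print(f"Deal candidate: {fare['origin']} -> {fare['destination']} "
                  f"at ${fare['price_usd']}")
finally:
    consumer.close()
```

The same events can be landed in the lakehouse in parallel, so the consumer-facing alerting path and the analytical path read from one stream.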
Collaboration and Cross-Functional Teams in Data Management
Building a cohesive engineering team is fundamental to successfully managing a data-centric business. At Going, close collaboration among data engineers, data scientists, and product managers enables the agile development of models and tools that improve customer satisfaction. This cross-functional approach fosters a culture of shared responsibility and enhances the overall credibility of the data-driven recommendations offered to consumers. As such, the integration of data insights into business strategies becomes a collaborative effort that promotes growth and innovation.
In this episode, I had the pleasure of speaking with Ken Pickering, VP of Engineering at Going, about the intricacies of streaming data into a Trino and Iceberg lakehouse. Ken shared his journey from product engineering to becoming deeply involved in data-centric roles, highlighting his experiences in ecommerce and InsurTech. At Going, Ken leads the data platform team, focusing on finding travel deals for consumers, a task that involves handling massive volumes of flight data and event stream information.
Ken explained the dual approach of passive and active search strategies used by Going to manage the vast data landscape. Passive search involves aggregating data from global distribution systems, while active search is more transactional, querying specific flight prices. This approach helps Going sift through approximately 50 petabytes of data annually to identify the best travel deals.
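As a rough illustration of that split, passive search drains a continuous feed while active search issues a targeted lookup for one itinerary. The function names and data sources below are hypothetical, not Going's actual integrations:

```python
# Illustrative sketch of the two acquisition modes: passive aggregation of a
# published fare feed versus transactional, targeted price queries.
# The feed object, pricing API, and field names are hypothetical.

def normalize(fare: dict) -> dict:
    return {"origin": fare["from"], "destination": fare["to"], "price_usd": fare["price"]}

def passive_search(gds_feed):
    """Continuously aggregate fares published by a global distribution system feed."""
    for fare in gds_feed:       # unbounded stream of published fares
        yield normalize(fare)   # land everything; deal detection happens downstream

def active_search(pricing_api, origin: str, destination: str, date: str) -> float:
    """Transactionally query the current price of one specific itinerary."""
    return pricing_api.get_price(origin=origin, destination=destination, date=date)
```

Passive search optimizes for coverage of the whole fare landscape, while active search trades volume for precision on routes worth confirming.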
We delved into the technical architecture supporting these operations, including the use of Confluent for data streaming, Starburst Galaxy for transformation, and Databricks for modeling. Ken emphasized the importance of an open lakehouse architecture, which allows for flexibility and scalability as the business grows.
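To make the lakehouse side concrete, here is a minimal sketch of an Iceberg table queried through Trino's Iceberg connector, the kind of table a streaming pipeline would land events into. The catalog, schema, column names, and query are illustrative, not Going's actual setup:

```python
# Illustrative sketch: create a partitioned Iceberg table and query it through
# Trino's Iceberg connector. Catalog, schema, and table names are hypothetical.
import trino

conn = trino.dbapi.connect(host="localhost", port=8080, user="etl",
                           catalog="iceberg", schema="flights")
cur = conn.cursor()

# Partition by day so queries over freshly streamed fares stay cheap to scan.
cur.execute("""
    CREATE TABLE IF NOT EXISTS fare_events (
        origin       VARCHAR,
        destination  VARCHAR,
        price_usd    DOUBLE,
        event_ts     TIMESTAMP(6)
    )
    WITH (format = 'PARQUET', partitioning = ARRAY['day(event_ts)'])
""")

# Consumer-facing reads hit the same table the streaming pipeline appends to.
cur.execute("""
    SELECT origin, destination, min(price_usd) AS best_price
    FROM fare_events
    WHERE event_ts > current_timestamp - INTERVAL '1' DAY
    GROUP BY origin, destination
    ORDER BY best_price
    LIMIT 10
""")
for row in cur.fetchall():
    print(row)
```

Because Iceberg is an open table format, the same files can be read by Trino, Spark on Databricks, or other engines, which is the flexibility the open lakehouse approach buys.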
Ken also discussed the composition of Going's engineering and data teams, highlighting the collaborative nature of their work and the reliance on vendor tooling to streamline operations. He shared insights into the challenges and strategies of managing data life cycles, ensuring data quality, and maintaining uptime for consumer-facing applications.
Throughout our conversation, Ken provided a glimpse into the future of Going's data architecture, including potential expansions into other travel modes and the integration of large language models for enhanced customer interaction. This episode offers a comprehensive look at the complexities and innovations in building a data-driven travel advisory service.