179: Time Series Data Management and Data Modeling with Tony Wang of Stanford University
Feb 28, 2024
Stanford University PhD student Tony Wang discusses his research on time series data management. Topics include challenges in academia and industry, the structure of an academic lab, his decision to move from hardware to data research, data modeling in time series, issues with the Parquet format and potential solutions, and the role of external indices in Parquet files.
Wang's transition from hardware to data research emphasized practical applications over theoretical research.
His approach to time series data management proposes solutions to improve data indexing and storage efficiency.
Deep dives
Tony Wang's Background and PhD Research Focus
Tony Wang, a PhD student at Stanford University, discusses his background and his research focus on data processing systems. He describes his transition from hardware to data systems research, emphasizing practical applications over theoretical research. Wang's work centers on optimizing data processing in data lakes built on table formats like Apache Iceberg and Delta Lake, so that large-scale data can be analyzed efficiently.
Challenges in Pursuing a PhD and Industry Impact
Wang discusses the challenges of pursuing a PhD, drawing on his experience at Stanford and on his industry engagements. He points to a disconnect between academic and industry expectations, and to the practical considerations needed to connect research with real-world applications. For Wang, industry relevance plays a significant role in shaping research directions and outcomes.
Innovative Approach to Time Series Data Management
Wang proposes ways to improve indexing and storage efficiency for time series data, with the goal of faster data retrieval and processing. His work pairs an established file format, Parquet, with additional indexing techniques, including external indices for Parquet files, to meet the demands of time-sensitive data applications.
Future Directions and Start-up Aspirations
Looking ahead, Wang hopes to pursue entrepreneurial ventures after his PhD and turn his research into solutions for industry. He has a keen interest in observability tools and emerging technologies, and aims to contribute to the data ecosystem through a start-up. That plan follows from his focus on practical applications and industry relevance.
The decision to move from hardware to data research (24:43)
Research focus on time series data management (27:40)
Data modeling in time series and OLAP systems (32:01)
Issues and potential solutions for Parquet format (37:32)
Role of external indices in Parquet files (42:19)
Tony's open source project (47:11)
Final thoughts and takeaways (49:30)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.