In this engaging interview, Joe Reis, co-author of 'Fundamentals of Data Engineering,' shares his wealth of knowledge in the data engineering space. He discusses the vital role data engineers play in organizations and contrasts it with roles in data science. Joe dives into the dangers of chasing trendy technologies, the importance of mastering foundational principles, and the complexities of data governance in today’s AI-driven world. Listeners will appreciate his insights into resource constraints and the nuances of managing data integrity across various platforms.
59:00
ANECDOTE
Joe Reis's Career Journey
Joe Reis transitioned from a math degree to data science and then data engineering.
The rise of machine learning and cloud computing influenced his career shift.
INSIGHT
Fundamentals of Data Engineering
Joe Reis's book "Fundamentals of Data Engineering" focuses on technology-agnostic principles.
This approach makes the book relevant beyond specific tools and their evolution.
INSIGHT
Data Engineer's Role
A data engineer manages the data lifecycle, bridging the gap between software engineers and data users (analysts, data scientists).
They transform and serve data for various downstream applications.
Database Internals
A Deep Dive into How Distributed Data Systems Work
Alex Petrov
This book guides developers through the essential concepts of modern database and storage engine internals. It explores storage classification and taxonomy, including B-Tree-based and immutable log-structured storage engines. The book also covers how database files are organized and how auxiliary data structures such as the page cache, buffer pool, and write-ahead log support them. Additionally, it covers distributed systems, explaining how nodes and processes connect and build complex communication patterns, and discusses consistency models and how distributed storage systems achieve consistency. The book draws on numerous books, papers, blog posts, and the source code of several open source databases to provide a comprehensive understanding of database internals.
Designing Data-Intensive Applications
The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Martin Kleppmann
This book helps software engineers and architects understand the pros and cons of various technologies for storing and processing data. It covers fundamental principles, trade-offs, and design decisions in data systems, including scalability, consistency, reliability, efficiency, and maintainability. The book delves into distributed systems research, the architecture of data systems, and how to make informed decisions about different tools and technologies. It does not provide detailed instructions on specific software packages but focuses on the underlying principles and trade-offs essential for designing data-intensive applications.
Fundamentals of Data Engineering
Joe Reis
Matt Housley
The Art of Computer Programming
Donald Knuth
The Art of Computer Programming is a seminal work by Donald E. Knuth that presents a detailed and systematic treatment of computer programming algorithms. The series, which began in 1962, is planned to consist of seven volumes, with several already published. The books cover a wide range of topics, including fundamental algorithms, seminumerical algorithms, sorting and searching, and combinatorial algorithms. Knuth uses a hypothetical assembly language called MIX (and its RISC version MMIX) to illustrate the algorithms, emphasizing the importance of understanding low-level machine operations. The series is known for its rigorous mathematical approach and detailed analysis of algorithms, making it a cornerstone of computer science literature.
Today, we have Joe Reis on the show. Joe is the co-author of the book Fundamentals of Data Engineering, probably the best and most comprehensive book on data engineering you could read.
We talk about the culture of data engineering, its relationship with data science, the downside of chasing bleeding-edge technology, and approaches to data modeling. Joe has lots to say and lots of opinions, and he is super knowledgeable.
So even if data engineering or data science isn't your thing, we think you're still going to really enjoy listening to the interview.