Explore why data scientists should know data engineering with Dan Sullivan, a software architect and data scientist. Learn about the advantages, challenges, and transitions in AI, MLOps, and cloud platforms. Discover the intersections of data roles, data warehouses, and data lakes in efficient data processing. Enhance data science efficiency and modeling through iterative feedback and skills in data engineering.
Quick takeaways
Data scientists benefit from learning data engineering skills for handling data at scale and improving productivity.
Understanding the differences between data warehouses and data lakes optimizes data storage strategies.
Balancing optimization and 'good enough' solutions in data projects enables iterative improvements and efficient development.
Deep dives
Overview of Dan Sullivan's Background
Dan Sullivan is a software architect and data scientist with extensive experience in big data, machine learning, data architecture, security, stream processing, and cloud architecture. He has authored multiple LinkedIn Learning courses and holds a Ph.D. in genetics, bioinformatics, and computational biology. His breadth of experience brings a unique perspective to the discussion.
Advantages of Knowing Data Engineering
Dan highlights the benefits of understanding data engineering for data scientists and ML engineers. Data engineering skills equip individuals to handle data at scale, leverage command line utilities, and utilize tools like Cloud Dataflow for efficient data cleaning, manipulation, and exploration. This knowledge enhances productivity and scalability in data-related projects.
Key Differences Between Data Warehouse and Data Lake
Dan explains the distinctions between data warehouses and data lakes. Data warehouses are structured repositories designed for analytics and decision-making, whereas data lakes store raw data, often unstructured, for potential future use. Understanding the purpose and functionality of each helps optimize data storage and processing strategies.
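One way to picture the distinction is as schema-on-write versus schema-on-read. The sketch below is illustrative only and is not taken from the episode: the names `SaleRow`, `load_into_warehouse`, `read_from_lake`, and the sample events are all hypothetical, and plain Python lists stand in for real storage systems.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleRow:
    """Warehouse-style row: a fixed, typed schema enforced when data is written."""
    order_id: int
    amount_usd: float


def load_into_warehouse(raw_events):
    """Schema-on-write: validate and type each event before storing it."""
    rows, rejected = [], []
    for line in raw_events:
        try:
            event = json.loads(line)
            rows.append(SaleRow(int(event["order_id"]), float(event["amount_usd"])))
        except (ValueError, KeyError, TypeError):
            # Events that do not fit the schema never reach the warehouse table.
            rejected.append(line)
    return rows, rejected


def read_from_lake(raw_events, field):
    """Schema-on-read: keep everything raw and interpret it only when queried."""
    for line in raw_events:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # unparseable for this query, but still kept in the lake
        if isinstance(event, dict) and field in event:
            yield event[field]


# Hypothetical raw events; the "lake" is simply this list, stored as-is.
RAW_EVENTS = [
    '{"order_id": 1, "amount_usd": "19.99"}',
    '{"order_id": 2, "amount_usd": 5, "coupon": "SPRING"}',
    '{"note": "free-text log line"}',
    "not json at all",
]

if __name__ == "__main__":
    rows, rejected = load_into_warehouse(RAW_EVENTS)
    print(len(rows), "rows loaded,", len(rejected), "rejected")
    print(list(read_from_lake(RAW_EVENTS, "coupon")))
```

The warehouse path pays the cleaning cost up front and drops the `coupon` field it did not plan for, while the lake path keeps every record and defers interpretation until someone asks a question of it.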
Balancing Optimization and 'Good Enough'
Dan emphasizes the importance of achieving a balance between optimization and 'good enough' solutions in data projects. Prioritizing early deployment of functional models for feedback allows for iterative improvements based on user needs and practical outcomes. Incremental progress and frequent feedback loops support efficient and effective data project development.
Data Security and Compliance Considerations
Dan underscores the significance of integrating security and compliance considerations from the outset of data projects. Collaboration with information security professionals and adherence to organizational policies and regulations are paramount. Minimizing sensitive data exposure and leveraging tools for data analysis help ensure that data handling aligns with security and compliance standards.
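As a rough illustration of minimizing sensitive data exposure, the sketch below drops fields an analysis does not need and replaces an identifier with a keyed hash before the data reaches an analyst. It is an assumption-laden example rather than anything described in the episode: the field names, the `minimize` and `pseudonymize` helpers, and the hard-coded key are hypothetical, and a real project would follow its own security team's guidance and load the key from a secrets manager.

```python
import hashlib
import hmac

# Hypothetical field policy; real policies come from your security team.
SENSITIVE_FIELDS = {"email"}   # replaced with a keyed hash
DROPPED_FIELDS = {"ssn"}       # never needed for analysis, so removed entirely


def pseudonymize(value: str, key: bytes) -> str:
    """Return a stable keyed SHA-256 token so records can still be joined."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record: dict, key: bytes) -> dict:
    """Copy a record, dropping or pseudonymizing fields per the policy above."""
    out = {}
    for name, value in record.items():
        if name in DROPPED_FIELDS:
            continue
        if name in SENSITIVE_FIELDS:
            out[name] = pseudonymize(str(value), key)
        else:
            out[name] = value
    return out


if __name__ == "__main__":
    demo_key = b"demo-key-not-for-production"  # assumption: illustration only
    record = {"email": "a@example.com", "ssn": "000-00-0000", "plan": "pro"}
    print(minimize(record, demo_key))
```

Because the same input and key always yield the same token, analysts can still count or join on the pseudonymized column without seeing the original value.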
MLOps meetup #12 // What are the advantages for a data scientist to know data engineering?
What good is learning data engineering skills? These days the full stack is overflowing with things you need to know, so why learn data engineering now? Our guest at this meetup makes the case for the advantages of learning data engineering and goes into depth on how to do data engineering in the cloud.
Dan Sullivan is a software architect and data scientist with extensive experience in big data, machine learning, data architecture, security, stream processing, and cloud architecture. Dan is the author of the official Google Cloud study guides for the Professional Architect, Professional Data Engineer, and Associate Cloud Engineer exams, as well as NoSQL for Mere Mortals.
He is also the author of over ten LinkedIn Learning courses on data science, machine learning, SQL, data architecture, and NoSQL. He holds a Ph.D. in genetics, bioinformatics, and computational biology.
Get a copy of his new book here: https://www.wiley.com/en-us/Official+Google+Cloud+Certified+Professional+Data+Engineer+Study+Guide-p-9781119618454
Join our Slack community: https://join.slack.com/t/mlops-community/shared_invite/zt-391hcpnl-aSwNf_X5RyYSh40MiRe9Lw
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://zoom.us/webinar/register/WN_a_nuYR1xT86TGIB2wp9B1g
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Dan Sullivan on LinkedIn: https://www.linkedin.com/in/dansullivanpdx/