
Gleb Mezhanskiy
CEO and co-founder of Datafold, specializing in automating data engineering workflows with AI. Previously a data engineer at companies like Autodesk and Lyft.
Top 5 podcasts with Gleb Mezhanskiy
Ranked by the Snipd community

83 snips
Feb 26, 2025 • 60min
The Future of Data Engineering: AI, LLMs, and Automation
Gleb Mezhanskiy, CEO and co-founder of Datafold, shares insights from his journey in data engineering and the integration of AI. He discusses how large language models can streamline code writing, improve data accessibility, and facilitate testing and code reviews. Mezhanskiy emphasizes the challenges at the intersection of AI and data workflows, advocating for continuous adaptation. With practical applications like text-to-SQL and enhanced data observability, he paints an optimistic picture for the future of data engineering.

47 snips
Jun 11, 2023 • 48min
Build Better Tests For Your dbt Projects With Datafold And data-diff
Summary
Data engineering is all about building workflows, pipelines, systems, and interfaces to provide stable and reliable data. Your data can be stable and wrong, but then it isn't reliable. Confidence in your data is achieved through constant validation and testing. Datafold has invested a lot of time into integrating with the workflow of dbt projects to add early verification that the changes you are making are correct. In this episode Gleb Mezhanskiy shares some valuable advice and insights into how you can build reliable and well-tested data assets with dbt and data-diff.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack
Your host is Tobias Macey and today I'm interviewing Gleb Mezhanskiy about how to test your dbt projects with Datafold
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Datafold is and what's new since we last spoke? (July 2021 and July 2022 about data-diff)
What are the roadblocks to data testing/validation that you see teams run into most often?
How does the tooling used contribute to/help address those roadblocks?
What are some of the error conditions/failure modes that data-diff can help identify in a dbt project?
What are some examples of tests that need to be implemented by the engineer?
In your experience working with data teams, what typically constitutes the "staging area" for a dbt project? (e.g. separate warehouse, namespaced tables, snowflake data copies, lakefs, etc.)
Given a dbt project that is well tested and has data-diff as part of the validation suite, what are the challenges that teams face in managing the feedback cycle of running those tests?
In application development there is the idea of the "testing pyramid", consisting of unit tests, integration tests, system tests, etc. What are the parallels to that in data projects?
What are the limitations of the data ecosystem that make testing a bigger challenge than it might otherwise be?
Beyond test execution, what are the other aspects of data health that need to be included in the development and deployment workflow of dbt projects? (e.g. freshness, time to delivery, etc.)
What are the most interesting, innovative, or unexpected ways that you have seen Datafold and/or data-diff used for testing dbt projects?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on dbt testing internally or with your customers?
When is Datafold/data-diff the wrong choice for dbt projects?
What do you have planned for the future of Datafold?
Contact Info
LinkedIn
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
Datafold
Podcast Episode
data-diff
Podcast Episode
dbt
Dagster
dbt-cloud slim CI
GitHub Actions
Jenkins
Circle CI
Dolt
Malloy
LakeFS
Planetscale
Snowflake Zero Copy Cloning
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Special Guest: Gleb Mezhanskiy.
Sponsored By:
RudderStack:
RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.
RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team.
RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again.
Visit [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack) to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.
Support Data Engineering Podcast

19 snips
Oct 27, 2024 • 49min
Accelerate Migration Of Your Data Warehouse with Datafold's AI Powered Migration Agent
Gleb Mezhanskiy, CEO and co-founder of Datafold, shares his extensive experience in data management from his time at Autodesk and Lyft. He dives into the complexities of data migrations, detailing challenges like technical debt and the need for effective parity between systems. Gleb reveals how Datafold leverages AI to automate data migration processes, significantly reducing time and effort. He also discusses the importance of monitoring data integrity in real-time and offers insights into choosing the right models for secure data handling.

14 snips
Jul 31, 2023 • 1h 10min
Strategies For A Successful Data Platform Migration
Summary
All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for your data platform, making an eventual migration inevitable. In this episode Gleb Mezhanskiy and Rob Goretsky share their experiences leading various data platform migrations, and the hard-won lessons that they learned so that you don't have to.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management
Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
Modern data teams are using Hex to 10x their data impact. Hex combines a notebook style UI with an interactive report builder. This allows data teams to both dive deep to find insights and then share their work in an easy-to-read format to the whole org. In Hex you can use SQL, Python, R, and no-code visualization together to explore, transform, and model data. Hex also has AI built directly into the workflow to help you generate, edit, explain and document your code. The best data teams in the world such as the ones at Notion, AngelList, and Anthropic use Hex for ad hoc investigations, creating machine learning models, and building operational dashboards for the rest of their company. Hex makes it easy for data analysts and data scientists to collaborate together and produce work that has an impact. Make your data team unstoppable with Hex. Sign up today at dataengineeringpodcast.com/hex to get a 30-day free trial for your team!
Your host is Tobias Macey and today I'm interviewing Gleb Mezhanskiy and Rob Goretsky about when and how to think about migrating your data stack
Interview
Introduction
How did you get involved in the area of data management?
A migration can be anything from a minor task to a major undertaking. Can you start by describing what constitutes a migration for the purposes of this conversation?
Is it possible to completely avoid having to invest in a migration?
What are the signals that point to the need for a migration?
What are some of the sources of cost that need to be accounted for when considering a migration? (both in terms of doing one, and the costs of not doing one)
What are some signals that a migration is not the right solution for a perceived problem?
Once the decision has been made that a migration is necessary, what are the questions that the team should be asking to determine the technologies to move to and the sequencing of execution?
What are the preceding tasks that should be completed before starting the migration to ensure there is no breakage downstream of the changing component(s)?
What are some of the ways that a migration effort might fail?
What are the major pitfalls that teams need to be aware of as they work through a data platform migration?
What are the opportunities for automation during the migration process?
What are the most interesting, innovative, or unexpected ways that you have seen teams approach a platform migration?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data platform migrations?
What are some ways that the technologies and patterns that we use can be evolved to reduce the cost/impact/need for migrations?
Contact Info
Gleb
LinkedIn
@glebmm on Twitter
Rob
LinkedIn
RobGoretsky on GitHub
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
Datafold
Podcast Episode
Informatica
Airflow
Snowflake
Podcast Episode
Redshift
Eventbrite
Teradata
BigQuery
Trino
EMR == Elastic Map-Reduce
Shadow IT
Podcast Episode
Mode Analytics
Looker
Sunk Cost Fallacy
data-diff
Podcast Episode
SQLGlot
[Dagster](https://dagster.io/)
dbt
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
Hex:
Hex is a collaborative workspace for data science and analytics. A single place for teams to explore, transform, and visualize data into beautiful interactive reports. Use SQL, Python, R, no-code and AI to find and share insights across your organization. Empower everyone in an organization to make an impact with data. Sign up today at [dataengineeringpodcast.com/hex](https://www.dataengineeringpodcast.com/hex) and get 30 days free!
RudderStack:
Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack)
Support Data Engineering Podcast

Mar 17, 2024 • 58min
Reconciling The Data In Your Databases With Datafold
This episode delves into data reconciliation in databases, discussing error conditions and solutions for ensuring data accuracy. Topics include challenges in data management, techniques for maintaining data quality, reconciliation during warehouse migration projects, and strategies for cost management and data optimization. It also explores innovative uses of Datafold and the data-diff utility in various sectors, the intersection of data engineering and AI applications, and advances in tooling support for data engineers.