

Strategies For A Successful Data Platform Migration
01:09:53
Lyft's Data Lake Migration
- Gleb Mezhanskiy helped migrate Lyft's data warehouse to a more scalable data lake.
- This two-year project taught him valuable lessons about data migrations.
Building a Data Platform From Scratch
- Gleb's first data platform was built from scratch at Autodesk using cutting-edge technologies like Airflow and Snowflake.
- At Lyft, he faced scalability challenges, leading to the data warehouse migration.
When to Migrate
- Migrate only if your current system cannot support your organization's scale or new technology offers significant improvements.
- Consider factors like data size, complexity, performance, and new use cases.
- 00:00 Introduction (2min)
- 01:34 When and How to Think About Migrating Your Data Stack (2min)
- 03:47 How I Got Started Working in Data (2min)
- 05:18 What Constitutes a Data Migration? (4min)
- 08:54 The Signals That Migration Is Necessary (2min)
- 11:00 The Importance of Scalability in Data Management (5min)
- 15:36 How to Measure the Impact of a Cloud Data Warehouse Migration (2min)
- 17:43 How to Manage the Cost of a Warehouse Migration (6min)
- 23:18 How to Migrate to a New Platform (4min)
- 27:27 How to Identify and Prevent Deep Dependencies That Have Grown Organically (5min)
- 32:53 The Role of Governance and Access Control in Migration Projects (4min)
- 36:59 How to Maximize the Time to Do a Migration (3min)
- 39:48 The Importance of User Acceptance Testing in Data Migration (6min)
- 45:35 The Importance of Automation in Data Migration (3min)
- 49:00 The Importance of Lineage in Data Migration (3min)
- 51:36 How to Approach a Data Migration Project (5min)
- 56:54 The Importance of Choosing the Right Technology for Your Project (3min)
- 59:44 How to Prevent the Need for Migrations (2min)
- 01:01:32 The Cost of Using an Open Source Platform (3min)
- 01:04:50 How to Build an Internal Advocacy for Data Platform Migrations (5min)
Summary
All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for your data platform, making an eventual migration inevitable. In this episode Gleb Mezhanskiy and Rob Goretsky share their experiences leading various data platform migrations, and the hard-won lessons that they learned so that you don't have to.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
- Modern data teams are using Hex to 10x their data impact. Hex combines a notebook style UI with an interactive report builder. This allows data teams to both dive deep to find insights and then share their work in an easy-to-read format to the whole org. In Hex you can use SQL, Python, R, and no-code visualization together to explore, transform, and model data. Hex also has AI built directly into the workflow to help you generate, edit, explain and document your code. The best data teams in the world such as the ones at Notion, AngelList, and Anthropic use Hex for ad hoc investigations, creating machine learning models, and building operational dashboards for the rest of their company. Hex makes it easy for data analysts and data scientists to collaborate and produce work that has an impact. Make your data team unstoppable with Hex. Sign up today at dataengineeringpodcast.com/hex to get a 30-day free trial for your team!
- Your host is Tobias Macey and today I'm interviewing Gleb Mezhanskiy and Rob Goretsky about when and how to think about migrating your data stack
Interview
- Introduction
- How did you get involved in the area of data management?
- A migration can be anything from a minor task to a major undertaking. Can you start by describing what constitutes a migration for the purposes of this conversation?
- Is it possible to completely avoid having to invest in a migration?
- What are the signals that point to the need for a migration?
- What are some of the sources of cost that need to be accounted for when considering a migration, both the cost of doing one and the cost of not doing one?
- What are some signals that a migration is not the right solution for a perceived problem?
- Once the decision has been made that a migration is necessary, what are the questions that the team should be asking to determine the technologies to move to and the sequencing of execution?
- What are the preceding tasks that should be completed before starting the migration to ensure there is no breakage downstream of the changing component(s)?
- What are some of the ways that a migration effort might fail?
- What are the major pitfalls that teams need to be aware of as they work through a data platform migration?
- What are the opportunities for automation during the migration process?
- What are the most interesting, innovative, or unexpected ways that you have seen teams approach a platform migration?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on data platform migrations?
- What are some ways that the technologies and patterns that we use can be evolved to reduce the cost, impact, or need for migrations?
Contact Info
- Gleb
- Rob
- RobGoretsky on GitHub
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.
Links
- Datafold
- Informatica
- Airflow
- Snowflake
- Redshift
- Eventbrite
- Teradata
- BigQuery
- Trino
- EMR (Elastic MapReduce)
- Shadow IT
- Mode Analytics
- Looker
- Sunk Cost Fallacy
- data-diff
- SQLGlot
- [Dagster](https://dagster.io/)
- dbt
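The Links list includes data-diff, a tool for checking that a migrated table still matches its source. The core idea behind that kind of validation can be sketched as: checksum the rows in a key range on both sides, and only bisect the ranges whose checksums disagree, so matching data is confirmed cheaply and row-level comparison is reserved for the few small segments that differ. This is a simplified, stdlib-only illustration in that spirit, not data-diff's API or implementation. The function names `checksum` and `diff_tables`, the integer primary key, the `min_segment` cutoff, and the in-memory dicts standing in for the legacy and migrated tables are all hypothetical. In a real setup each checksum would presumably be computed by a query inside the respective warehouse, so only hashes cross the network.

```python
import hashlib
from typing import Dict, List

# Hypothetical stand-in for a table: integer primary key -> tuple of column values.
Table = Dict[int, tuple]


def checksum(table: Table, lo: int, hi: int) -> str:
    """Hash every row whose key falls in [lo, hi), visited in key order."""
    h = hashlib.sha256()
    for key in sorted(k for k in table if lo <= k < hi):
        h.update(repr((key, table[key])).encode("utf-8"))
    return h.hexdigest()


def diff_tables(src: Table, dst: Table, lo: int, hi: int, min_segment: int = 4) -> List[int]:
    """Return sorted keys in [lo, hi) whose rows differ or exist on only one side."""
    if min_segment < 1:
        raise ValueError("min_segment must be at least 1 so bisection terminates")
    if checksum(src, lo, hi) == checksum(dst, lo, hi):
        return []  # Segment matches on both sides: no row-level work needed.
    if hi - lo <= min_segment:
        # Segment is small enough to compare row by row.
        keys = {k for k in src if lo <= k < hi} | {k for k in dst if lo <= k < hi}
        return sorted(k for k in keys if src.get(k) != dst.get(k))
    # Checksums disagree: bisect the key range and recurse into each half.
    mid = (lo + hi) // 2
    return diff_tables(src, dst, lo, mid, min_segment) + diff_tables(src, dst, mid, hi, min_segment)


# Example: a legacy table and a migrated copy with three discrepancies.
legacy = {i: (f"user{i}", i * 10) for i in range(100)}
migrated = dict(legacy)
migrated[17] = ("user17", 999)  # value changed during migration
del migrated[42]                # row dropped
migrated[150] = ("extra", 0)    # row only present in the new table

print(diff_tables(legacy, migrated, 0, 200))  # prints [17, 42, 150]
```

Bisecting means a matching segment is confirmed with one hash per side, and the amount of row-by-row work grows with the number of discrepancies rather than with the size of the table.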
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Hex: Hex is a collaborative workspace for data science and analytics. A single place for teams to explore, transform, and visualize data into beautiful interactive reports. Use SQL, Python, R, no-code and AI to find and share insights across your organization. Empower everyone in an organization to make an impact with data. Sign up today at [dataengineeringpodcast.com/hex](https://www.dataengineeringpodcast.com/hex) and get 30 days free!
- RudderStack: Try RudderStack Profiles today at [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack)