
Be Confident In Your Data Integration By Quickly Validating Matching Records With data-diff

Data Engineering Podcast


Data Diff: Comparing Data Sets

Many companies that we spoke to have a lot of problems with ensuring the quality of those data replication processes. And so we thought, isn't it a really interesting case? And that aligned nicely with the concept of data diff: comparing data sets. So we started iterating and built the first naive implementation. That pretty much required us to use a third-party system; we used Spark at the time. It would go into both systems, download the data sets, compare them in memory, and then provide the result. But that turned out to be very brittle and very expensive.
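To make the "download both sides and compare in memory" approach concrete, here is a minimal plain-Python sketch. It is not data-diff's code, and the speaker's version ran on Spark. The function name `naive_diff` and the row shape (a tuple whose first element is the primary key) are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical row shape: a tuple whose first element is the primary key.
Row = Tuple[Any, ...]


def naive_diff(source: Sequence[Row], target: Sequence[Row]) -> Dict[str, List[Any]]:
    """Naive diff: hold both fully downloaded datasets in memory and compare them.

    Every row from both systems must be transferred and indexed, so cost
    grows with the full size of both tables, not with the number of differences.
    """
    # Index each side by primary key (assumed to be the first column).
    src = {row[0]: row for row in source}
    tgt = {row[0]: row for row in target}

    return {
        # Keys present in the source that never arrived in the target.
        "missing_in_target": sorted(k for k in src if k not in tgt),
        # Keys present in the target but absent from the source.
        "missing_in_source": sorted(k for k in tgt if k not in src),
        # Keys present on both sides whose row contents differ.
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


if __name__ == "__main__":
    source_rows = [(1, "alice", 10), (2, "bob", 20), (3, "carol", 30)]
    target_rows = [(1, "alice", 10), (2, "bob", 25), (4, "dave", 40)]
    print(naive_diff(source_rows, target_rows))
    # {'missing_in_target': [3], 'missing_in_source': [4], 'mismatched': [2]}
```

The sketch shows why this design scales poorly: even when the two tables are almost identical, every row still has to be pulled out of both systems and held in memory before a single difference can be reported.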

