Be Confident In Your Data Integration By Quickly Validating Matching Records With data-diff

Data Engineering Podcast

Data Diff: Comparing Data Sets

Many companies that we spoke to have a lot of problems with ensuring the quality of those data replication processes. And so we thought, isn't it a really interesting case? And that aligned nicely with the concept of data diff, comparing data sets. So we started iterating and built the first naive implementation. That pretty much required us to use a third-party system. We used Spark at the time. It would go into both systems, download the data sets, compare them in memory, and then provide the result. But that turned out to be very brittle and very expensive.
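
To make the naive approach concrete, here is a minimal sketch in plain Python of what "download both data sets and compare them in memory" can look like. It is not the Spark implementation mentioned in the episode, and it is not the data-diff API. The function name `naive_diff` and the assumption that each row is a tuple whose first element is the primary key are illustrative only.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# Assumption: a row is a tuple whose first element is its primary key.
Row = Tuple


def naive_diff(
    source_rows: Iterable[Row], target_rows: Iterable[Row]
) -> Dict[str, List[Hashable]]:
    """Load both data sets fully into memory and compare them by key."""
    # Both complete data sets are materialized here, which is what makes
    # this approach expensive for large tables.
    source = {row[0]: row for row in source_rows}
    target = {row[0]: row for row in target_rows}

    return {
        # Keys present in the source but absent from the target.
        "missing_in_target": sorted(k for k in source if k not in target),
        # Keys present in the target but absent from the source.
        "missing_in_source": sorted(k for k in target if k not in source),
        # Keys present on both sides whose row contents differ.
        "changed": sorted(
            k for k in source if k in target and source[k] != target[k]
        ),
    }


if __name__ == "__main__":
    src = [(1, "a"), (2, "b"), (3, "c")]
    dst = [(1, "a"), (2, "x"), (4, "d")]
    print(naive_diff(src, dst))
```

Every row from both systems has to cross the network and fit in memory before any comparison happens, which is consistent with the speaker's remark that this style of comparison proved expensive.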
