

Large-Scale Entity Resolution - Sonal Goyal
Oct 28, 2022
53:27
We talked about:
- Sonal’s background
- How the idea for Zingg came about
- What Zingg is
- The difference between entity resolution and identity resolution
- How duplicate detection relates to entity resolution
- How Sonal decided to start working on Zingg
- How Zingg works
- What Zingg runs on
- Switching from consultancy to working on a new open source solution
- Why Zingg is open source
- Open source licensing
- Working on Zingg initially vs now
- Zingg’s current and future team
- Sonal’s biggest current challenge
- Avoiding problems with entity/identity resolution through database design
- Identity resolution vs basic joins, data fusion, and fuzzy joins
- Deterministic matching vs probabilistic machine learning
- Identity and entity resolution applications for fraud detection
- Graph algorithms vs classic ML in entity resolution
- Identity resolution success stories
- What Sonal would do differently given the chance to start over with Zingg
- Advice for those seeking to realize their own solution to a data problem
- Reading suggestion from Sonal
- Conclusion
Links:
- Open-Source Spotlight demo "Zingg": https://www.youtube.com/watch?v=zOabyZxN9b0
- Book "Creative Selection: Inside Apple's Design Process During the Golden Age of Steve Jobs": https://www.amazon.com/Creative-Selection-Inside-Apples-Process/dp/1250194466
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html