The Use of Partition Summaries in Data Engineering

Traditional anomaly detection techniques compare the current partition to some aggregate of historical partitions. But we find that you can just save a small summary for each partition, which we call partition summaries in the paper. So instead of feeding in the whole history and comparing today to the full aggregate, we maintain those individual summaries. It's super cheap, simple, scalable, and works very well. And I think there's a lot of value in this.
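
To make the idea concrete, here is a minimal Python sketch of keeping one summary per partition and checking a new partition against the stored summaries. Everything in it is an assumption for illustration: the `PartitionSummary` fields, the `summarize` and `anomalous_fields` helpers, and the z-score check are hypothetical and are not taken from the paper mentioned in the episode.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class PartitionSummary:
    """Hypothetical per-partition summary; the chosen fields are illustrative."""
    partition_id: str
    row_count: int
    null_fraction: float
    value_mean: float


def summarize(partition_id: str, values: Sequence[Optional[float]]) -> PartitionSummary:
    """Reduce one partition (a single numeric column here) to a small summary."""
    present = [v for v in values if v is not None]
    return PartitionSummary(
        partition_id=partition_id,
        row_count=len(values),
        null_fraction=(1 - len(present) / len(values)) if values else 0.0,
        value_mean=mean(present) if present else 0.0,
    )


def anomalous_fields(
    history: Sequence[PartitionSummary],
    current: PartitionSummary,
    z_threshold: float = 3.0,
) -> List[str]:
    """Compare the current summary with the stored per-partition summaries.

    Assumption: a plain z-score per field stands in for whatever detector
    is actually run over the summaries.
    """
    flagged = []
    for field in ("row_count", "null_fraction", "value_mean"):
        past = [getattr(s, field) for s in history]
        mu = mean(past)
        sigma = max(pstdev(past), 1e-9)  # floor avoids division by zero
        if abs(getattr(current, field) - mu) / sigma > z_threshold:
            flagged.append(field)
    return flagged


# Summaries are computed once per partition and stored; raw data is not re-read.
history = [summarize(f"day{i}", [10.0 + i, 11.0, 12.0, 13.0]) for i in range(5)]
normal = summarize("day5", [11.0, 11.0, 12.0, 13.0])
broken = summarize("day6", [11.0, None, None, 13.0])
```

In this sketch, checking a new partition only touches the small stored summaries, so the cost does not grow with the size of the historical data itself.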
