The Use of Partition Summaries in Data Engineering
Traditional anomaly detection techniques kind of compare the current partition to some aggregate of historical partitions. But we find that you can just save a summary for each partition; we call these partition summaries in the paper. So instead of feeding in the whole thing and comparing today to the full aggregate, we maintain those individual summaries. It's super cheap, simple, scalable, and works very well. And I think there's a lot of value in this.
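To make the idea concrete, here is a minimal Python sketch of keeping one small summary per partition and checking a new partition against the stored summaries rather than against a single pooled aggregate. The speaker does not say which statistics the paper stores or how partitions are compared, so everything here is an assumption for illustration: the `PartitionSummary` fields (row count, null fraction, mean), the helper names `summarize` and `anomalous_fields`, and the z-score test with a threshold of 3.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class PartitionSummary:
    """Hypothetical per-partition summary; the fields are illustrative only."""
    row_count: int
    null_fraction: float
    mean_value: float


def summarize(values: Sequence[Optional[float]]) -> PartitionSummary:
    """Reduce one partition's column to a few cheap statistics."""
    present = [v for v in values if v is not None]
    return PartitionSummary(
        row_count=len(values),
        null_fraction=(1 - len(present) / len(values)) if values else 0.0,
        mean_value=mean(present) if present else 0.0,
    )


def anomalous_fields(
    history: Sequence[PartitionSummary],
    current: PartitionSummary,
    threshold: float = 3.0,
) -> List[str]:
    """Return the summary fields where `current` deviates from the history.

    Assumption: a plain z-score against the per-partition history stands in
    for whatever comparison the paper actually uses.
    """
    flagged = []
    for field in ("row_count", "null_fraction", "mean_value"):
        past = [getattr(s, field) for s in history]
        mu, sigma = mean(past), pstdev(past)
        value = getattr(current, field)
        if sigma == 0:
            # History never varied, so any change at all is worth flagging.
            if value != mu:
                flagged.append(field)
        elif abs(value - mu) / sigma > threshold:
            flagged.append(field)
    return flagged


# Summaries are computed once per partition and stored; raw data is not rescanned.
history = [
    summarize([10.0, 11.0, 9.0, 10.0 + d]) for d in (0.0, 0.4, -0.4, 0.8, -0.8)
]
today = summarize([10.0, None, None, 50.0])
print(anomalous_fields(history, today))
```

Because each summary is only a handful of numbers, the history stays small no matter how large the partitions are, which is one plausible reading of why the speaker calls the approach cheap and scalable.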