Building Auditable Spark Pipelines At Capital One

Data Engineering Podcast

The Volume, Variety and Velocity of Parquet Files

In terms of the kind of volume of data, you mentioned that you're dealing with a lot of Parquet files. Um, assuming that you're primarily sourcing and depositing information in S3, I'm wondering what you are looking at in terms of the sort of three Vs of volume, variety and velocity.

Ca: It ranges from probably 25 to 50 million daily. That's the volume we process. And then that volume goes through a variety of workflows. So it ranges from a simple use case, where, if you're familiar with the Capital One cards, the simple case is where a customer swipes using one of our cashback cards, which is Quicksilver.

