
Screaming in the Cloud: Engineering Around Extreme S3 Scale with R. Tyler Croy
Jan 13, 2026
R. Tyler Croy, an infrastructure architect at Scribd and a veteran open-source developer, discusses the staggering costs of managing billions of S3 objects. He explains how ordinary assumptions break down at extreme scale and why the resulting costs have to be engineered around. Tyler highlights data strategies such as packing files into Parquet to cut object counts and, with them, expenses. He also explores how AI is turning old documents into valuable assets and reshaping storage priorities in a fast-moving tech landscape.
AI Snips
Object Count Breaks Conventional S3 Economics
- Scribd accumulated hundreds of billions of S3 objects from format conversions and accessibility variants over 18 years.
- At that scale, per-object features and operations become astronomically expensive and change engineering assumptions.
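To see why object count drives the bill, here is a minimal Python sketch that assumes a flat fee per object for any per-object operation. The price and the packing factor (10,000 documents per file) are hypothetical illustration values, not Scribd's numbers. The point is only that packing many small documents into one file, as the Parquet approach mentioned in the episode does, shrinks the count that every per-object charge is multiplied by.

```python
import math


def packed_object_count(num_documents: int, docs_per_file: int) -> int:
    """Number of S3 objects needed when documents are packed docs_per_file per file."""
    return math.ceil(num_documents / docs_per_file)


def per_object_operation_cost(num_objects: int, usd_per_million_objects: float) -> float:
    """Cost of running one per-object operation over every object (linear in count)."""
    return num_objects / 1_000_000 * usd_per_million_objects


# Hypothetical inputs for illustration only.
DOCUMENTS = 100_000_000_000  # 100 billion standalone objects
DOCS_PER_FILE = 10_000       # hypothetical packing factor
PRICE = 1.00                 # hypothetical USD per million objects

unpacked = per_object_operation_cost(DOCUMENTS, PRICE)
packed = per_object_operation_cost(packed_object_count(DOCUMENTS, DOCS_PER_FILE), PRICE)
print(f"unpacked: ${unpacked:,.2f}  packed: ${packed:,.2f}")
```

The tradeoff, not modeled here, is that reading a single document then means locating it inside a larger file instead of fetching it directly by key.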
Checksums Became A $100K Problem
- Many older Scribd objects lack checksums because they were written before the SDKs began computing checksums automatically.
- Computing checksums for 100 billion objects with S3 Batch Operations would cost about $100K, turning routine maintenance into a major expense.
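The $100K figure is consistent with S3 Batch Operations list pricing, assumed here to be $0.25 per job plus $1.00 per million objects processed. Treat both rates as assumptions and check current AWS pricing for your region; request charges for the underlying per-object operation are billed separately and are not modeled in this sketch.

```python
# Assumed S3 Batch Operations rates (verify against current AWS pricing).
JOB_FEE_USD = 0.25
USD_PER_MILLION_OBJECTS = 1.00


def batch_operations_cost(num_objects: int, num_jobs: int = 1) -> float:
    """Management fee only: per-job fee plus per-object processing fee."""
    return num_jobs * JOB_FEE_USD + num_objects / 1_000_000 * USD_PER_MILLION_OBJECTS


cost = batch_operations_cost(100_000_000_000)  # 100 billion objects, one job
print(f"${cost:,.2f}")
```

The job fee is negligible; at this scale the bill is almost entirely the per-object term.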
AI Revives Long-Tail Document Value
- Large language models made decades-old user documents suddenly far more valuable and queryable.
- This flips access patterns from rare long-tail reads to broad, corpus-wide access for data science and LLM use cases.


