

161: Greybeards talk AWS S3 storage with Andy Warfield, VP Distinguished Engineer, Amazon
Jan 19, 2024
In this episode, Andy Warfield, VP Distinguished Engineer at Amazon and expert in data storage, discusses the evolution of AWS S3 and recent advances in the service. He sheds light on the S3 Express One Zone storage class, which promises lower response times. Andy dives into the role of S3 in supporting generative AI and the complexities of file versus object storage. Drawing on his teaching background, he explains why durability matters in data storage and highlights innovations that improve operator experience and efficiency across industries.
AI Snips
Amazon Open Data's Valuable Datasets
- Amazon Open Data hosts very large, free, curated datasets, including genomics and AI data.
- The Common Crawl dataset, used for large language model training, is heavily accessed through this service.
Data Lakes and Open Table Formats
- The data lake pattern makes stored data accessible to any tool by decoupling the query engine from the storage layer.
- Open table formats like Iceberg build table abstractions over Parquet files to enable schema evolution and mutability.
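To make the table-abstraction idea above concrete, here is a minimal toy sketch in Python. It is not Iceberg's actual metadata layout or API; `Snapshot`, `ToyTable`, and the file names are hypothetical, chosen only to illustrate how a table can be a sequence of snapshots that each point at a set of immutable data files. Under that assumption, schema evolution and row-level changes become new snapshots rather than in-place rewrites of existing files.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One immutable view of the table (hypothetical, not Iceberg's format)."""
    schema: tuple[str, ...]      # column names visible in this snapshot
    data_files: tuple[str, ...]  # immutable data files, e.g. Parquet objects


@dataclass
class ToyTable:
    """A table is just an ordered history of snapshots."""
    snapshots: list[Snapshot] = field(default_factory=list)

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit(self, schema=None, add=(), remove=()) -> Snapshot:
        # Start from the latest snapshot, or an empty one for a new table.
        base = self.current() if self.snapshots else Snapshot((), ())
        # "Mutation" drops or adds file references; files are never edited.
        files = tuple(f for f in base.data_files if f not in remove) + tuple(add)
        new_schema = tuple(schema) if schema is not None else base.schema
        self.snapshots.append(Snapshot(new_schema, files))
        return self.current()


table = ToyTable()
table.commit(schema=("id", "name"), add=("part-0.parquet",))
# Schema evolution: a new snapshot carries the wider schema.
table.commit(schema=("id", "name", "email"), add=("part-1.parquet",))
# Mutability: replace one data file with a rewritten one.
table.commit(add=("part-2.parquet",), remove=("part-0.parquet",))
```

Because earlier snapshots are never modified, any engine reading the table can pick a snapshot and see a consistent set of files, which fits the engine and storage separation described above.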
S3 Durability Design Principles
- S3 is designed for 11 nines (99.999999999%) of durability and to survive the loss of a single facility.
- AWS monitors failures closely and tunes repair systems to maintain this very high durability.
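As a rough illustration of what 11 nines means in practice, the following Python sketch works through the arithmetic. It assumes the figure is read as an annual, per-object design target and treats losses as a simple average, which is a simplification rather than a description of how S3 models failures. The function names and the object count are illustrative choices.

```python
from fractions import Fraction

NINES = 11
# Assumption: 11 nines is an annual per-object durability of 1 - 10^-11.
annual_loss_probability = Fraction(1, 10**NINES)


def expected_annual_losses(object_count: int) -> Fraction:
    """Average number of objects lost per year under the assumption above."""
    return object_count * annual_loss_probability


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


objects = 10_000_000
print(expected_annual_losses(objects))   # 1/10000
print(years_per_expected_loss(objects))  # 10000
```

Exact fractions are used instead of floats because 1 - 0.99999999999 is not represented exactly in binary floating point. Under these assumptions, storing ten million objects corresponds to an average of one lost object every 10,000 years.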