Grey Beards on Systems

161: Greybeards talk AWS S3 storage with Andy Warfield, VP Distinguished Engineer, Amazon

Jan 19, 2024
In this episode, Andy Warfield, VP Distinguished Engineer at Amazon and expert in data storage, discusses the evolution and advancements of AWS S3. He sheds light on the S3 Express One Zone storage class, which promises lower response times. Andy dives into the role of S3 in supporting generative AI and the complexities of file versus object storage. Drawing on his teaching background, he explains the importance of durability in data storage and highlights innovations that improve operator experience and efficiency across industries.
ANECDOTE

Amazon Open Data's Valuable Datasets

  • Amazon Open Data hosts enormous, free, curated datasets, including genomics and AI data.
  • The Common Crawl dataset, used for large language model training, is heavily accessed through this service.
INSIGHT

Data Lakes and Open Table Formats

  • The data lake pattern makes storage accessible to any tool, separating engine and storage.
  • Open table formats like Iceberg build table abstractions over Parquet files to enable schema evolution and mutability.
INSIGHT

S3 Durability Design Principles

  • S3 is designed for 11 nines (99.999999999%) of durability and to survive the loss of a single facility.
  • AWS monitors failures closely and tunes repair systems to maintain this very high durability.
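
To make the "11 nines" figure concrete, here is a minimal back-of-the-envelope sketch. It assumes the design target is read as an annual per-object durability of 99.999999999% and that object losses are independent. The function names and the 10-million-object example are illustrative and do not come from the episode.

```python
from fractions import Fraction

# Assumption: "11 nines" means an annual per-object durability of
# 99.999999999%, i.e. a loss probability of 1 in 10^11 per object per year.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year, assuming independent losses."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses at this scale."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    stored_objects = 10_000_000  # hypothetical fleet size for illustration
    print(float(expected_annual_losses(stored_objects)))
    print(int(years_per_expected_loss(stored_objects)))
```

Under these assumptions, storing ten million objects works out to an expected loss of roughly one object every ten thousand years. This is a design target rather than a measured outcome.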