The Data Engineering Show

A Technical Deep Dive to Yelp's Data Infrastructure - With Steven Moy

May 11, 2021
Steven Moy, a Software Engineer at Yelp with expertise in query engines, describes how the company's data infrastructure has evolved. He discusses the move from a MySQL analytics replica to Amazon Redshift and the challenges posed by data silos. Moy highlights projects that combine user-generated content with machine learning. He also covers modern architectures such as data lakes, the trade-off between cost management and efficient data access, and competition among cloud computing providers.
ANECDOTE

Yelp's Early Data Warehouse Shift

  • Yelp initially used a MySQL analytics replica that slowed replication under heavy analytic query load.
  • They moved to Amazon Redshift around 2013, which cut query times from hours to minutes.
INSIGHT

ML Highlights Popular Dishes

  • Yelp uses machine learning to identify popular dishes by combining photo contributions and reviews.
  • The identified dishes are then showcased to users alongside high-quality photos and reviews.
ANECDOTE

Data Lake Usability Failure

  • Steven Moy's team built a scalable data lake but overlooked usability for its internal customers, who had to write complex queries that filtered on partition keys.
  • The resulting frustration leads Moy to count it among his greatest failures, despite the project's initial success.
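
To illustrate why queries against a partitioned data lake tend to depend on partition keys, here is a minimal sketch in Python. It is not Yelp's implementation: the in-memory `PARTITIONS` table, the date-based partition key, and the `scan` helper are all hypothetical stand-ins. The sketch only shows that a query without a partition filter has to read every partition, while one that names the partition key reads just the matching one.

```python
from datetime import date

# Hypothetical stand-in for a data lake table partitioned by date.
# Each key plays the role of a partition; each value is that partition's rows.
PARTITIONS = {
    date(2021, 5, 9): [{"business_id": 1, "event": "view"}],
    date(2021, 5, 10): [
        {"business_id": 2, "event": "view"},
        {"business_id": 1, "event": "click"},
    ],
    date(2021, 5, 11): [{"business_id": 1, "event": "view"}],
}


def scan(predicate, partition_filter=None):
    """Return (matching_rows, partitions_read) for a simulated query.

    If partition_filter is given, partitions whose key fails the filter
    are skipped entirely, which is the essence of partition pruning.
    """
    rows = []
    partitions_read = 0
    for key, part_rows in PARTITIONS.items():
        if partition_filter is not None and not partition_filter(key):
            continue  # pruned: this partition is never read
        partitions_read += 1
        rows.extend(r for r in part_rows if predicate(r))
    return rows, partitions_read


def is_click(row):
    return row["event"] == "click"


# Without a partition filter, every partition is read.
all_rows, full_reads = scan(is_click)

# With a filter on the partition key, only one partition is read.
pruned_rows, pruned_reads = scan(
    is_click, partition_filter=lambda d: d == date(2021, 5, 10)
)

print(full_reads, pruned_reads)
```

Both calls return the same rows here, but the second touches a third of the data. The usability cost described in the snip is that users must know the partition layout and remember to filter on it.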