MLOps.community

Real-time Feature Generation at Lyft // Rakesh Kumar // #334

Jul 25, 2025
Rakesh Kumar, a Senior Staff Software Engineer at Lyft with a focus on Machine Learning platforms, dives into the intricacies of real-time feature generation. He discusses how Lyft evolved from naive pipelines to low-latency delivery of features computed from millions of events per minute. Rakesh emphasizes balancing self-service and specialized data processing while navigating the challenges of geospatial data. He also shares insights on technology adoption and how YAML configurations streamline data processing efforts. The session is aimed at anyone interested in MLOps and real-time data management.
ANECDOTE

Evolution from Cron Jobs to Streaming

  • Lyft evolved from a cron-job-based pipeline to stream processing with Apache Beam and Flink for real-time feature generation.
  • They addressed scalability issues by sharding data by geohash instead of by city, which distributes load evenly and avoids hot shards.
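
The idea of sharding by geohash can be sketched in a few lines of Python. This is a minimal illustration, not Lyft's implementation: the episode summary does not say how geohashes are mapped to shards, so the function name `shard_for`, the MD5-modulo assignment, the precision of 5 characters, and the shard count are all assumptions made for the example. The `geohash_encode` function follows the standard public geohash algorithm.

```python
import hashlib

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def shard_for(lat: float, lon: float, num_shards: int, precision: int = 5) -> int:
    """Hypothetical shard assignment: hash the geohash cell, then take a modulo.

    A large city spans many geohash cells, so its events spread over
    many shards instead of landing on a single per-city shard.
    """
    cell = geohash_encode(lat, lon, precision)
    digest = hashlib.md5(cell.encode("ascii")).hexdigest()
    return int(digest, 16) % num_shards


if __name__ == "__main__":
    # Two points in the same city fall into different geohash cells.
    print(geohash_encode(37.7749, -122.4194, 5), shard_for(37.7749, -122.4194, 16))
    print(geohash_encode(37.8044, -122.2712, 5), shard_for(37.8044, -122.2712, 16))
```

Keying by city concentrates all traffic from a busy city on one shard; keying by a fixed-size cell bounds the area, and usually the event volume, that any single key can represent.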
ANECDOTE

Real-Time vs Offline Feature Validation

  • Lyft compares real-time features with offline 'ground truth' features and alerts if discrepancies exceed thresholds.
  • This observability framework ensures real-time features maintain high data quality and reliability.
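
A comparison of this kind could look roughly like the following sketch. The function name `find_discrepancies`, the relative-error metric, and the 5% default threshold are assumptions chosen for illustration; the episode summary only states that real-time features are compared with offline ground truth and that alerts fire when discrepancies exceed thresholds.

```python
def find_discrepancies(realtime: dict, offline: dict, rel_tolerance: float = 0.05) -> list:
    """Return (key, reason) pairs for features that disagree with ground truth.

    `offline` holds the ground-truth values and `realtime` the values
    served by the streaming pipeline, both keyed by feature identifier.
    """
    alerts = []
    for key, truth in offline.items():
        observed = realtime.get(key)
        if observed is None:
            alerts.append((key, "missing in real-time"))
            continue
        # Guard against division by zero when the ground truth is 0.
        denom = max(abs(truth), 1e-9)
        rel_err = abs(observed - truth) / denom
        if rel_err > rel_tolerance:
            alerts.append((key, f"relative error {rel_err:.2%}"))
    return alerts


if __name__ == "__main__":
    offline = {"a": 100.0, "b": 50.0, "c": 10.0}
    realtime = {"a": 102.0, "b": 60.0}
    for key, reason in find_discrepancies(realtime, offline):
        print(key, reason)
```

In practice the threshold would likely differ per feature, since a count that naturally fluctuates needs more slack than a slowly changing average.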
INSIGHT

Hierarchical Geospatial Feature Store

  • Lyft uses a geospatial hierarchical feature store that supports aggregated features across various geohash levels.
  • This flexible store allows different models to consume data at multiple regional granularities through a unified API.
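
One way to picture such a store is the sketch below. It relies on a property of geohashes: the parent of a cell is simply a prefix of its geohash string, so a single event can update aggregates at several levels at once. The class name `HierarchicalGeoFeatureStore`, the chosen levels (4, 5, 6), and the sum aggregation are hypothetical and are not taken from the episode.

```python
from collections import defaultdict


class HierarchicalGeoFeatureStore:
    """Toy in-memory store that keeps per-cell sums at several geohash levels."""

    def __init__(self, levels=(4, 5, 6)):
        self.levels = tuple(sorted(levels))
        self._sums = defaultdict(float)

    def add(self, feature: str, geohash: str, value: float) -> None:
        """Record one observation and roll it up into every configured level."""
        finest = self.levels[-1]
        if len(geohash) < finest:
            raise ValueError(f"geohash must have at least {finest} characters")
        for level in self.levels:
            # A coarser cell is the prefix of the finer cell's geohash.
            self._sums[(feature, geohash[:level])] += value

    def get_sum(self, feature: str, cell: str) -> float:
        """Single read path: the length of `cell` selects the granularity."""
        if len(cell) not in self.levels:
            raise KeyError(f"no aggregates stored at level {len(cell)}")
        return self._sums.get((feature, cell), 0.0)


if __name__ == "__main__":
    store = HierarchicalGeoFeatureStore()
    store.add("ride_requests", "9q8yyk", 1)
    store.add("ride_requests", "9q8yym", 2)
    store.add("ride_requests", "9q8yzb", 4)
    print(store.get_sum("ride_requests", "9q8yyk"))  # block-level cell
    print(store.get_sum("ride_requests", "9q8yy"))   # neighborhood-level cell
    print(store.get_sum("ride_requests", "9q8y"))    # district-level cell
```

A model that needs fine-grained demand and another that needs a regional trend call the same method and differ only in the length of the cell they pass.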