Exploring Offline and Online Data Processing in Real-Time Feature Generation

This chapter examines the differences between offline and online data processing, especially in real-time feature generation. It discusses the roles of offline data in validation and model backtesting while addressing the complications that arise from delays in real-time data due to connectivity issues.

Episode notes

Real-time Feature Generation at Lyft // MLOps Podcast #334 with Rakesh Kumar, Senior Staff Software Engineer at Lyft.

Join the Community: https://go.mlops.community/YTJoinIn

Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract

This session delves into real-time feature generation at Lyft. Real-time feature generation is critical for Lyft, where accurate, up-to-the-minute marketplace data is paramount for operational efficiency. We will explore how the infrastructure handles the challenge of processing tens of millions of events per minute to generate features that reflect current marketplace conditions.

Lyft has built this massive infrastructure over time, evolving from a humble start and a naive pipeline. Through lessons learned and iterative improvements, Lyft has made several trade-offs to achieve low-latency, real-time feature delivery. MLOps plays a critical role in managing the lifecycle of these real-time feature pipelines, including monitoring and deployment. We will discuss the practicalities of building and maintaining high-throughput, low-latency real-time feature generation systems that power Lyft’s dynamic marketplace and business-critical products.

// Bio

Rakesh Kumar is a Senior Staff Software Engineer at Lyft, specializing in building and scaling Machine Learning platforms. Rakesh has expertise in MLOps, including real-time feature generation, experimentation platforms, and deploying ML models at scale. He is passionate about sharing his knowledge and fostering a culture of innovation. This is evident in his contributions to the tech community through blog posts, conference presentations, and reviewing technical publications.

// Related Links

Website: https://englife101.io/

https://eng.lyft.com/search?q=rakesh

https://eng.lyft.com/real-time-spatial-temporal-forecasting-lyft-fa90b3f3ec24

https://eng.lyft.com/evolution-of-streaming-pipelines-in-lyfts-marketplace-74295eaf1eba

Streaming Ecosystem Complexities and Cost Management // Rohit Agrawal // MLOps Podcast #302 - https://youtu.be/0axFbQwHEh8

~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community [https://go.mlops.community/slack]

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

MLOps Swag/Merch: [https://shop.mlops.community/]

Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Rakesh on LinkedIn: /rakeshkumar1007/

Timestamps:

[00:00] Rakesh's preferred coffee

[00:24] Real-time machine learning

[04:51] Latency tricks explanation

[09:28] Real-time problem evolution

[15:51] Config management complexity

[18:57] Data contract implementation

[23:36] Feature store

[28:23] Offline vs online workflows

[31:02] Decision-making in tech shifts

[36:54] Cost evaluation frequency

[40:48] Model feature discussion

[49:09] Hot shard tricks

[55:05] Pipeline feature bundling

[57:38] Wrap up
