
Data Engineering Podcast
A Look At The Data Systems Behind The Gameplay For League Of Legends
Podcast summary created with Snipd AI
Quick takeaways
- The data engineering team at Riot Games overcomes challenges of working with legacy systems and resource constraints to build machine learning models and improve player matchmaking.
- Onboarding new team members and reducing the learning curve remain ongoing challenges for the team, and they aim to develop tools and libraries to simplify the process.
- Ensuring data quality and consistency is crucial for effective analysis and decision-making, and the team emphasizes the importance of testing, monitoring, and collaboration between data scientists and game developers.
Deep dives
The Challenges of Building and Maintaining Data-Related Products for Online Video Games
Ian Schweer, an engineer at Riot Games, discusses the role of the Data Central team in building and maintaining data-related products for video games, specifically the League of Legends game engine. The team collects data from players, game servers, and microservices, and ingests it into a Hive data warehouse via Kafka and S3. They use Delta tables and Databricks for data processing and Airflow for orchestration. The team works on both traditional data analysis and machine learning, using the data to drive decisions in areas such as matchmaking, player behavior, and game recommendations.
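As a rough illustration of the ingestion step described above, the sketch below groups raw events into Hive-style partition paths before they would be written to object storage. The field names (`event_type`, `ts`) and the `event_type=/dt=` layout are assumptions made for this example, not details from the episode, and the code only shows the general idea of date-partitioned landing data.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(event: dict) -> str:
    """Build a Hive-style partition path from an event (hypothetical layout)."""
    # "ts" is assumed to be a Unix timestamp in seconds, interpreted as UTC.
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"event_type={event['event_type']}/dt={ts:%Y-%m-%d}"


def batch_events(events: list) -> dict:
    """Group events by partition path so each batch can be written as one file."""
    batches = defaultdict(list)
    for event in events:
        batches[partition_key(event)].append(event)
    return dict(batches)


if __name__ == "__main__":
    sample = [
        {"event_type": "match_end", "ts": 0, "match_id": 1},
        {"event_type": "match_end", "ts": 86400, "match_id": 2},
        {"event_type": "item_purchase", "ts": 10, "match_id": 1},
    ]
    for path, rows in batch_events(sample).items():
        print(path, len(rows))
```

Partitioning by event type and date is a common convention for warehouse tables because it lets a daily job read only the slice it needs.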
Building Data Systems in the Context of a Legacy Game
Being a game that has been around since 2008, League of Legends presents unique challenges. The data engineering team has to navigate legacy systems and migrations to ensure data consistency and quality. They have to reconcile new technology with old business practices and address compatibility issues. The game server plays a central role in data collection, providing accurate and reliable game-related data. The team leverages this single artifact to build machine learning models, monitor player behavior, and improve matchmaking. They also face challenges in delivering ML models in a binary format that fits within resource constraints.
Onboarding Challenges and Tools for Game Data Analysis
The team faces the common challenge of onboarding new team members and reducing the learning curve. They use vendor tools such as Alation, to provide a catalog of collected data, and Monte Carlo, to monitor data quality. However, onboarding remains an ongoing challenge, and they aim to develop tools and libraries that simplify the process and reduce friction. They also emphasize the importance of collaboration between data scientists and game developers to understand the data requirements and build customized experiences.
The Importance of Testing and Establishing Data Trustworthiness
Ensuring the quality and consistency of data is critical for effective analysis and decision-making. The team highlights the significance of testing and monitoring to identify anomalies or missing data. They rely on unit tests and integration tests, and work closely with the game analysis team to validate data accuracy. Ongoing experimentation and version control are crucial, as data issues can have long-term impacts, including player dissatisfaction and reputational damage.
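The kind of checks described above might look something like the following minimal sketch. The required field names, the 50% tolerance, and both function names are hypothetical choices for illustration; the episode does not describe the team's actual tests.

```python
from statistics import mean

# Hypothetical schema: fields every match record is assumed to carry.
REQUIRED_FIELDS = ("match_id", "player_id", "duration_s")


def find_missing_fields(rows: list) -> list:
    """Return (row_index, missing_field_names) for each incomplete row."""
    problems = []
    for index, row in enumerate(rows):
        missing = [name for name in REQUIRED_FIELDS if row.get(name) is None]
        if missing:
            problems.append((index, missing))
    return problems


def volume_is_anomalous(todays_count: int, recent_counts: list, tolerance: float = 0.5) -> bool:
    """Flag a day whose row count deviates from the recent average by more than `tolerance`."""
    if not recent_counts:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(recent_counts)
    return abs(todays_count - baseline) > tolerance * baseline


if __name__ == "__main__":
    rows = [
        {"match_id": 1, "player_id": 2, "duration_s": 1800},
        {"match_id": 2, "player_id": None},
    ]
    print(find_missing_fields(rows))
    print(volume_is_anomalous(40, [100, 100, 100]))
```

Checks like these can run as unit tests against fixtures and again as scheduled monitors against production tables, so that missing or unusual data is caught before it reaches a model or a dashboard.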
Exploring Live Inference and Generalizing ML Systems
The team looks ahead to advancements in live inference within the game engine to provide personalized recommendations and experiences to players. They aim to expand and generalize their tools for game developers across various League of Legends games. The focus is on building more flexible and scalable ML systems and platforms that can be customized for different game environments and player needs.
Summary
The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal business users accessing an environment controlled by the business. In this episode Ian Schweer shares his experiences at Riot Games supporting player-focused features such as machine learning models and recommender systems that are deployed as part of the game binary. He explains the constraints that he and his team are faced with and the various challenges that they have overcome to build useful data products on top of a legacy platform where they don’t control the end-to-end systems.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
- The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with an automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your database/data warehouse/data lakehouse/whatever you’re using and let them do the rest. Go to dataengineeringpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan.
- Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day. Especially once they realize 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, Spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24×7 live support, makes it consistently voted by users as the Leader in the Data Pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial that also comes with 24×7 support.
- Your host is Tobias Macey and today I’m interviewing Ian Schweer about building the data systems that power League of Legends
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what League of Legends is and the role that data plays in the experience?
- What are the characteristics of the data that you are working with? (e.g. volume/variety/velocity, structured vs. unstructured, real-time vs. batch, etc.)
- What are the biggest data-related challenges that you face (technically or organizationally)?
- Multiplayer games are very sensitive to latency. How does that influence your approach to instrumentation/data collection in the end-user experience?
- Can you describe the current architecture of your data platform?
- What are the notable evolutions that it has gone through over the life of the game/product?
- What are the capabilities that you are optimizing for in your platform architecture?
- Given the longevity of the League of Legends product, what are the practices and design elements that you rely on to help onboard new team members?
- What are the seams that you intentionally build in to allow for evolution of components and use cases?
- What are the most interesting, innovative, or unexpected ways that you have seen data and its derivatives used by Riot Games or your players?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on the data stack for League of Legends?
- What are the most interesting or informative mistakes that you have made (personally or as a team)?
- What do you have planned for the future of the data stack at Riot Games?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
- Riot Games
- League of Legends
- Teamfight Tactics
- Wild Rift
- DoorDash
- Decision Science
- Kafka
- Alation
- Airflow
- Spark
- Monte Carlo
- libtorch
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Hevo:  Are you sick of repetitive, time-consuming ELT work? Step off the hamster wheel and opt for an automated data pipeline like Hevo. Hevo is a reliable and intuitive data pipeline platform that enables near real-time data movement from 150+ disparate sources to the destination of your choice. Hevo lets you set up pipelines in minutes, and its fault-tolerant architecture ensures no fire-fighting on your end. The pipelines are purpose-built to be ‘set and forget,’ ensuring zero coding or maintenance to keep data flowing 24×7. All it takes is 3 steps for your pipeline to be up and running. Moreover, transparent pricing and 24×7 live tech support ensure 24×7 peace of mind for you. Don’t waste another minute on unreliable data pipelines or painstaking manual maintenance. Sprint your way towards near real-time data integration with a pipeline that is easy to set up and even easier to control. Head over to [dataengineeringpodcast.com/hevo](https://www.dataengineeringpodcast.com/hevodata) and sign up for a free 14-day trial that also comes with 24×7 support.
- Select Star:  So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. From analyzing your metadata, query logs, and dashboard activities, Select Star will automatically document your datasets. For every table in Select Star, you can find out where the data originated from, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth in data is built in minutes, even across thousands of datasets. Try it out for free at [dataengineeringpodcast.com/selectstar](https://www.dataengineeringpodcast.com/selectstar) If you’re a data engineering podcast subscriber, we’ll double the length of your free trial and send you a swag package when you continue on a paid plan.