How he built the best Covid forecasting model, lessons learned and how to improve model performance with Youyang Gu - The Data Scientist Show#032
Mar 31, 2022
Youyang Gu, creator of covid19-projections.com, shares how he built an accurate Covid forecasting model using the SEIR model. He discusses the challenges faced when working with Covid data, the process of adjusting and tweaking the model, and the inclusion of additional features like school reopenings. Gu's model gained attention, leading to its inclusion on the CDC's website. He explores the effectiveness of crowdsourcing and the importance of diverse sources in forecasting. Gu is currently working on a project analyzing Covid mortality and inequalities within and between countries.
Youyang Gu created a COVID-19 forecasting model that outperformed medical experts.
Clear explanations and understanding of a model's limitations are important in modeling.
Relying on deaths as a data source for COVID projections proved more consistent and reliable.
Being open to different ideas and perspectives is crucial for growth and avoiding incorrect assumptions.
Deep dives
Career Journey and Model Development
The podcast episode features an interview with the creator of a COVID-19 forecasting model that outperformed many medical experts. The guest shares insights on his career journey, from studying computer science to completing a master's in machine learning. He discusses his first foray into data science with a thesis on food adulteration detection using neural networks. The guest then delves into how he transitioned from finance to building a COVID projection model during his downtime in the early days of the pandemic. He explains the challenges he faced due to the limited and inconsistent data available at that time.
Criticism and Validation of the Model
The guest discusses the doubts and feedback he received on his model and the responsibility that came with the attention it garnered. He talks about receiving criticism and engaging in conversations with skeptics and experts on Twitter, discussing the limitations and assumptions of his model. He also reflects on the impact of having his model included on the CDC's website and the increased credibility it brought. The guest emphasizes the importance of providing clear explanations of a model's capabilities and limitations, as well as the need for a broader understanding of modeling and the uncertainties associated with it.
Handling Data Challenges and Model Adjustments
The podcast explores the data challenges faced in the early stages of the pandemic, including the limited and unreliable data available. The guest explains his decision to rely primarily on deaths as a data source for making projections, as it was more consistent and reliable than cases or testing data. He also discusses adjustments made to his model based on feedback and real-world data. The guest emphasizes the importance of continuously refining and improving models while remaining cautious of overfitting and acknowledging uncertainties.
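To make the deaths-first idea concrete, here is a minimal sketch, not Gu's actual implementation. It assumes a textbook discrete-time SEIR simulation as a stand-in for the compartmental model named in the summary, and a plain grid search as the fitting step. Every function name and parameter value below (population, incubation and infectious periods, infection fatality rate, death lag) is a hypothetical placeholder chosen for illustration.

```python
def simulate_daily_deaths(r0, days, population=1_000_000, initial_infected=10,
                          incubation_days=5.0, infectious_days=7.0,
                          ifr=0.01, death_lag_days=14):
    """Discrete-time SEIR simulation; returns expected deaths for each day.

    All default values are illustrative assumptions, not fitted estimates.
    """
    beta = r0 / infectious_days     # transmission rate per day
    sigma = 1.0 / incubation_days   # exposed -> infectious rate
    gamma = 1.0 / infectious_days   # infectious -> removed rate

    s = float(population - initial_infected)
    e, i = 0.0, float(initial_infected)
    new_infections = []
    for _ in range(days):
        exposed_today = beta * s * i / population
        infectious_today = sigma * e
        removed_today = gamma * i
        s -= exposed_today
        e += exposed_today - infectious_today
        i += infectious_today - removed_today
        new_infections.append(exposed_today)

    # Deaths are modelled as a fixed fraction of infections, delayed by a lag.
    deaths = [0.0] * days
    for day in range(death_lag_days, days):
        deaths[day] = ifr * new_infections[day - death_lag_days]
    return deaths


def fit_r0(observed_deaths, candidates):
    """Grid search: return the candidate R0 whose simulated daily deaths
    have the smallest squared error against the observed series."""
    def squared_error(r0):
        simulated = simulate_daily_deaths(r0, len(observed_deaths))
        return sum((a - b) ** 2 for a, b in zip(simulated, observed_deaths))
    return min(candidates, key=squared_error)


if __name__ == "__main__":
    # Synthetic "observed" deaths generated with a known R0, then recovered.
    observed = simulate_daily_deaths(2.5, 60)
    candidates = [k / 10 for k in range(15, 36)]  # 1.5, 1.6, ..., 3.5
    print(fit_r0(observed, candidates))
```

Only the death series enters the fit, so noisy case counts and testing rates never touch the estimate. Fitting a single parameter also keeps the search small, which is one simple guard against the overfitting the guest warns about.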
Lessons Learned and Realistic Approach
The guest reflects on his journey and the lessons learned throughout the pandemic. He emphasizes the importance of being humble and open to feedback, learning from mistakes, and continuously improving models. He also discusses the challenges of communicating complex modeling concepts and the need to manage expectations. The guest stresses the role of responsible modeling, being transparent about assumptions, and understanding the limitations inherent in any predictive model.
Importance of Looking Forward
One of the main insights from the podcast episode is the emphasis on looking forward and learning from past experiences. The speaker highlights the importance of focusing on what can be done next and the potential impact it can have on influencing others. This mindset allows for growth and innovation, as dwelling on past mistakes or missed opportunities is deemed unproductive.
The Advantage of Lack of Experience
In the podcast, the speaker discusses the advantage of having a lack of experience, particularly in the field of healthcare. By approaching COVID-19 with a blank slate and relying on data, biases and assumptions can be eliminated. This fresh perspective allows for a more objective analysis and a different approach to understanding the virus and its impact.
Balancing Perspectives and Challenges of Groupthink
The podcast explores the importance of being open to different ideas and perspectives. The speaker emphasizes the danger of being confined to a bubble of like-minded individuals, as it can hinder growth and lead to incorrect assumptions. The speaker encourages reading from various sources, even when the opinions differ, to gain a more comprehensive understanding. Additionally, the challenges of groupthink are discussed, highlighting the need to balance expert opinion with the more diverse perspectives that crowdsourcing provides.
Youyang Gu is the creator of http://covid19-projections.com. In 2020, while most Covid prediction models failed, he created a forecasting model, without any experience in medicine, that outperformed almost all medical experts. Yann LeCun, Facebook's chief AI scientist and a professor, stated that Gu's model "is the most accurate to predict deaths from COVID-19", surpassing the accuracy of the well-funded Institute for Health Metrics and Evaluation COVID model. It was cited by the Centers for Disease Control and Prevention (CDC) in its estimates for U.S. recovery.
Currently, he is a member of the Technical Advisory Group at the World Health Organization, working on laying the groundwork for a comprehensive, global study to document and analyze differences in levels of mortality attributable to COVID-19 between and within countries.
Today we talked about how he built the model, the lessons he learned, his advice for data scientists, and what he's working on today. If you like the show, subscribe to the channel and give us a 5-star review. Subscribe to Daliana's newsletter at www.dalianaliu.com/ for more on data science.