Eric Ma, who leads the Research team in the Data Science and Artificial Intelligence group at Moderna Therapeutics, discusses the tools and techniques used for drug discovery, the importance of machine learning and Bayesian inference, and the cultural questions surrounding hiring and management in research data science in biotech. The conversation also covers the tech stack used in his team's work, the skills and hiring considerations in biotech, the importance of data testing and standardizing Excel spreadsheets, and the current state and challenges of Bayesian inference.
The importance of tools and techniques such as machine learning, deep learning, Bayesian inference, and open-source software like Python for drug discovery (including mRNA vaccines and medicines) in the biotech industry.
The challenges and rewards of transitioning from an individual contributor to a team lead in the biotech research industry, emphasizing the need for coaching and guiding teammates to produce high-quality work.
The essential role of machine learning in molecule design, optimizing sequence-to-function relationships, and improving the efficiency of finding high-performing enzymes.
Deep dives
Research data science in biotech: Tools and techniques for drug discovery
In this podcast episode, Eric Ma discusses research data science in the biotech industry. He highlights the importance of tools and techniques such as machine learning, deep learning, Bayesian inference, and open-source software like Python in developing mRNA vaccines and medicines. The focus is on solving problems related to drug discovery, including target identification, molecule discovery, and vaccine design. Eric emphasizes the need for models to be differentiable, which allows for joint optimization and efficient experimentation. He also discusses the transition from being an individual contributor to a team lead, emphasizing the importance of coaching and guiding teammates to produce high-quality work.
Operationalizing data science projects using Python and Docker
As part of the research data science workflow, the team at Moderna Therapeutics uses Python extensively, and PyTorch in particular, for differentiable models and for generating protein sequences. They also use Typer to develop command line interface (CLI) tools, which are then operationalized using Docker and Conda. This allows for deployment on internal infrastructure and makes it possible to scale up compute tasks seamlessly. The team also uses VS Code for collaborative coding, and a standardized tech stack has helped ensure consistency and efficiency across projects.
Navigating the transition from individual contributor to team lead
Eric Ma discusses the challenges and rewards of transitioning from an individual contributor to a team lead in the biotech research industry. As a team lead, he emphasizes the need to coach and guide teammates, ensuring high-quality code and modeling decisions. Balancing technical excellence with the practicalities of project goals and team dynamics is essential. Eric highlights the importance of fostering independent growth and empowering teammates to take ownership of their work. The transition allows for collective achievements and lets the team tackle a wider range of projects than one person could alone.
The value of domain expertise in research data science
In the field of research data science, it is essential to have domain expertise, especially in biochemistry and advanced analytical biochemistry methods. Working on biological problems requires a deep understanding of experimental techniques and an ability to integrate diverse sources of data. Scientists making decisions about which molecules to work with or which antibody sequences to make need to be well-versed in the relevant methodologies. Finding individuals with both data science skills and domain expertise can be challenging, but it is crucial for success in research data science.
The role of machine learning in molecule design
Machine learning plays a vital role in molecule design, especially for optimizing sequence-to-function relationships. Using machine learning models to guide the design of molecule libraries can significantly improve the efficiency of finding high-performing enzymes. In one study, machine learning-guided strategies reduced the number of experiments needed by a factor of 10 while increasing the chance of finding high-performing enzymes. However, the cost trade-off must be considered, since ordering the variants predicted by machine learning can be expensive. Differentiable end-to-end models that optimize sequences based on the outputs of surrogate models are being developed, with the aim of further automating and improving the molecule design process.
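To make the idea of machine learning-guided library design concrete, here is a minimal sketch in plain Python. It is not the method from the episode or the study mentioned above. It assumes a toy additive surrogate that averages measured activity per (position, residue) pair, then ranks unmeasured recombinations so that only a small budget of variants is ordered. The sequences, activity values, and function names (`fit_additive_surrogate`, `predict`, `propose_library`) are all hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy data: measured activity for variants of a 3-residue motif.
measured = {
    "AKL": 1.0,
    "AKV": 1.4,
    "GKL": 0.6,
    "ARL": 1.8,
}


def fit_additive_surrogate(data):
    """Return the mean measured activity for each (position, residue) pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for seq, activity in data.items():
        for pos, residue in enumerate(seq):
            totals[(pos, residue)] += activity
            counts[(pos, residue)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def predict(model, seq, default=0.0):
    """Score a sequence as the sum of its per-site mean activities.

    This is a ranking score, not a calibrated activity prediction.
    """
    return sum(model.get((pos, res), default) for pos, res in enumerate(seq))


def propose_library(data, budget):
    """Rank unmeasured recombinations and return the top `budget` to order."""
    model = fit_additive_surrogate(data)
    length = len(next(iter(data)))
    # Candidate space: every combination of residues already seen per position.
    choices = [sorted({seq[pos] for seq in data}) for pos in range(length)]
    candidates = ("".join(combo) for combo in product(*choices))
    unmeasured = [seq for seq in candidates if seq not in data]
    ranked = sorted(unmeasured, key=lambda s: predict(model, s), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    print(propose_library(measured, budget=2))
```

The `budget` parameter reflects the cost trade-off described above: the surrogate narrows the candidate space so that only the most promising variants are synthesized. A real workflow would likely use a richer model and account for predictive uncertainty when choosing what to order.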
Hugo speaks with Eric Ma about Research Data Science in Biotech. Eric leads the Research team in the Data Science and Artificial Intelligence group at Moderna Therapeutics. Prior to that, he was part of a special ops data science team at the Novartis Institutes for Biomedical Research's Informatics department.
In this episode, Hugo and Eric talk about:
What tools and techniques they use for drug discovery (such as mRNA vaccines and medicines);
The importance of machine learning, deep learning, and Bayesian inference;
How to think more generally about such high-dimensional, multi-objective optimization problems;
The importance of open-source software and Python;
Institutional and cultural questions, including hiring and the trade-offs between being an individual contributor and a manager;
How they’re approaching accelerating discovery science to the speed of thought using computation, data science, statistics, and ML.