A former Twitter data platform builder, now at Ginkgo Bioworks, discusses big data, AI in biology, lab automation, protein engineering, and the evolving tech industry landscape. Topics include early data engineering, the move into biotech, LLMs for proteins, how software development differs between consumer tech and biotech, and why model explainability matters in biology.
Optimizing DNA sequences for enhanced molecule production in biotech.
Transitioning from tech to biotech for impactful work with robots and DNA manipulation.
Importance of data versioning and lineage in integrating data engineering with biotech.
Deep dives
Challenges in Biotechnology Research
Optimizing biological processes by using robots to manipulate DNA is a complex challenge, akin to optimizing code without fully understanding how it works. The process involves changing DNA sequences so that single-cell organisms produce more of a desired molecule, which requires repeated physical experiments to measure the outcomes.
Evolution from Tech to Biotech
Moving from a tech career to biotech, driven by a desire for more impactful and tangible work, means engaging with new technologies such as robots that manipulate an organism's DNA. The shift offers a futuristic vision of using machine learning and robotics to improve biological processes, such as producing new chemicals in microbial factories.
Data Engineering in a Biotech Context
At the intersection of data engineering and biotech, data versioning and lineage are essential for tracking how datasets evolve and for ensuring that models are trained and optimized on the right data. Building advanced AI systems in biotech depends on systematic thinking and on experience managing data at large scale.
Laboratory Automation and Robotics for High-Throughput Experiments
Running many experiments concurrently in 96-well plates is common practice, and robots handle the liquid transfers because of the volume and complexity of the work. Automating tasks such as experiment setup frees scientists to focus on optimizing workflows and analyzing data so that experiments yield robust results. Efficient execution depends on coordinating multiple robotic systems and managing their workloads, while integrating scientific instruments and analytical tools helps scientists interpret results and make decisions.
Challenges in Bioengineering Iterations and Talent Attraction in Biotech
In bioengineering, slow iteration cycles, stringent validation requirements, and costly mistakes call for meticulous planning and coordination to streamline experimentation workflows. Compared with consumer tech, software development in biotech demands more precise design, more rigorous testing, and close collaboration with domain experts. Attracting top talent to biotech companies is challenging because of the field's unique demands, a field where software engineers play a critical role in driving innovation and solving problems in a complex biological landscape.
From building a data platform and Parquet at Twitter to using AI to make biology easier to engineer at Ginkgo Bioworks, Dmitriy joins the show to chat about the early days of big data, the conversation that made him jump into SynBio, LLMs for proteins and more.
Segments:
(00:03:18) Data engineering roots
(00:05:40) Early influences at Lawrence Berkeley Lab
(00:09:46) Value of a "gentleman's education in computer science"
(00:14:34) The end of junior software engineers
(00:20:10) Deciding to go back to school
(00:21:36) Early experiments with distributed systems
(00:23:33) The early days of big data
(00:29:16) "The thing we used to call big data is now AI"
(00:31:02) The maturation of data engineering
(00:35:05) From consumer tech to biotech
(00:37:42) "The 21st century is the century of biology"
(00:40:54) The science of lab automation
(00:47:22) Software development in biotech vs. consumer tech
(00:50:34) SWEs make more $$ than scientists?
(00:54:27) LLMs for language is boring. LLMs for proteins? That's cool
(01:02:52) Protein engineering 101
(01:06:01) Model explainability in biology
Stay in touch:
- Make Ronak’s day by signing up for our newsletter to get our favorite parts of the convo straight to your inbox every week :D https://softwaremisadventures.com/