ML4Sci

Charles Yang
Sep 10, 2025

Inside Argonne's Aurora Supercomputer with Robert Underwood

Introduction

In this episode, I sit down with Robert Underwood, a staff scientist at Argonne National Laboratory. We dive into Argonne's mission as an open science lab, the power of its new exascale supercomputer Aurora, and how these resources are being harnessed to drive the future of AI for science. We discuss the AuroraGPT project, which aims to adapt AI models for scientific data, as well as the challenges of handling massive scientific datasets generated by facilities like the Advanced Photon Source. We also talk about how Argonne is collaborating with initiatives like the Trillion Parameter Consortium to push the boundaries of AI at scale, while staying focused on scientific workflows and reproducibility.

Here are three takeaways from our conversation:

Aurora: Public Compute at Scale
At the heart of Argonne National Lab's leadership computing facility is Aurora, a public exascale supercomputer purpose-built for large-scale scientific computation. Unlike commercial cloud GPU clusters, Aurora is optimized for running massive, coordinated jobs across tens of thousands of nodes—something essential for many forms of modern science, from fluid dynamics to materials modeling. As Robert Underwood explains, Aurora also supports mixed-precision compute, allowing researchers to run AI workloads as well. The lab's role as an open science facility means this capability is available to academic and public researchers, not just private industry, and reflects a broader vision of compute as national infrastructure.

AuroraGPT: AI-for-Science Models
AuroraGPT is Argonne's initiative to adapt foundation models for scientific domains—ranging from high-dimensional physics simulations to sparse bioinformatics graphs. Rather than build one giant model, the team is developing a family of models tailored to specific scientific questions and modalities. Robert notes that this effort is constrained not by compute—Argonne has secured DOE-scale allocations on Aurora—but by personnel, with only a ~30-person team. Argonne is also one of the key backers of the Trillion Parameter Consortium, composed of other scientific and industry leaders, working on building a trillion-parameter AI model for science.

Managing the Scientific Data Deluge
Unlike commercial LLMs that scrape a finitely sized internet and now rely on synthetic data, science faces the opposite challenge: an overwhelming flood of data generated by experimental infrastructure like the Advanced Photon Source. Each beamline can generate up to a terabyte per second. To handle this, Argonne is pioneering a hybrid edge-HPC architecture—compressing scientific data in real time using GPUs and FPGAs at the beamline before routing it to supercomputers like Polaris for further analysis. This vision of autonomous experimentation—AI models directly interfacing with scientific instruments—marks the future of how we'll do science at scale.

Transcript

Charles Yang: Okay, awesome. Today I have the pleasure of having Robert Underwood join us. Robert is a staff scientist at Argonne National Lab. Robert, thanks for coming on.

Robert Underwood: Yeah, thank you for having me.

Charles Yang: Great. So maybe first it'd be helpful if you could give our listeners a sense of Argonne National Lab's mission, history, and focus. Not everyone might be familiar with what the national labs, and Argonne in particular, do.

Robert Underwood: Sure. The national lab system traces its origins to the Manhattan Project—so these labs go all the way back to the development of the atomic bomb.
Argonne was one of the original labs, alongside Los Alamos. But since then, the labs have evolved significantly. Their mission today is much broader: developing energy and science capabilities for the benefit of the nation.That includes a wide range of research domains, and increasingly, that means working on AI for science.Argonne is what’s called an open science lab, meaning our facilities are available to external researchers. We're one of the three major computing labs in the DOE ecosystem, alongside Oak Ridge and NERSC. These three house some of the world’s most powerful computing infrastructure.At Argonne, that’s the Argonne Leadership Computing Facility, or ALCF. The crown jewel there is Aurora, one of the world’s largest open science supercomputers. It delivers over one exaflop of double-precision floating point performance—which is a staggering amount of computational power.But what’s especially interesting about Aurora is its flexibility: it can also compute in lower precision formats, which makes it uniquely valuable for machine learning and AI workloads. That’s one of the key areas we’re exploring—and something we’ll talk more about today.Charles YangYou mentioned that Argonne is an open science lab. That’s in contrast, of course, to the weapons labs under DOE that aren’t quite as open, shall we say.You also brought up Aurora, which I believe came online just a few months ago and recently ranked number one in the world on the Top500 high-performance computing benchmark [Postscript: Aurora is now #3 on Top500, behind Frontier at Oak Ridge National Lab and El Capitan at Lawrence Livermore National Lab]. Could you walk us through how a system like Aurora compares to what we’re seeing in the commercial space—particularly the new cloud GPU clusters companies are building, like the ones from CoreWeave or Lambda? Those also have a lot of compute. So what’s the real difference?Robert UnderwoodYeah, great question. There are a few important differences.First, national lab systems like Aurora tend to emphasize specialized hardware characteristics. We typically use high-performance interconnects—that’s becoming more common in commercial AI supercomputers, but it’s still a differentiator. We also rely heavily on parallel and distributed file systems, which offer different consistency models than what you’ll find in commercial cloud environments.Another major distinction is job structure and scale. We design our systems to run single, extremely large jobs—things that may need the entire machine to run. That’s a core part of the mission of the Argonne Leadership Computing Facility: to enable one-of-a-kind science that simply isn’t possible on other infrastructure.With Aurora, for example, we’re talking about jobs that span 10,000+ nodes, with something like 60,000 GPUs all working in tandem on a single simulation or model. You just don’t get access to that kind of coordinated compute outside the lab environment. For many scientific applications, it’s the only viable way to run these workloads.Charles YangThat makes sense. I do want to dive into the details of the AI-for-science work you’re doing, but maybe one more tangent on compute architecture.You mentioned that Aurora—and high-performance computing (HPC) more broadly—tends to focus on high-precision float types, like double precision. But many modern AI workloads are now shifting toward lower or mixed precision to scale better.Do you see a divergence emerging between the needs of traditional HPC and the requirements of AI workloads? 
It feels like there are two increasingly distinct paradigms for large-scale compute, and I wonder whether the labs will need to start rethinking how they architect future systems to support both.Robert UnderwoodI mean, from my perspective, what I see is that industry is actually getting closer to us. It’s not so much that industry cares about double precision, but if you look at the other exascale machines in the United States, they also use GPUs where, if you want to get the maximum possible computational performance out of the machine, you have to use these lower-precision floating point units. This would be like Tensor Cores on NVIDIA hardware.But AMD and Intel each have their own equivalent—something like BFLOAT16-style computational capacity. And if you want to fully leverage that, you need to use lower-precision formats.So while we talk about Aurora as being an exaflop machine, I think if you use the 16-bit precision, if I’m not mistaken, it gets close to 12 exaflops of performance. So if you’re really looking to take advantage of the peak power of the machine, you’re going to be using these low-precision representations.Charles YangRight. Well, so maybe let's talk about the project that you guys announced over a year ago now called AuroraGPT. What is it?Robert UnderwoodSo AuroraGPT is Argonne's effort to prepare for a future where AI and science are much more heavily integrated. One way we think about that is by asking: what does it mean to leverage data that’s unique and specific to scientific applications and workflows in the context of AI?That data often looks very different from what you typically see in most industrial use cases. For example, we might need to represent higher-dimensional data—like 5D or 6D tensors—for certain kinds of physics problems. We might need to handle very large graph data, or work with sparse and unsparsed grids or meshes, which are often used in things like finite element codes.So there are many ways in which the labs have unique data structures that are extremely valuable for solving specific scientific problems, but which haven’t really been explored by most major AI players in the industry. That’s where we see a niche: how do we adapt AI models and tooling to scientific workflows and applications?Charles Yang:Yeah, and I think that data modality point is really important. The kinds of examples you’re describing—those aren’t things ChatGPT is going to be able to help with. Or maybe it could, but the dimensionality just isn’t the right shape for that kind of model.So is AuroraGPT a single big model that you’re training? You mentioned a bunch of different modalities and different kinds of scientific applications. What does the progression look like so far?Robert UnderwoodSo what we're really looking at is kind of a series of models, each aimed at answering one or more different kinds of scientific questions. For example, we might want to understand whether a model trained on substantially more biological sciences information is better at answering questions in that domain. That might be one of the questions we’re trying to answer.So we would train not only on papers and standard reference materials available in biology, but also look at adapting various other resources. For example, Argonne has something called the BVBRC, which I believe is the Bacterial and Viral Bioinformatics Resource Center, which is one of these major resources that we're using for trying to do these experiments around bio. 
The BVBRC is a multimodal database containing both tabular information and other forms of data. It includes descriptions of actual in-lab experiments—experiments that people have done using different materials and biological samples—as well as simulations involving similar or sometimes the same materials. So you can imagine this is a very rich dataset, and we're trying to explore how we can take all of that and make it accessible to scientists working with AI.

Charles Yang: That's really interesting—especially this biological dataset you mentioned. How does that compare to something like Arc's Evo model? The scale-pilled thesis, of course, is: if you throw enough tokens at it, the model can learn a lot of the underlying relationships. And on the biology side, a lot of that work focuses on tokenizing gene or DNA sequences. Is the dataset you're describing different in that regard? And how does it stack up against what we're seeing in industry?

Robert Underwood: Yeah, so my perception, not having looked into the details of the Evo model specifically, is that Arc is doing some very interesting things in the materials space. They've done techniques where they look at equivariant neural networks to adapt, as I understand it, MD-style simulations of different materials and particles, and then adapt those into models. I think that gets you some of the way there, but my impression is that there are richer forms of data available from simulations that the labs may have access to in greater quantities or greater varieties than what companies have in this particular space.

So one way in which we have access to a large amount of data is through another large facility at Argonne called the Advanced Photon Source. This allows us to take imaging, essentially, of different materials and understand the structure of the materials we're imaging. As we study the structure and better understand these more fundamental properties of the materials and the biological samples we're studying, that then goes back and informs the next set of experiments one might run. So there's a deep interconnectedness here that might extend beyond what you can contain in, say, a single simulation of a single material, and is instead about finding these deeper relationships that might exist. Now, it's possible that you can get at that with, as you described it, just scaling with additional tokens. But I think the better way to frame it is: is it better to provide a more concise, richer data source or a larger, less rich data source? And I think that's an open question that we'll see answered over the coming years.

Charles Yang: And that's a good point about the APS. I mean, I think the UK announced a very similar project run using the hard X-ray light source at their national lab to generate a protein-ligand dataset. So I certainly do want to talk about the role that scientific infrastructure plays in generating data for AI models. But before we leave AuroraGPT, it sounds like you are developing a number of foundation models that are specifically geared towards this kind of high-fidelity experimental data that might not be easily tokenizable by the current class of industry models. What's the state of the effort? I mean, how many people are working on it? What kind of compute systems are you all using?
What's the scale of the models you guys are working with?

Robert Underwood: So at this time, I think we have on the order of 30-ish people who spend some percentage of their time working on AuroraGPT. If you compare that to industry efforts, they're going to have a lot more people, simply because they have much larger budgets for this kind of work. But the idea for us is to use the relatively small amount of resources we do have and try to leverage them for the biggest possible impact.

In terms of model sizes, we've looked at 7 billion parameter models, and we're starting to scale up to 70 billion parameter models. These sizes are useful because they can still fit on existing infrastructure. If you look across the space, you'll see a bunch of models in the 7 to 9 billion parameter range—smallish, but still quite useful. And then there's the 70-ish billion parameter size, which roughly corresponds to a single DGX node's worth of hardware. That's another common checkpoint you see across the industry.

Argonne is also affiliated with something called the Trillion Parameter Consortium. So we do have aspirations to eventually go bigger. But for now, we're starting small—building up our tooling and experimenting at these more tractable scales to see how far we can push the techniques.

Charles Yang: Yeah, I mean, have you all had any results come out of it that you can talk about now? Because my general concern is—it's 2025, and industry models are getting larger and larger. But even now, they haven't really proven much directly. Though, to be fair, groups like Future House in San Francisco are starting to productionize some of them in more domain-specific ways.

When you talk about 7 billion and 70 billion parameter models—granted, these are very different kinds of architectures, especially when you're dealing with higher-fidelity data—it still feels like the pacing and the level of resources going into testing this hypothesis you're describing is disproportionate, or maybe inadequate, relative to the broader conversation.

Do you all have a timeline for when you're expecting results?
And what would you need to see to feel like the hypothesis is validated—or, on the flip side, to conclude it's not the right path?

Robert Underwood: So I would say that we're working very actively to have the first set of results that we're ready to talk about publicly. We're not ready to do that at this stage. But what I can say is that this is a very large problem, and I have confidence that it will not be solved by the time we publish our results. So even outside the context of actual models being released, we are making efforts to make methodological contributions and other contributions around the evaluation of AI.

A good example of this would be the EAIRA paper. The evaluation team here at Argonne recently put out a paper called EAIRA, which proposes a methodology for evaluating AI models in the context of science. That methodology starts with two components found in most evaluation stacks: multiple choice questions, but specialized for scientific purposes, where we look at the gaps in the existing benchmarks used to evaluate these kinds of models, and then more open-generation or free-response style questions.

The last two things we look at are, I think, different from what you see a lot of evaluations doing right now. The first is what we call lab-style experiments—think of these as case studies. These are very long-form experiments where we have domain experts working on a very hard, cutting-edge problem. We bring them in for multiple hours and have them work with state-of-the-art models from across the different vendors. We ask similar sets of questions to each model and evaluate where there are gaps. And while it's not the same thing as producing a model per se, it's a meaningful contribution in terms of describing where the gaps are in the methodology for evaluating these models.

The fourth category proposed in the paper is this notion of what we call field-style experiments. These are large-scale experiments. You may have heard of something called the 1,000 Scientist AI Jam that was organized by the US Department of Energy with OpenAI and Anthropic earlier this year. Part of that effort is looking at how we scale up the idea of a lab-style experiment to a larger community, building automated tools and scalable evaluation methodologies to quickly assess, across a corpus maybe even the size of the entire DOE, what the valuable problems are that we want to solve. So while building a model is a piece of our mission, it's not the only piece of our mission.

Charles Yang: Yeah. I certainly think, to the point about evaluation, that's something we've seen—OpenAI funding a lot of work around AI for math, for example, where they're essentially trying to pull together benchmarks that they can then use to evaluate their models, right? And that takes a lot of work for mathematicians to get involved in. It's certainly a form of labor at the very least.
I mean, and I think to the broader point around AI-for-science models, you've seen Arc come out with Evo, Meta with the OMol models, and Google DeepMind with GraphCast and their AI weather forecasting models. So certainly the thesis, I think, is supported by many that there are differentiated classes of models for scientific data specifically. But that message certainly gets lost a lot, I think, in the discourse—that not all AI models are born the same or trained the same way.

Robert Underwood: Yeah. But the other thing is, if you look at a model like OLMo, for example: OLMo is in many ways trying to solve a very different problem than what you might compare it to with a Llama 3, right? Because one of the distinct purposes of the OLMo model specifically is that they want the entire process to be fully reproducible.

And having a fully reproducible model stack, all the way down to the data, is actually really important if you want to meaningfully measure the performance differences from, for example, injecting a bunch of biological sciences data, because you know exactly what was in the training set. And if you want to go back and audit where some weird generation came from, if you're doing this with a Llama-based model, you don't have a prayer. Whereas if you have a model that you've trained from scratch, whether it be based off of OLMo or some other dataset where you have this full provenance, that's something that's really, really valuable in a scientific context, whereas in a business context you may or may not care about that full level of traceability.

Charles Yang: Right. No, I think that's definitely another point about the differences between these kinds of models and how we build them. Okay, last question on AuroraGPT, and then I do want to talk about the data generation side. What do you think is the primary limitation right now? I'm certainly very supportive of this whole effort; the podcast is really focused on AI for science, and I think having a public capacity to do that is obviously important. What are the primary limitations, do you think, to scaling up the success of AuroraGPT and Argonne National Lab's involvement in AI for science? Is it people? Is it compute? Is it something else?

Robert Underwood: My impression is that compute is by far our biggest scaling constraint—no, not compute—personnel is our largest scaling constraint right now. As I said, we're a very small team. If you look at, for example, Meta, my impression is that there were on the order of a thousand people involved with the Llama 3 paper. I could have roughly miscounted there, but they're at least an order of magnitude, if not two orders of magnitude, larger than the effort that we have. So if we want to demonstrate comparable outputs and comparable efforts, we're going to need more people than we probably have now. Now, what's kind of...

Charles Yang: But I mean, are you all trying to compete on Llama, or are you trying to compete on—I mean, I do want to distinguish what the right benchmark of reference is here, right?
Because if you start saying we need a thousand people like Meta to do a Llama-style thing, people are going to ask why Argonne is building a Llama-style model.

Robert Underwood: I mean, I guess that's a fair assessment, but at the same time, there are a lot of different science domains and very few, if any, of them have robust treatments right now. So I don't know if a thousand people is necessarily the right number, but my point is, if you want to see larger and faster progress out of the labs on these kinds of efforts, we are going to need more people to do that kind of work.

Charles Yang: Yeah, definitely. Well, it's interesting that you say personnel, not compute, because I know for some other companies that is the primary limitation.

Robert Underwood: I mean, compute will eventually become a limitation, but, for example, we were able to secure an INCITE proposal. INCITE is a program within the DOE for getting large-scale allocations of core hours on machines like Aurora. And while we are definitely making use of our INCITE allocation, I think if we had more people, we could make even more effective use of that allocation. So I think personnel right now is our biggest constraint.

Charles Yang: Yeah. Okay. I mean, I think that's going to be helpful for many folks to hear. Okay. Let's talk about the other hat that you wear at Argonne: you also do a lot of work on scientific data compression. Why does that matter? What's the motivation there?

Robert Underwood: So scientific data compression is really important for a lot of different domains. If you look at these exascale applications, they can produce mind-numbing volumes of data. If I'm not mistaken, the HACC Farpoint simulation that was run generated on the order of 2.3 petabytes of data. And if you look at things like the APS upgrade at Argonne or the LCLS [Linac Coherent Light Source] upgrade at SLAC, these facilities are on pace to produce on the order of a terabyte of data per second per beamline in some cases, which is just a mind-numbing volume of data. With that much data, you really have to have a careful and thoughtful approach to what you're going to do with that data in the long run. Data compression is one of a variety of ways you can approach that problem.

What's interesting about data compression is that it allows you to retain the original dimensions of the data, the original sizing information, and the original number of data points. So you're not reducing the featuredness or the richness of the data, and you're not really reducing the number of data points. What you're reducing is the precision. And in many cases, for applications, it's better to lose precision, especially if you can control exactly how much precision you lost and where you lost it from. For example, if you're looking at fine-grained features on a subsurface, you might have a large portion of the data that's relatively sparse, and you can compress that very aggressively, because there's not a lot of scientific content in that sparser region. But where you have a turbulent boundary condition that exists between two points, maybe you need a more conservative compression approach in that region, where there's a lot more scientific content. So using compression, you can address concerns both of data rate—so basically, how fast are you producing data?
So like these APS or LCLS style use cases, but you can also address use cases where you need to do large scale data archiving. So kind of deployed at scale, you can imagine data compression would allow you to dramatically reduce the needs for long-term storage of data, not because you're actually storing dramatically less data, but the footprint of that data on storage is dramatically smaller. So that's where compression kind of plays a role and can be very helpful.Charles YangYeah, I mean, I think that's an interesting contrast because, I mean, with a lot of these conventional industry models, they've kind of reached data limits now of the known set of tokens in the world and people are doing synthetic data and all these complicated things. But in the world of science, like we're actually drowning in data in some sense, right? Like there's too much data being generated by these massive particle accelerators and hard light sources that there's this whole field that is looking to understand how to grapple with all that data in a way that's like sort of more manageable.We talked with Sergei Kalinin who does autonomous microscopes and he talked about the massive amount of data being generated by each microscope nowadays at the leading frontier. Do you kind of see a heterogeneous or hybrid architecture in the future for scientific instruments where they are running, or maybe this is already the case, like massive data compression at this point where the data is being generated at the facility.Robert UnderwoodSo just to kind of define terms really quickly. So what I'm hearing from you is that you're saying that you're going to have kind of some edge facility where you're producing data at a very large rate. And then maybe you have a set of edge devices that are going to kind of process or accept that data, transfer it over a network, and then maybe you have some large computing resource where you're going to then kind of do further processing on it after potentially you've either restored it from an archive or after you've transmitted across a wide area network. So if that's what you're describing, Like we already do those kinds of techniques. Yeah.Charles YangThat's what I'm assuming. Yeah. So it's already happening. Yeah. Do you want to talk a little bit more about like maybe for the advanced photon source at Argonne, which is one of the brightest light sources, I think, at least in the country, what does that kind of look like in terms of data flows and where the data is being processed?Robert UnderwoodYeah, so in the case of the APS, so you can have large different experiments that are conducted at one of many different beam lines that exist on the APS. think there's an order of 80 different beam lines. And the way that you can think about this is each beam line specializes in a particular class of experiments. So you might have some experiments that are performing something called tomography. So this is, as I understand, MRI-style images where you're looking at trying to understand the structure of a material. You may have other ones that are looking at trying to understand more like subatomic style interactions that exist on different materials. And for each of these, you'll use either different wavelengths of the X-rays, or you'll use different intensities of the X-rays to kind of study or different ways of capturing the X-rays as they're going off of the sample. So maybe in some cases, you're shooting directly through the sample and you're studying direct. 
In some cases, you're looking at backscattering. So you have different ways of conducting these experiments, and these can each produce very large volumes of data. In the case of small-angle or wide-angle scattering experiments, you could potentially generate up to a terabyte of data every second. So in that case, the team that I'm working with as part of another project called Illumine is looking at how we can design specialized compression techniques that will allow us to take the data that comes directly off of these detectors and make it small enough that we can get it from the detectors to intermediate storage, where we can then potentially recall it on either a locally available HPC resource or a further-away HPC resource. In the case of the APS, we frequently use the ALCF resources.

It's actually kind of interesting: at the ALCF, on machines like Polaris, which is one of our other large computing resources, there's a special queue called the demand queue. What the demand queue means is that, for a particular block of racks, if a job comes in from the Advanced Photon Source that needs to be processed in near real time, we can prioritize the jobs coming off of the APS on the set of racks dedicated to the demand queue. Other jobs can then run at a lower priority in the background when there's not an APS job, just to keep the machine busy. So having this ability to preempt the computation on the ALCF resource when you have an urgent need for compute is an interesting and exciting way to combine these large-scale facilities together.

Charles Yang: So, I mean, that's an interesting sort of partnership, I guess, between the fact that you have this leadership computing facility at Argonne, including one of the world's largest supercomputers, and one of the world's largest beamlines or hard X-ray light sources, sending data back and forth. Do they process any data locally at the APS, or is it always sent to Polaris?

Robert Underwood: So this is actually a good question. Depending on what the task is, they will perform certain operations at the edge. For example (and this is a technique that's used at LCLS, not at the APS), if you're doing a technique called serial femtosecond crystallography, you might conduct initial peak detection at the edge. Basically, there are certain regions of this data that contain particular bright spots; I think the scientific term for them is Bragg spots. These Bragg spots are bright, scientifically significant pieces of the overall image. The reason you might look for these at the edge is that you can perform a technique called non-hit rejection. If you take a picture of a frame, and across this frame or across this detector you don't see any peaks, you can actually discard that frame entirely, which reduces the amount of data you have to transfer. In some ways, it's an adaptive sampling technique based on how much information is in the frame. And then the second thing you frequently will do is apply compression.
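[Editor's note: to make the non-hit rejection idea concrete, here is a minimal sketch in Python/NumPy. The global intensity threshold and minimum peak-pixel count are illustrative assumptions; production Bragg-peak finders at light sources are far more sophisticated and, as Robert notes below, are often implemented in hardware on FPGAs.]

```python
import numpy as np

def is_hit(frame: np.ndarray, threshold: float, min_peak_pixels: int = 10) -> bool:
    """Decide whether a detector frame contains candidate Bragg peaks.

    A pixel counts as 'peak-like' if it exceeds `threshold` counts above the
    estimated background; a frame is kept only if enough such pixels exist.
    """
    background = np.median(frame)                  # crude per-frame background estimate
    peak_pixels = np.count_nonzero(frame > background + threshold)
    return peak_pixels >= min_peak_pixels

def non_hit_rejection(frames, threshold=50.0):
    """Yield only the frames worth transferring downstream (the 'hits')."""
    for frame in frames:
        if is_hit(frame, threshold):
            yield frame

# Toy usage: 200 noisy 128x128 frames, 10 of which contain injected bright spots.
rng = np.random.default_rng(0)
frames = rng.poisson(5.0, size=(200, 128, 128)).astype(np.float32)
frames[::20, 100:105, 100:105] += 500.0            # synthetic "peaks" in every 20th frame
kept = list(non_hit_rejection(frames))
print(f"kept {len(kept)} of {len(frames)} frames")
```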
So you're doing the compression near the beamline, and then you're doing your analysis further away. You're trying to work out which techniques need to sit nearest the beamline, where you can run them, given their complexity, at a very high rate, but also where you can leverage the fact that you haven't had to transfer that data yet in order to make the most effective use of it.

Charles Yang: And so for beamlines that are each generating up to one terabyte a second and doing this kind of compression, does this mean each beamline has a CPU server dedicated to servicing the data and compression needs? And roughly, what scale are we talking about here?

Robert Underwood: So different beamlines have different needs. For example, not all beamlines necessarily generate a terabyte a second. But the ones that do, at least at Argonne, actually have what are essentially Polaris nodes deployed at the edge. It's very similar hardware. They may have slight differences in terms of the network interface, for example, but otherwise they look very, very similar to the kinds of resources we already use on the supercomputer. There are just fewer of them.

If you want a more forward-looking example of this, you might look at what LCLS is doing at SLAC. In their case, they're looking at taking data directly off of FPGAs, field-programmable gate arrays, and then communicating that directly to a GPU where they do some preliminary processing. After that, they use a technique where they send that data directly from the GPU across the network interface to either long-term storage or further analysis. So you can potentially get these really integrated beamline designs, where you very carefully understand each stage of the pipeline and deploy them as a collective whole.

Charles Yang: Awesome. And so these are GPU-based workloads.

Robert Underwood: Yes. Because of the data rates these systems have, you're very frequently going to be moving towards GPUs for many of the different frameworks involved.

Charles Yang: That's awesome. I guess this is the dichotomy: scientific simulations, video games, and AI training are all kind of similar-style workloads. Okay, so we're basically doing GPU pre-processing of the flood of scientific data being generated at each of these beamlines.

Robert Underwood: Yeah, GPU plus FPGA. Some of the tasks are actually being done on FPGAs because, for example, if you're doing this non-hit rejection, that might be something you really want to do in hardware, given the throughput constraints involved with that particular part of the process. So when you can build it in hardware, FPGAs also play a very important role in these kinds of real-time scenarios.

Charles Yang: And roughly what kind of data compression rate are we talking about? Is it on the order of compressing 5%, or is it more like a 10-fold compression?

Robert Underwood: So the goal set out for us by the different beamlines is usually to achieve at least a 10x compression ratio. In some cases, it's 20x relative to the raw data stream.
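[Editor's note: a toy illustration of how error-bounded lossy compression can reach ratios in this range. This sketch simply quantizes each value to within a user-chosen absolute error bound and then applies a standard lossless coder (zlib); real scientific compressors add prediction and other stages, so the numbers it prints are illustrative only and are not the team's actual pipeline.]

```python
import zlib
import numpy as np

def compress_error_bounded(data: np.ndarray, abs_error: float) -> bytes:
    """Quantize so every value is reproduced within +/- abs_error, then deflate."""
    # Map each value to the nearest multiple of 2*abs_error, which bounds the
    # reconstruction error at abs_error.
    codes = np.round(data / (2.0 * abs_error)).astype(np.int32)
    return zlib.compress(codes.tobytes(), level=6)

def decompress_error_bounded(blob: bytes, abs_error: float, shape) -> np.ndarray:
    codes = np.frombuffer(zlib.decompress(blob), dtype=np.int32).reshape(shape)
    return codes.astype(np.float64) * (2.0 * abs_error)

# Toy data: a smooth field plus small noise, standing in for a mostly sparse detector region.
x = np.linspace(0, 4 * np.pi, 1_000_000)
data = np.sin(x) + 0.001 * np.random.default_rng(1).standard_normal(x.size)

bound = 1e-3
blob = compress_error_bounded(data, bound)
restored = decompress_error_bounded(blob, bound, data.shape)

print("compression ratio:", data.nbytes / len(blob))
print("max abs error:", np.abs(restored - data).max(), "(bound:", bound, ")")
```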
And in many cases, we have been able to. If you're interested, I can point you towards some papers where we've done this exact kind of technique on a variety of different beamlines and approaches.

Charles Yang: Yeah, I mean, that's awesome. And it's really striking, again, the difference in how these fields think about it, where the AI world is turning to synthetic data while the scientific world is compressing its data by up to 20x because it can't deal with the amount being generated at these facilities. What do you think is a promising area? I mean, Argonne has all these large pieces of scientific infrastructure, like beamlines, that are generating vast amounts of data. What particular fields or applications are you excited about where AI could potentially play a role? This is sort of going back to the AuroraGPT conversation. We talked earlier about the UK's OpenBind competition as one example of the role scientific infrastructure can play. Have you come across any others that you think would be exciting, or that perhaps not enough folks know about?

Robert Underwood: So what I would say is, as part of AuroraGPT, we're actually actively working with beamlines at the Advanced Photon Source, in addition to the biological data group we have here at Argonne. And I think that, as the development of science progresses, you'll increasingly see us linking AI systems up to things like self-driving labs, where we utilize robotics to conduct experiments, collect the results from those experiments, and then interpret them using AI. Or maybe we use AI to guide which experiment is the next most promising one to perform. So there are a lot of opportunities here as we interface automatable infrastructure with AI systems.

Charles Yang: Awesome. And certainly self-driving labs are something we've talked a lot about on this podcast as well, so that's great to hear. I mean, that's quite the vision: generating massive amounts of data at these beamlines, running large-scale AI models on the leadership-class computing facilities that you have, and then using the self-driving lab infrastructure at Argonne to iterate from there and generate more data. And the beamlines have some degree of automation themselves as well.

Robert Underwood: Yeah, and that's actually something we're trying to improve with projects like Illumine. If you look at the project, there are roughly three thrusts, and two of those thrusts, broadly speaking, have to do with what's sometimes referred to as integrated research infrastructure. How do we more actively communicate and make decisions in an automated fashion, both at the per-frame detection scale, which is one really tight, real-time set of constraints, and at the broader optimization scale, where decisions might happen on the order of several seconds or several minutes? So you have different reinforcement loops happening at these different timescales, making different kinds of decisions about how the experiment will progress. I think it's a very exciting area, and it's a project I'm excited to be a part of.

Charles Yang: Awesome. I can't think of a better way to end than that. Robert, thanks for the time.

Robert Underwood: Yeah, appreciate it.
Jul 15, 2025

Professor Ken Ono on Working with AI in Mathematics

In this fascinating discussion, Ken Ono, a Mathematics Professor at the University of Virginia, dives into the synergy between AI and mathematics. He highlights how collaboration has transformed math research, breaking the walls of solitary study. Ono explains the current limitations of AI in creativity, yet praises its role in assisting mathematicians. The conversation also covers the Spirit of Ramanujan project and the necessity of balancing AI with human insight, paving the way for future innovations in the field!
Jun 24, 2025

Professor Keith Brown on Automating Materials Discovery

Introduction

In this episode, I sit down with Keith Brown, associate professor of engineering at Boston University and principal investigator at KABlab, to discuss how his lab builds and operates self-driving experimental platforms, particularly using 3D printing to explore mechanical and material properties. He explains the use of Bayesian optimization in high-throughput campaigns involving tens of thousands of experiments, shares lessons from developing open-source hardware for mechanical testing and electrochemistry, and reflects on how graduate students' roles evolve in automated research settings. We also discuss model selection for small versus large data regimes, modular infrastructure, the limitations of centralized automation, and how out-of-distribution reasoning still sets human scientists apart.

Here are three takeaways from our conversation:

1. Decentralized self-driving labs will drive the next wave of innovation
Centralized mega-labs are not necessarily the future; most progress has come from small, distributed groups.
Researchers innovate faster when experiments are hands-on and local.
Infrastructure can be shared without consolidating everything in one place.

2. 3D printing is emerging as a core engine of materials discovery
3D printers enable rapid, programmable variation in structure and composition.
Their voxel-level control makes them ideal for combinatorial screening.
Shared platforms allow reproducible studies across labs and scales.

3. Human scientists remain essential to shaping long-term experimental campaigns
Humans guide the experiment design, tuning, and interpretation.
Roles shift from operator to systems-level thinker and optimizer.
The most successful campaigns treat self-driving labs as a collaborator, not a black box.

Transcript

Charles: Keith, thanks for joining.

Keith Brown: Yeah, yeah. Thanks very much, Charles, for having me.

Could you describe how you set up your self-driving lab to optimize 3D printed structures for mechanical energy absorption? (1:00)

Charles: I do want to talk about self-driving labs broadly, but maybe first we can acquaint listeners with your work. I thought we could start with two or three of the papers you've done around self-driving labs for optimizing 3D-printed structures for mechanical energy absorption. Can you walk us through the setup for those papers and tell us a little bit about that work?

Keith Brown: Absolutely. When I started my independent career at Boston University, I got very interested in how we design mechanical structures — things like crumple zones in cars, padding in helmets. We still have to do lots of experiments to figure out how different structures and materials perform in those situations. That's tedious, time-consuming, and wasteful.

So we worked on developing a self-driving lab to study the extreme mechanics of polymer structures. It combines several 3D printers (initially, we had five) that print structures automatically. They're retrieved, photographed, weighed, and then tested in an Instron compression machine, which compresses them until they're flat while measuring the force required to do so.

This lets us learn the mechanics of each structure and use that information to design the next one. It's a closed-loop system that prints and tests new structures, aiming to find ones that absorb a lot of energy. The goal is to create crumple zones and similar systems that are lighter but just as effective.

We've been doing this since about 2018. At this point, we've run about 30,000 experiments with the system.
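[Editor's note: a highly simplified sketch of the closed loop Keith describes (print, retrieve, weigh, test, learn, design the next structure). The hardware-facing functions and the toy objective below are hypothetical stand-ins for the lab's actual instrument interfaces and design space; this only shows the shape of the loop, not the BEAR system's real code.]

```python
import random

# --- Hypothetical hardware interfaces (simulated here so the sketch runs) ---
def print_structure(params):
    """Stand-in for sending a parameterized design to a 3D printer."""
    return {"params": params}

def weigh(part):
    """Stand-in for the balance; mass grows with wall thickness in this toy."""
    return 2.0 * part["params"]["wall_thickness"]

def run_compression_test(part):
    """Stand-in for the universal testing machine; returns a toy energy value."""
    t = part["params"]["wall_thickness"]
    return max(0.0, 10.0 * t - 3.0 * t**2) + random.gauss(0.0, 0.1)

def propose_next_design(history):
    """Placeholder 'learner': the published work uses Bayesian optimization;
    here we simply perturb the best design seen so far."""
    best = max(history, key=lambda h: h["energy_absorbed"])
    return {k: v * random.uniform(0.9, 1.1) for k, v in best["params"].items()}

def campaign(initial_designs, n_iterations):
    history = []
    queue = list(initial_designs)
    for _ in range(n_iterations):
        params = queue.pop(0) if queue else propose_next_design(history)
        part = print_structure(params)          # print
        mass = weigh(part)                      # retrieve, photograph, weigh
        energy = run_compression_test(part)     # compress until flat, measure force
        history.append({"params": params, "mass": mass, "energy_absorbed": energy})
    return history

if __name__ == "__main__":
    results = campaign([{"wall_thickness": 0.5}, {"wall_thickness": 1.0}], 20)
    best = max(results, key=lambda h: h["energy_absorbed"])
    print("best design:", best["params"], "energy:", round(best["energy_absorbed"], 2))
```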
Over the years, we’ve worked on different facets. Most recently, we’ve been developing helmet pads for the military to improve blunt impact resistance. For example, if a soldier falls out of a vehicle and hits their head.We’ve been able to design structures that perform very well in that capacity. We’ve also achieved world-record performance in energy absorption efficiency, meaning absorbing the most energy possible per unit volume. I’m happy to dive deeper into any of these aspects, but we’ve basically had a robot running continuously since 2018.What were the challenges of integrating the robotic and printer systems together? (4:10)Charles:When I saw that paper, it seemed like it had one of the largest experimental campaigns for a self-driving lab that I’ve seen, especially with five different 3D printers and 30,000 structures. I'm curious: in the photo of the setup, you see five printers around a UR robot arm with a UTM next to it. What were the challenges of integrating those systems? How did you get the robot arm to pull the sample from the printer into the UTM, know when it was done, and so on?Aldair E. Gongora et al., A Bayesian experimental autonomous researcher for mechanical design. Sci. Adv. 6, eaaz1708 (2020). DOI:10.1126/sciadv.aaz1708Keith Brown:Yeah, great question. Each system has its own software. The universal testing machine has very specific, closed software. Each 3D printer has its own software. We also had to control the machines, choose experiments, use machine vision, track weight and scale, and more. Altogether, there were around 12 distinct pieces of software that needed to operate together.Each link in the chain required a lot of thought and effort from the team. One story that illustrates this well involves the Instron system. When we bought it, we were told it had an API for making software calls, but we couldn’t get that to work. Instead, we used a serial port on the side of the instrument that we could send a high voltage signal to in order to start it.So instead of full software control, we simplified the interaction. We told it to start and then waited for it to signal that it was done. That worked reliably. A big part of building the automation system was choosing where to simplify. Did we need full control, or just enough to get the job done?Ultimately, we had three different computers running different parts of the workflow, listening for signals, and sharing data. That doesn’t include the cloud services we used, like a shared computing cluster here at BU and even Google Drive in some cases. The students who built this system had to become skilled in everything from old-school serial ports to modern machine learning.How much of the project’s development time was spent just getting all the devices to communicate? Which part of the system was hardest to integrate? (6:20)Charles:How much time would you estimate was spent just getting the different systems to talk to each other? Out of the whole project, how much of it was integration overhead?Keith Brown:I can give a pretty firm answer. We went from everything in boxes to a functioning system in about three months. The team was made up of one doctoral student, one master's student, and two undergrads. That initial system was a bit brittle — if something broke or dropped, we didn’t have great recovery protocols — but the core integration was there.We’ve made a lot of improvements since then, but the foundation was built in those three months.Charles:Which part of the system was hardest to integrate? 
I’d guess the UR [Universal Robots] robot is fairly mature and designed to work with different systems. Was it the 3D printers? The Instron? How would you rank their maturity?Keith Brown:The UR system is very sophisticated. We’re impressed daily by how well it works. There are different levels of control you can use. Some people use a Robot Operating System (ROS) and similar frameworks to micromanage every movement. But we realized we only needed four or five specific actions. So we programmed a series of waypoints and let it run through those.Since we controlled where the printer put each part on the bed, we could script the robot’s movements very precisely. That’s still how we use it today. We have more scripts now and more complex logic, but the core idea is the same. It also has force feedback to avoid blindly executing commands, which helps with robustness.The printers are also highly automatable. That was one of the big reasons we chose this project. If you compare it to doing a similar experiment in chemistry, you run into issues with reagents sitting out, temperature control, and timing. But with a 3D printer, you can create a very complicated structure and test it almost automatically.That said, there are still challenges. One big one for fused deposition modeling is removing prints from the bed. Sending a job to the printer is easy, but getting the part off the bed often requires a human with a scraper.We tackled that by focusing first on architectures we knew we could remove easily, like cylindrical structures that the robot could lift with a prong. Later, we developed strategies for peeling parts off more gently. These are the kinds of things you don’t think about when you’re just printing for fun, but become very real problems when you're automating.How did you calibrate five different 3D printers to ensure reproducibility across the self-driving lab? (10:30)Charles:And on the question of printers, you had five. One broader concern with self-driving labs is reproducibility. A setup might be consistent in one lab, but what happens if someone tries to replicate it with slightly different equipment? For your five printers in this lab, how did you handle calibration? I know that was part of the upfront work.Keith Brown:Yeah, that’s a great question. The short version is that there’s always going to be some uncertainty. The most we can do during calibrations is to make sure that what you print is exactly what you intended, every time. We also check for variability across printers.To check the mass, we integrated a simple integral feedback system. Something gets printed, it’s weighed, and if it's under mass, we adjust the extrusion multiplier. That variable lets us account for inconsistencies in filament diameter. That way, we can keep the masses of all structures consistent.As for printer-to-printer variability, we explicitly tested that in our first paper. We compared five different printers using the same structures and didn’t see any statistical differences. That doesn’t mean there couldn’t be differences under other circumstances, but once we corrected for mass variation, there was no measurable difference in our results.That said, there definitely are substantial differences across printer models. If you move from something like a MakerGear printer to a more modern platform like a Bambu printer, which uses a different architecture, you could see differences. 
But the key is comparing the final structure, its geometry and mass, to make sure it's truly the same thing.Why did you choose Gaussian process regression? (12:40)Charles:Last thing on this topic before we move on. You used Gaussian process regression to drive the optimization. Given your experience now with 30,000 data points, do you think you'd pick a different kind of sampling method? There's been a lot of buzz around reinforcement learning lately. I’d love to hear your thinking on the algorithmic choice.Keith Brown:Great question. Gaussian process and Bayesian optimization, more generally, are excellent when you’re working with small datasets. When you're starting out with 10, 100, even 1,000 measurements, it's a no-brainer. Many papers have followed this path, and it has become a standard approach in the field for this kind of optimization.As our dataset grew, especially toward the end of our campaign with 30,000 experiments, we noticed it was taking a long time just to make the next prediction. I gave a talk about that, and someone in the audience who was an expert in Bayesian optimization was shocked. He showed me that he could do predictions with a million data points on his phone. That led us to modern methods for sparse Gaussian process prediction and training, which make large datasets feasible.Of course, there are some downsides to Gaussian process regression. The standard formulation assumes the function is infinitely differentiable, which limits you to very smooth function spaces. That’s not always realistic. For example, phase transitions are not smooth, and you need other models to capture that behavior accurately. Some researchers have developed hybrid models or alternative frameworks to deal with those cases.Regarding reinforcement learning, we’ve explored it a bit. We’re not experts, but our understanding is that it still boils down to modeling a reward function and a state space. So it ultimately faces the same challenges: how do you model the space? In our most recent campaign, we ran a head-to-head comparison between a neural network and our existing Gaussian process method. The neural net didn’t outperform it. The variance was about the same.At that point, most of the variance was from experimental fluctuation, not model uncertainty. That suggests we already knew the space well enough, and pushing further would just be overfitting. So, at least in the spaces we’ve worked in, we haven’t needed to adopt neural networks. That’s not to say they won’t eventually provide value. They just haven’t been necessary for us yet.Charles:It’s funny that even in 2024, with all the new tools out there, papers are still relying on good old Bayesian optimization and Gaussian processes, even with relatively large datasets.Keith Brown:Well, there’s a funny story behind that. Have you heard about the origin of Bayesian optimization? It’s often called "kriging," after Danie Krige, a South African mining engineer. He was trying to predict how much gold would be in the ground next to a spot where he had already mined. He figured out that you could use kernel functions to make local predictions with uncertainty.That concept of local averaging under uncertainty is the foundation of what we now call Bayesian optimization. So the whole idea comes from mining for gold, which I think is hilarious, and also shows how old and robust the technique really is.What is the role of 3D printers in scientific labs today? (17:30)Charles:That’s a great story. 
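[Editor's note: for readers who want to see what the Gaussian-process-driven loop discussed above looks like in practice, here is a minimal Bayesian optimization sketch using scikit-learn and an expected-improvement acquisition function. The one-dimensional objective is made up for illustration; it is not the lab's design space or code.]

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(candidates, gp, best_y, xi=0.01):
    """Expected improvement over the best observation so far (maximization)."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def objective(x):
    """Made-up 'experiment': the quantity we want to maximize."""
    return float(np.sin(3 * x) - 0.5 * (x - 0.7) ** 2)

# A few initial "experiments", then a GP-guided campaign.
rng = np.random.default_rng(42)
X = rng.uniform(0, 2, size=(3, 1))
y = np.array([objective(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
candidates = np.linspace(0, 2, 500).reshape(-1, 1)

for _ in range(10):
    gp.fit(X, y)                                   # refit the surrogate on all data
    ei = expected_improvement(candidates, gp, y.max())
    x_next = candidates[np.argmax(ei)]             # pick the most promising "experiment"
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

print("best x:", X[np.argmax(y)][0], "best objective:", y.max())
```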
And it’s always interesting to hear how different fields, like mining or geology, have influenced core machine learning tools. Let’s zoom out a bit. We’ve been talking about 3D printers in the context of this paper, but you’ve also written about their broader role in science. How do you see 3D printers in the modern scientific lab?Keith Brown:3D printers wear a lot of hats. In modern labs, they are essential for building flexible infrastructure. My students use them for everything from tube racks to custom fixtures. The ability to quickly design and print something that would have taken hours in the machine shop is a game changer.That flexibility is critical in academic labs where setups change all the time. You have new students, new projects, and new configurations. Printers let you build and rebuild quickly, which is huge.They are also incredibly powerful for exploring mechanics. You get access to an immense design space. Sure, there are limits in size, resolution, and printability, but in practice, the number of structures you can make is effectively infinite. That opens up profound questions in mechanics that just weren't accessible before.And because 3D printing is inherently tied to manufacturability, it makes your discoveries more scalable. If you design a new structure, someone else can reproduce it using a printer. That makes it easier for your findings to be tested by others or even used in industry. Just working with 3D printers builds that translatability in.Charles:Right, and I imagine self-driving labs offer a similar kind of design-for-manufacturability benefit. They're at least more automated than having humans do everything by hand.Keith Brown:Exactly. Automation does help with reproducibility and manufacturability. But that benefit isn’t always guaranteed. Some of our work in self-driving labs involves extremely miniaturized experiments, preparing fluid samples at the picoliter scale or even smaller. That’s not manufacturable. You wouldn’t want to scale that up.Additive manufacturing, on the other hand, can scale. People often scale it by just running many printers in parallel. That works. But there’s one more angle I want to highlight.We’ve focused a lot on the mechanical side of 3D printing, but it’s also an incredible tool for materials development. A printer lets you do chemistry on a voxel-by-voxel basis. You can vary composition, change the polymer formulation, mix precursors — things that usually require huge infrastructure.So it’s not just about printing structures. You can use the printer as a platform to screen processing conditions and material properties at small scales. That makes it a powerful tool for discovering new materials, not just for studying mechanics.Charles:One of our first guests was Sergei Kalinin, who also does interesting work using microscopes as high-throughput experimental platforms. I think you’re describing something similar here, where 3D printers are being used not just to discover structures, but to explore composition too. That’s a really innovative application.What does the open-source self-driving lab ecosystem look like from your perspective? (22:25)Charles: Related to what you were saying earlier about the lab use of 3D printers, I know that a lot of new self-driving labs are built using homemade components, often with 3D-printed parts. I’m curious how you’ve engaged with that ecosystem. I know your group has developed some of your own hardware for self-driving labs. 
Can you walk us through what that ecosystem looks like to you, especially in terms of how many open-source components other science labs are putting out?Keith Brown:That’s a great question. There is a rich community — and I mean rich in spirit, even if not in funding — around open-source hardware. The goal is not just to make new tools, but to share them so others can reproduce and build on the work. That’s especially powerful in self-driving labs, where everything is custom but needs to be replicable.In our case, we have been of two minds. For our 3D printing robot, which we call the BEAR system — in case I mention that later — we’ve made everything public. I don’t necessarily expect many people to duplicate it exactly, because not everyone has the same experimental goals. But we are helping at least one team who is trying to replicate it. We’ve been sharing code and resources with them.What’s more likely is that people will adopt modular components or subsystems and combine them in their own ways. There are great open-source communities out there. One standout is the Jubilee Project. It looks like a 3D printer, but the tool head can be swapped automatically with others — so you can go from printing to fluid handling to imaging, all on one platform. It’s a very versatile experimental workflow developed by Nadya Peek and collaborators at the University of Washington.That kind of modular, open hardware design has inspired us. We’ve also released some of our own modules. For example, we developed a mechanical testing platform for use with 96-well plates. You can download the files and build it for around $400. I know of at least two being built right now, which is incredibly rewarding. It shows that the time spent on documentation really pays off.The broader self-driving lab community is increasingly built on this philosophy: hardware that works, that you can modify and share. That same ethos is very visible in the software space. A lot of the code is written in Python and shared openly on GitHub. Hardware lags a little behind in that regard, but more people are embracing it. Platforms like Opentrons have done a good job at this — their whole model is to be open and accessible.Charles:Something I’ve always wondered: when I see someone release an open-source hardware module for scientific experiments, I’m glad those communities exist to support and encourage that work. Especially when the tool can generalize across many kinds of self-driving labs. My suspicion is that while open-sourcing designs helps lower the barrier to entry, it still requires a lot of upfront effort. Do you think there’s room for a marketplace where these tools could be produced at scale, maybe commercially? Something more plug-and-play?Keith Brown:That’s a fascinating idea. I think you’re right — there’s definitely a price point where it would make more sense to buy certain components rather than build them yourself.We’ve talked about this in my group. One system we developed is called PANDA, a low-cost fluid handling and electrochemistry platform that we use for polymer electrodeposition. We collaborated with another lab here at BU, and people kept asking for it. We ended up building a few and distributing them to collaborators. Now there are several operating on campus.Harley Quinn et al., PANDA: a self-driving lab for studying electrodeposited polymer films. Mater. Horiz. 11, 1877–1885 (2024). DOI: 10.1039/D4MH00797BSo we’ve asked ourselves: should we spin this out into a company? Could we sell these? 
Financially, I think there’s a market. But building a hardware startup is tougher than a software one. The upfront costs are higher, and the ecosystem isn’t as mature. But I think it’s worth exploring.If you consider the student labor and time it takes to build an open-source system, even when you follow detailed instructions, it might actually be cheaper to just buy one at twice or three times the cost. Of course, one of the reasons we build things in-house is the pedagogical value. Students learn a lot from assembling, modifying, and understanding how these systems work.So I don’t think the answer is always to commercialize. In our group, building hardware is part of what we love to do. But from the perspective of expanding impact, it would be amazing to say, “If you want this, here’s the catalog number — we’ll ship it to you.” That would be very exciting.Charles:It’s really interesting that there’s already this latent demand, and your lab is kind of a mini factory distributing this equipment.Keith Brown:I wouldn’t call it a factory, for legal reasons. We don’t sell anything. We co-develop instruments with collaborators and share them.Charles:Fair enough. I guess it’s not exactly venture-backable either. The market for scientific equipment is small, and as you said, rich mostly in community and spirit. But maybe something for a funder to think about if they’re listening.Will self-driving labs be more centralized in large facilities or decentralized across many small labs? (29:40)Charles: Moving to a broader question: do you think self-driving labs will become more centralized — like large shared facilities, or remain distributed, where labs build and run their own setups? I know it’s a bit abstract, but I’m curious where you think the field is heading. Is it currently leaning more toward centralized development in large labs with significant capital, or more toward distributed boutique systems?Keith Brown:That’s a big debate in the community. If you look at the literature, most papers that report results from self-driving labs still come from smaller, decentralized setups. There are a few large efforts in the US and globally, but many are still getting off the ground. These are major infrastructure investments, tens or even hundreds of millions of dollars.In contrast, moving from running experiments to automating those experiments within a research group is a pretty natural progression. The students already understand the experiments, the equipment, the edge cases. I worry that in a fully centralized model, students might plan an experiment, submit it to a facility, get results back, and not understand what went wrong — because they never did the experiment themselves.People often draw analogies to centralized computing clusters, but the difference is that with computing, you can test the code on your own machine. If you can’t run any part of the experiment in your lab, you’re stuck interpreting black-box results. That’s not ideal.Also, a lot of the innovation in self-driving labs is coming from people who are building the systems themselves. Most scientific instruments today are built for humans — microscopes, pipettes, everything. But the optimal scale for experimentation is often smaller, faster, and more precise. This is something a robot can handle better than a human. 
If we want to change the format of scientific infrastructure, we need to be experimenting in labs, not just in centralized hubs.That said, there is definitely room for shared resources, especially for specialized processes. We’re actually turning our own BEAR system into a kind of centralized service. So if a lab wants to study mechanics but doesn’t have the capacity to do thousands of tests, they could send their samples to us and get consistent, high-quality data in return.So there’s a spectrum. Decentralized labs can evolve into shared services. You don’t need a $100 million facility to make that happen. The field doesn’t need to be concentrated in a few elite locations. It should be spread across the whole country, and the whole world.Charles:Earlier, when we talked about all these different labs building their own equipment, it reminded me of the early days of computing. Back then, the first electronic computers were built by university groups, and each group had its own unique design. That led to a lot of innovation and a tight coupling between computing and scientific research.Today, we think of supercomputers and compute clusters as centralized resources. But even then, each university usually has its own compute cluster, and some labs have their own. There’s a kind of hierarchy of flexibility and scale.I think that’s a helpful metaphor for self-driving labs. In some ways, they're like compute clusters, or maybe like simulation software such as VASP or DFT. There are groups that focus entirely on DFT, but it’s also a tool that any group can use. It scales from a laptop to a supercomputer. I feel like that’s one of the key questions: how do we conceptualize self-driving labs? I think your example helped clarify that a lot.Keith Brown:Thanks. And I will say that our 3D printing system is, in many ways, one of the easiest examples to understand. You could walk into a middle or high school and find a 3D printer. If you told students, "Design something with mechanical properties—maybe it can hold your weight, but not two people’s," they could understand that. They can physically hold it, design it, and see it tested in a central facility.That kind of tactile connection makes it much easier to understand than, say, a chemical reaction, which requires a bit more abstraction. That extra layer of abstraction makes it harder to communicate.What does human-machine collaboration look like? And, how do you train students to work in this field? (36:10)Charles:That leads nicely into a topic I wanted to explore, namely how humans interact with self-driving labs. You ran a campaign with 25,000 samples. Because of the broader AI discourse, I think people often imagine self-driving labs as automating science entirely. For that campaign and more broadly, how do you think about the role of the human researcher? What does human-machine collaboration look like?Keith Brown:That’s a great question and a very active area of research.First off, there are no labs today that are truly self-directed. Just like self-driving cars still need a human to say where to go, self-driving labs still need people to define goals, craft experiments, and choose parameters. All of that is still set manually, especially at the beginning of a campaign.That works well when the campaign lasts a few days. But during our multi-year campaign, we realized something interesting. If the system is running for weeks, you start to notice it making choices that may or may not align with your priorities. 
So every week, we would check in, usually it was me and Kelsey, the graduate student leading the project, and we would review what the system was doing and whether we should tweak anything.We weren’t in the loop, because the lab could run without us, but we were on the loop. We made decisions when we wanted to. That style of interaction will become more common as these campaigns get longer. You’re not just running 100 benchmark experiments; you’re searching for a new superconductor or a new lasing molecule over months or even a year.In that context, the self-driving lab becomes another voice in the research group. And for the human researchers, it can be a more elevated experience. Kelsey, for example, had an experience much more like a PI than a typical graduate student. Instead of running individual experiments, she was thinking about learning strategies, campaign design, and data interpretation.It was intellectually enriching. We explored optimization theory, machine learning, and human-computer interaction. We even wrote a paper about it called Driving School for Self-Driving Labs. The analogy was that these systems are like autonomous cars—not quite fully self-driving, but advanced enough to require new modes of engagement from the human operator. We wanted to document what those interactions look like and the decisions people still need to make.Kelsey L. Snapp et al., Driving school for self-driving labs. Digit. Discov. 2, 1620–1629 (2023). DOI: 10.1039/D3DD00150DCharles:That’s a great example. It really elevates the abstraction level for graduate students. Instead of spending all their time running individual experiments, they can focus on campaign design and data interpretation.That leads to a broader question: how do you train students to work in this kind of lab? The skill set seems to be expanding. Are there specific qualities you look for when recruiting grad students? And how do you help them build the combination of hardware, software, AI, and domain expertise that’s now required?Keith Brown:Great question. The number one trait I look for is tinkering. I want students who build things in their spare time — woodworking, 3D printing, electronics, coding — anything creative and technical. That shows a willingness to pick up new skills and apply them.Once someone has that mindset, it’s easier to help them integrate those skills into research.In terms of training, it’s definitely a challenge. Education is, in some ways, a zero-sum game. If you want someone to learn machine learning and robotics, something else has to give. But you can’t sacrifice core domain expertise. If a student is studying mechanics, they need to understand mechanics deeply. The same goes for materials science, chemistry, whatever the core application is.That said, there absolutely needs to be room for statistical learning, machine learning, and optimization methods like Bayesian optimization. These should be taught at every level, from high school through graduate school. They are foundational skills across disciplines and are not taught widely enough yet.Even simple machine learning techniques can be introduced with great tutorials. The deeper subtleties, like choosing and tuning hyperparameters, only come with experience. I can’t count how many times I’ve had a conversation with a student who says their model isn’t learning, and it turns out they haven’t touched the hyperparameters.There’s a lot to learn, but a lot of it can be learned through doing research experience and hardware. 
I think I’ve been lucky being in a mechanical engineering department. A lot of the students I work with are naturally inclined toward hardware and have training in robotics and physical systems.The easy answer to building interdisciplinary teams is to collaborate—bring in some CS folks interested in AI and hardware folks from mechanical engineering. In principle, that’s great. But at the end of the day, everyone still has to write a thesis. So it’s not that simple. You can’t always fund a large team. If you’ve got a small team, people need to wear multiple hats.Everyone on the team needs to be proficient enough in all these areas to hold conversations and understand trade-offs. There's also a lot of room for expanding skill sets through non-traditional experiences: bringing hobbies into the lab, watching YouTube videos, or running through lots of Python tutorials. A lot of learning can come from just doing.Charles:Yeah, I think it's great you mentioned earlier that you look for graduate students who are tinkerers. It really feels like we’re reviving the spirit of tinkering in graduate school, which is kind of what it was originally about.What will be the last thing AI automates in your field? (44:45)Charles: Maybe one last question we sometimes close with. What do you think is the last thing AI will be able to accomplish or automate in your field?Keith Brown:Right. The funny thing about AI and computation is that what machines find hard is often different from what we find hard. A calculator can instantly find the square root of a huge number, but computers used to struggle with identifying whether a photo contained a bird. So I think there's a mismatch between what’s human-hard and what’s computer-hard.In my field, mechanics and materials discovery, I think some of the most difficult challenges will be what we call "out-of-distribution" events. These are situations where evidence conflicts, and you need a new mental model to make sense of it. Think of paradigm-shifting discoveries like the photoelectric effect or the heliocentric model. Those moments require not just data, but new frameworks.AI will likely struggle with that for a long time. And frankly, people do too. It takes millions of scientists to have a single breakthrough moment. That’s the kind of synthesis that’s still extremely difficult.That said, there are things we consider hard — like reviewing a technical paper — that AI might actually be good at. Maybe not judging novelty, but certainly evaluating technical soundness. Hopefully, we can use AI to amplify our ability to parse massive amounts of knowledge and make better decisions faster. But that final leap, from mountains of data to a new paradigm — that’s going to remain challenging for AI.Charles:Awesome. All right, Keith, thanks so much for joining us.Keith Brown:Thanks, Charles. This was a really fun conversation.
Jun 11, 2025

Building an AI-Powered Grid with Kyri Baker

Kyri Baker, a professor at the University of Colorado Boulder, shares insights on integrating AI into power grid systems. She discusses why smaller, faster AI models could be more effective than massive ones for optimizing energy flow. Baker emphasizes that outdated institutional practices, more than technical limitations, are the bigger obstacle in grid management. She also advocates for a rebranded, enjoyable approach to decarbonization, addressing misconceptions about AI's climate impact. Tune in for a lively discussion on making grids smarter and more sustainable!
May 27, 2025

Shantenu Jha on Why Fusion Is a Computational Problem

IntroductionIn this episode, I sit down with Shantenu Jha, Director of Computational Science at Princeton Plasma Physics Lab (PPPL), to explore how AI is reshaping the path to fusion energy. We discuss why PPPL views fusion as not only a physics problem but also a grand computational challenge, what it takes to close a 10-order-of-magnitude compute gap, and how reasoning models are being integrated into experimental science.Shantenu also shares lessons from a recent AI “jam session” with over 1,000 DOE scientists, explains the emerging need for benchmark datasets in fusion, and reflects on what AI might never fully automate. Here are three takeaways from our conversation:AI is transforming fusion from a physics-first to a compute-first challengeFusion research, particularly tokamak and stellarator design, demands simulations of extreme conditions: nonlinear, turbulent plasma under hundreds of millions of Kelvin. Shantenu frames this not just as a physics challenge, but as a computational design problem that’s at least 10 orders of magnitude beyond current capabilities. Bridging that gap isn’t just about hardware; it's about smarter, AI-assisted navigation of parameter space to get more insight per FLOP.Bottom-up AI models offer more control and trust than giant monolithsWhile large AI models show promise, Shantenu argues that smaller, physics-constrained models offer tighter uncertainty control and better validation. This ensemble approach allows fusion scientists to integrate AI gradually, with confidence, into critical design and control tasks. In short, building up to larger models rather than jumping in all at once is the best approach.Fusion needs its own benchmarks and the community is respondingUnlike fields like materials science or software, fusion lacks shared benchmarks to evaluate AI progress but that’s changing. Shantenu and collaborators are developing “FusionBench” to measure how well AI systems solve meaningful fusion problems. Combined with cross-lab efforts like the recent AI jam session, this signals a shift toward more rigorous, collaborative, and AI-integrated fusion research.Transcript Charles: Shantenu, welcome.Shantenu Jha: Charles, pleasure to be here.What is the Princeton Plasma Physics Lab, and how did it begin? (00:40)Charles: Could you talk a little bit about Princeton Plasma Physics Lab (PPPL) and its place within the Department of Energy’s (DOE) national labs? I was surprised to learn that PPPL is part of that system, and I imagine others might be too. It’d be great to hear about the lab’s history and what it does.Shantenu Jha: Definitely. It's good to hear we're one of the DOE’s best-kept secrets — hopefully with a lot of "bang" to share. The Princeton Plasma Physics Lab has been around for at least 70 years, maybe longer, under various names. It actually began as Project Matterhorn, going back to the time of theoretical physicist and astronomer Lyman Spitzer and others like physicist John Wheeler. It started as a classified effort focused on thermonuclear reactions and fusion, primarily from a Cold War perspective. Over the years, it transitioned from weapons work to peaceful applications of fusion.The lab has had national lab status since the late 1960s, and fusion — particularly, magnetic fusion — has been its primary mission ever since. PPPL is the only one of the 17 DOE national labs focused almost exclusively on fusion and plasma science. But like all the national labs, it’s part of a larger ecosystem and collaborates widely. 
Increasingly, we're also doing a lot of work in computation, which we’ll probably touch on more later.Why is fusion a computational problem? (03:20)Charles: That’s fascinating. I didn’t realize it dated back that far. Like many national labs, it has its roots in the Cold War and the Manhattan Project. Let's talk about AI. How does AI fit into fusion projects like tokamak design? What’s the role it plays, and what's the opportunity?Shantenu Jha: That’s a great question. Just to be clear, this is a biased perspective coming from computing. A theorist might say something different. I see fusion as a grand computational challenge. Think of it like drug design or material discovery. You’re trying to design something under a set of constraints, which makes it a computationally expensive problem. The parameter space is huge, and some calculations are prohibitively costly.Designing a tokamak or a stellarator isn’t just about building a complex machine. You're building one that has to sustain temperatures of hundreds of millions of Kelvin, containing a highly nonlinear, charged, and often turbulent fluid. So you're not just solving a design problem; you're tackling layers of physics at once. That’s why I consider it a computational challenge.If we had infinite computing power, we could simulate everything and design our way forward. But we don’t and probably never will. I’d estimate we’re about 10 orders of magnitude away from the computational capacity we need to make this a fully simulation-first problem. So the question becomes: how do we close that gap? Every day, I come to work thinking about how to achieve those 10 orders of magnitude in the next five to seven years.What does it mean to be 10 orders of magnitude away in compute? (07:20)Charles: That makes me wonder: what does it actually look like to be that far off in compute? Are we talking about limitations in time steps, model resolution, number of parameters?Shantenu Jha: All of the above. And I’d add that just having more compute isn’t enough. If you don’t use it intelligently, you’ll hit another wall. We’re not going to get 10 orders of magnitude from hardware improvements alone. Moore’s law, which predicts a doubling of performance roughly every 18 to 24 months, only gets us so far — maybe a 1,000x improvement in a decade.So we have to use computation more intelligently. For example, not every simulation needs the same time step or resolution. Not every region of parameter space deserves the same computational effort. We need to prioritize smarter, use AI to identify which parts of the space are worth exploring, where we can save time, and where we can afford lower fidelity.This is where I think AI fundamentally changes things. It’s not just about speeding things up. It’s about getting more value out of the same computational budget.What kind of computing power does PPPL currently use and where does it come from? (10:00)Charles: What kind of computing resources are you using now? What's the scale, and where’s it coming from?Shantenu Jha: The leading system we’re using is Frontier at Oak Ridge, which is the DOE’s flagship machine. It has a peak performance of about 1.4 exaFLOPS. But real applications never reach that. As they say, that number is what you're guaranteed not to exceed. If a code achieves even a quarter or a third of that, it's doing extremely well.The challenge is getting these high-fidelity physics codes to run well on these leading machines. 
We’re also using other DOE machines like those at Argonne and Livermore but the effort has primarily been on Frontier, raising interesting questions since all those computers are within the DOE’s portfolio today. And we have to prepare for the next generation of supercomputers over the next five to ten years. That’s something I’m deeply interested in.Charles: When it comes to AI and high-performance computing (HPC), some might wonder: why not just train one big model on all your simulation data and use it for fast inference? Why the need for a heterogeneous system?Shantenu Jha: The answer is: yes, maybe eventually, but we’re not there yet. Right now, we’re taking a hybrid approach. We're looking at simulations and seeing where more computation doesn’t yield more accuracy. That’s a good candidate for a surrogate model, a spot where AI can help.In some views of the future, all simulations will be made up of lots of small AI models. Maybe you build one big model from smaller ones, or maybe you train one massive model up front. It’ll probably be a mix of both.At PPPL, we’re exploring the bottom-up path. We’re running multi-fidelity simulations and inserting AI surrogate models where we can. The goal is either to do more science with the same compute or to reduce the cost of getting the same results.This could look like super-resolution or bootstrapping. Start with a cheap model, refine it with AI, then move up in fidelity as budget allows. Whether this builds into a giant, all-encompassing model is still an open question. But yes, for now, it's a stack of AI "turtles" all the way up.Why build a bottom-up ensemble of small AI models and what are the tradeoffs? (15:15)Charles: Give me a sense of why we might expect a bottom-up ensemble of many small AI models. Why wouldn’t we just use a single large one? Is it because you're working with different types of modules or physics? Help us understand that tradeoff.Shantenu Jha: Absolutely. That’s exactly right. When you train one very large model, the uncertainty is typically higher. These large models can exhibit emergent behavior, and we all know about issues like hallucination and unpredictable errors. In contrast, if you start with small models and constrain the physics at each step, the uncertainty is much smaller — or at least more manageable.You can train, test, and validate at every stage, which gives you greater control over uncertainty. That’s one reason I personally prefer building a hierarchy of models. Eventually, yes, we want large, powerful, emergent models. But from my perspective, it’s more effective to build confidence gradually rather than create the biggest model possible and then try to understand its limitations.How can we trust these models in chaotic, real-world systems like fusion reactors? (16:30)Charles: One thing I’ve always wondered: plasma physics is fundamentally chaotic. As we try to control plasma fusion in reactor designs like tokamaks, how can we have any guarantee that a given model or control system will continue to work reliably over years of operation? That seems like a major issue when moving from lab to real-world deployment.Shantenu Jha: I couldn’t agree more. Perpetual reliability is going to be difficult. This is where continuous learning and training come in. As we build digital twins or AI-driven models of tokamaks, those models, like humans, will need to be continuously updated. 
Just like reading a few new papers each morning to stay current, these models will need to be retrained regularly using high-quality data.This already happens with large language models on the internet, where huge volumes of new data — ranging in quality — are continuously fed into updated versions. That feedback loop is easier online, but in plasma physics, we’ll need a similar mechanism based on experimental systems and high-fidelity simulations.Eventually, we’ll run into data scarcity, both in physics and online. At some point, the best training data may come from AI-generated outputs — synthetic data. This raises interesting questions: how do we generate useful synthetic data for the next generation of models? It’s a growing area of research.Charles: What does synthetic data look like in plasma physics? What makes it useful?Shantenu Jha: It depends on how you define synthetic data. There isn’t really a consensus. For example, if data comes from an AI system that was constrained by physical laws, some would still call that synthetic. Personally, I take a more flexible view. If a model uses physics-informed constraints and the resulting data comes from inference within those bounds, I think it’s acceptable to use that data for training. But others might disagree. It’s still a bit of a gray area.Charles: Going back to the earlier point: how do we operate real systems when we can’t fully guarantee reliability? You mentioned active learning and continuous training, which makes sense. But what does deployment look like in practice? Do we just run simulations and physical tests over time and then say, “well, nothing has broken yet, so it must be safe”?Shantenu Jha: That’s an important question. I think the answer lies in bounding our uncertainty. Think about data centers: some guarantee 99% uptime, others promise 99.9% or even more. That extra fraction comes at a significant cost. Similarly, in fusion, it won’t be about total certainty. It’ll be a balance of technical capability, design tolerances, and economic tradeoffs.So no, we won’t be able to provide absolute guarantees. But we will aim for high confidence — enough that our computational models and AI-assisted designs operate within acceptable risk thresholds. It becomes a matter of how much certainty is “enough,” and that will differ depending on the application. I don’t think anyone will insist on total guarantees, especially in a field as complex as fusion.Are there benchmarks in fusion like in other scientific fields? (21:45)Charles: It’s an interesting contrast with the nuclear fission industry, which has had strict regulatory frameworks for decades. Fusion seems to raise different questions around knowability. You mentioned data earlier. In many fields, benchmark datasets help drive innovation. Is there anything like that in physics or fusion?Shantenu Jha: That’s a great question. It’s something we’ve been actively working on. Some communities, like materials science or math-heavy domains, have developed strong benchmarks. Even in machine learning for software or math reasoning, benchmarks help communities track progress and compare results without ambiguity.The fusion community hasn’t really done this yet. That’s been one of my personal goals: working with experts in fusion to define something we’re calling FusionBench. We’re still in the early stages, so I don’t have results to share yet, but we hope to launch something in the next few months.The idea is twofold. 
First, we want to measure how much progress we’re making in solving meaningful fusion problems. Second, we want to track our improvements in applying AI to fusion, something the field hasn’t systematically done before.As new models are released — and they're arriving rapidly — they may be well-suited for certain tasks, but that doesn’t necessarily make them appropriate for the challenges fusion presents. A benchmark helps us calibrate our progress in using AI models, but it also helps differentiate which of the new models are actually effective for our domain.It’s about making sure our community is aligned: using the right models with the right capabilities to move the science forward. There are many reasons why something like FusionBench is valuable. Just as the Frontier Math benchmark has been useful for the mathematics and reasoning community, we believe FusionBench will serve a similar purpose for fusion.What happened during the recent AI scientist jam session? (24:50)Charles: Awesome. I’m excited to see it. It's a great point that many labs are now shifting to tougher scientific benchmarks because the easier ones have been saturated. It’ll be interesting to see how these models perform on a Fusion benchmark. You recently co-hosted an AI scientist jam session with nine national labs, 1,000 scientists, and support from OpenAI and Anthropic, who made their models available for a day. How did that go?Shantenu Jha: It was fun. We learned a lot. We gained insights into our own limitations and saw firsthand the capabilities of the models provided by OpenAI and Anthropic.One major takeaway was the sheer diversity of problems. We had around 1,500 scientists from across the DOE complex, each bringing different ideas. We’re now in the process of aggregating what we learned from all the labs and doing a meta-analysis. We hope to publish something soon.It was incredible to see which problems the AI reasoning models helped with most effectively. That alone was valuable not just for us, but hopefully for the model developers too. The second big takeaway is that while AI models won’t replace fusion scientists, it’s now broadly accepted, even among the skeptics, that these tools are genuinely useful.That doesn’t mean we can apply them indiscriminately. They won’t be useful for everything. But used carefully, they can be powerful assistants. That’s the shift we’re seeing now: recognizing the value and figuring out how to use it most effectively.Charles: That’s really interesting. Getting 1,500 people together is no small feat. Do you feel there’s still skepticism toward these reasoning models in the fusion community?Shantenu Jha: Yes, there’s a healthy level of skepticism and critical thinking, as there should be. I think most people now understand this isn’t just a fad. There’s real scientific value here.The key is to develop a nuanced understanding of where these models are useful and where they’re not. That boundary isn’t fixed. It’s a moving target. As the models improve and as we get better at using them, the line between "useful" and "not useful" will shift. Our job is to keep pace and use them to enhance scientific discovery. I think the community is starting to embrace that.What’s the hardest task for AI to master in your work? (29:09)Charles: One last question. What do you think will be the hardest — or the last — task that AI will become truly expert in within your daily work?Shantenu Jha: Great question. If you’d asked me a month ago, I would have given you a different answer. 
Back then, even if you promised me 10 orders of magnitude more compute, I would’ve said we still wouldn’t have AI models capable of abduction—the intuitive leap that lets scientists form new ideas.But then I attended a meeting in Japan co-hosted by the DOE, focused on post-exascale computing. During a brainstorming session, I had this thought: what if future AI models are capable of rejecting the algorithms they were given? Not in a dystopian sense, but what if they have the intelligence to identify when a better algorithm exists?In other words, what if they can learn how to learn? If AI can autonomously select the best algorithm for a given scientific problem, that’s a huge leap. That’s what scientists do: choose and tailor the right method. If AI can do that, it would be transformative.So for me, selecting the right algorithm for a problem remains the hardest challenge. But with enough computational power — 10, maybe even 20 orders of magnitude more — it could also be the ultimate achievement from a computational science perspective.Charles: Yeah, that’s fascinating. So if anyone from Congress is listening: we need to get 10 more orders of compute for PPPL if we want fusion.Thanks for joining us, Shantenu.Shantenu Jha: Thank you, Charles. It’s been a pleasure.
May 13, 2025

Shelby Newsad on the Venture Thesis for Autonomous Science

Introduction In this episode, I sit down with Shelby Newsad, who invests in early-stage deep tech at Compound VC, a venture firm built around thesis-driven research, to discuss how AI is changing the way we discover and commercialize new therapeutics and materials.We talk about the limits of in silico research, what makes a data or hardware moat defensible, and how Compound thinks about value capture in autonomy-driven platforms. Shelby also shares lessons from the recent Autonomous Science Workshop she organized, international clinical trial strategies, and why she wants more founders building in biosecurity. Here are three takeaways from our conversation: Computer models are improving, but scale matters:AI models can now predict small molecule behaviors with near-experimental accuracy, offering major efficiency gains. However, for larger biomolecules like protein complexes, startups still need to generate experimental data, often via cryo-EM or wet lab work, to train and validate models effectively.Hardware is emerging as a durable competitive edge:Startups are pushing biology into the nanoliter scale using patented silicon wafers, enabling more compact and efficient experimental systems. This kind of hardware innovation underpins self-driving labs and creates durable IP moats that pure AI models often lack.Geographic and regulatory arbitrage is shaping biotech strategy:Rather than relying solely on U.S. trials, companies are strategically sequencing studies across jurisdictions to reduce costs and speed timelines. These moves, combined with FDA flexibility on foreign data, help de-risk development while keeping U.S. commercialization pathways open.Transcript Charles: Shelby, welcome to the show.Shelby Newsad: Great to be here. Thanks for having me.What makes Compound VC’s thesis-driven model unique? Charles: I want to talk a little bit about Compound because it's fairly unique as a thesis-driven VC, which I find really more my flavor of actually engaging deeply with a research area. I'm curious, why do you think there aren't more thesis-driven VCs?Shelby Newsad: That's a good question. I think it just takes a lot of time and it's just a really different structure to doing investment. Instead of being really numbers-focused and casting a wide net like a lot of VCs do, and how a lot of VCs train you to be, we intentionally take a step back from the noise and spend probably about half of our time on research and speaking to people that are academics. We use that lens to decide where we want to cast our more specific nets and investments in deep tech.How do you explain the role of AI in transforming scientific discovery? (01:30)Charles: That's awesome. And I'm excited to talk a little bit more about the investment side of the thesis, because we talk a lot to researchers who are doing really awesome stuff with AI in the lab, but it might look a little bit different out in the marketplace. So excited to jump into that a little bit more. But first maybe a more abstract question. I think a lot of people are still trying to grasp the conceptual metaphors for how to think about the role of AI and autonomy in science. We had things like the discovery of the microscope and how that changed the way that we do science. I'm curious, do you have ways of explaining or capturing how you see AI transforming scientific discovery, scientific research, and the scientific enterprise overall? 
One common metaphor is discovery-as-a-service, but I’m curious if you have others that you use.Shelby Newsad: Yeah, discovery-as-a-service is really interesting and something target discovery companies actually service in drug discovery. There are interesting deals for structuring that. But yeah, I guess in drug discovery I think there's golden eggs to be found in population scale biobanks. And yeah, we have a company, Fearon, that aggregates population scale data and does target enrichment, helps with patient stratification. And their big learning is that when you're able to aggregate five million pairs of genotypes and phenotypes, you can actually predict disease onset for like 700 different diseases. You can have really discrete timelines for when adverse events happen.I think a lot of the AI field is actually further advanced than what people in big industries and people buying services realize. I think a lot of people would benefit if they spent maybe one to two hours a week just reading AI papers to learn how much they can accomplish today. That would change the business model dynamics that we've seen companies struggle with.What are the tradeoffs between in silico and experimental methods? (04:15)Charles: Yeah, another lens I've been thinking about for autonomy, apart from personalization which has kind of been in vogue for a while, is abundance. Like you can have an abundance of data about yourself that you couldn't have before. But I think finding a needle in the haystack in another metaphor, like finding golden eggs. Perhaps another one of the roles that AI plays when we think about scientific discovery. To your earlier point about finding these golden eggs and the role AI plays, I would love to hear you walk through the role of pure, in silico companies that are using AI for in silico discoveries. To me, it feels kind of hard to build a moat around that. I’m curious how you all think about that versus experimental and hardware-driven approaches.Shelby Newsad: Yeah, I think the great thing about working in venture capital or even being an employee at a few startup companies is that you're able to see a trend and then make investments on either side of purely in silico work or experimental approaches, and capture a data moat or a hardware moat for your company. That’s something we think about a lot at Compound. On the purely in silico side, there are some interesting models with neural network potentials where the accuracy of kilocal per mole predictions is actually within experimental range. That’s something I haven’t seen other models achieve. But it's only for small molecules that you see neural network potentials reaching that accuracy. For larger biomolecules, there’s still a need for experimental data. At what scale you need that is where we’re seeing a lot of companies differ.I’m personally very interested in companies like Gandeeva Therapeutics and Generate:Biomedicines that are doing a lot of their own cryo-EM structures of proteins and trying to scale those structures to create their own proprietary protein databases, then using that to train better models. I’ve seen both approaches. The consensus answer is that you will need data for a long time. The non-consensus answer is that for small molecules interacting with confined protein regions, the data isn’t as necessary in the next three to five years. But for larger molecules, protein complexes, or cells, the need for data is still very much present.Charles: That’s interesting. 
It sounds like what you're sketching out is that from gene therapies and small molecules up to large biomolecules, there are different levels of granularity available through in silico methods, compared to the resolution you may or may not get with experimental. And that trade-off determines when in silico provides a substantive value-add. Is that what you're saying?Shelby Newsad: Yeah, well put.Charles: That’s a really interesting way to think about it, especially not just in bio but across different fields: when in silico methods make sense versus when you need something like self-driving labs. I know we’ve talked about that a lot. I guess the flip side of the in silico question is when is hardware a moat, and what kind of hardware is a moat?When does hardware become a durable moat in scientific innovation? (08:15)Shelby Newsad: Yeah, we've seen various different companies that are creating IP around different microfluidic systems where they can engineer cells on a chip. Twist Bioscience actually has on their silicon wafers and in their patents that they can get down to nanoliter-range droplets, which is pretty incredible because most biology is done in the microliter range. The fact that they’re able to go down a few orders of magnitude is amazing. I wish there were more hardware people interested in biology because there’s a need for better systems of moving liquids around.Something we need to talk more about with self-driving labs is: what if these labs are really just silicon wafers or CMOS chips where we move liquids around with currents and do reactions inside them? Maybe that can all happen in a two-foot by two-foot box on a benchtop, instead of automating a whole lab space.Charles: For a lot of these microfluidics, the key value add is high throughput. AI and autonomy become a wrapper around that — or around the models — but the data moat is being generated by a new kind of hardware. That’s how I think about that kind of play, and I agree it’s really durable, especially if they can map those results into proxy variables used in clinical trials.Shelby Newsad: I was just wondering how you think of hardware versus chemical versus AI moats for companies building in 2025?Charles: Everyone is looking for more hardware people. I think hardware is a more durable moat, especially with so much churn in AI. That’s generally where I lean, which is again why self-driving labs matter not just as a hardware play but because the hardware enables you to generate experimental data. Microfluidics is interesting because it’s not just that it generates data. It’s also discovering new therapeutics through a different mechanism. That makes me more bearish on AI as a whole and more focused on autonomy.Shelby Newsad: What company do you find most exciting in the hardware, materials, or chemistry autonomy space?Charles: I think it’s great as a technology for R&D. It’s great for discovering new materials or therapeutics. But once you discover something, there’s a whole traditional pipeline. How do you commercialize it? The benefit of these autonomy technologies is that they speed up the front of the discovery funnel. But it’s not clear they capture value in the same way.How do you think about value capture in AI-driven discovery platforms? (11:55)So the question I wrestle with is: how do you capture value for autonomy? You generate a lot of data and find the needle in the haystack, but how do you keep the full value instead of spinning it off? For personalized medicine, there’s a clear investment case. 
For chemicals and some therapeutics, it’s less clear that you can fully capture the value.Shelby Newsad: Yeah, we’re definitely taking the asset approach for a lot of our companies. A lot of them are platforms that capture value by creating assets and bringing drugs to clinics. Others try to license their golden eggs to pharma, land and expand inside bigger companies, and grow pilot contracts into ones that include royalties and milestone payments.But I get the complexity. It’s not naive optimism. AbCellera is the canonical platform company that has dozens of antibodies in the clinic through partnerships. When they went public, public market investors wanted proof. So now they have their own pipeline. Previous platform companies like Nimbus Therapeutics had to bring a drug through Phase One before they got nine-figure contracts. So we’re not naive about how hard it is to change business models, especially in entrenched industries. But it’s worth experimenting, if golden eggs are materially easier to find and it doesn’t take six months but maybe just an afternoon, that changes the cost structure and who your customer base could be. It’s worth trying to build companies around that.What’s the difference between materials and biologics in terms of value capture and discovery? (15:00)Charles: Yeah, now is definitely the time to try. I’m also curious whether you see differences between materials and biologics in the discovery process and value capture. You mentioned royalties and partnerships, which are great in bio. But in materials, with different incumbents and capital flows, it’s less clear. Have you looked at material discovery, and how does that differ?Shelby Newsad: Yeah, we have. One issue is that there are far fewer large chemical companies, and their margins are much lower than in pharma. So when they adopt new chemicals or improve processes, they demand really strict techno-economic analyses even at early stages. Even then, it resembles a pharma sales cycle. What’s more interesting in materials is looking at industries that need new chemicals but aren’t Dow or the big plastic companies. For instance, new chemicals are needed for data center technologies and infrastructure buildouts.How does China’s faster clinical trial process affect U.S. biotech competitiveness? (17:00)Charles: That makes sense. So for greenfield discoveries, it could really change the cost structure. Speaking of cost structure, how does US competitiveness stack up against China? There’s a lot of reporting that trials in China move faster. While the US bets on AI at the discovery stage, can China move faster through clinicals? That’s a real concern. I know some VC investments have been wiped out by Chinese competition.Shelby Newsad: Definitely. A lot of the licensing deals for pharma molecules have been me-too molecules, not net-new. That gives us confidence that the US is still the center of innovation in biotech. The US’s role might also be advancing new modalities like better cell therapies or disease-specific biologics.With the FDA changing animal testing rules, companies can do early trials in places like China where investigator-initiated trials can support IND applications in the US. That can materially de-risk programs. I see it as more of an opportunity than a threat. Companies like Insilico Medicine are doing that. They started with a first-in-human study in Australia for its clinical trial rebates, did Phase One in New Zealand for cost reasons, and are now in Phase Two in China. 
I just spoke with a founder yesterday who wants to do trials in Spain, where Western medicine is accepted but trials cost about one-sixth what they do in the US.Charles: Sounds like there’s regulatory arbitrage. Between repurposing and new modalities, that’s where the advantage lies?Shelby Newsad: Exactly.What were your main takeaways from the Autonomous Science Workshop? (20:00)Charles: You all helped organize the Autonomous Science Research Day. What were some takeaways?Shelby Newsad: First, thank you for speaking. Your talk was excellent. The fact that your work pre-A Lab helped lay the foundation for the autonomous lab at Berkeley is incredible.Charles: I promise I wasn’t fishing for compliments.Shelby Newsad: The biggest takeaway is where we see AI actually influencing outcomes. That applies across chemistry and materials, protein design in Phil Romero’s group, and cell-level work at Jure Leskovec’s lab. His lab is doing perturbations and agentic workflows for lab automation. These papers get Twitter hype, but the upshot is that siloed fields now see use cases for autonomy at every level of chemistry and biology. Speakers made that point clear, from capturing visual data to building smart cages for scaled animal research.Another insight is that some of this tech can be commercialized now, not five years from now.Charles: For me, Olden Labs’ work on smart cages and autonomous science plus animal models was a new angle. I hadn’t thought much about that, but it’s exciting. Value capture aside, it’s clear autonomy will unlock scientific discoveries and R&D. Are there areas you wish more people were building in?Shelby Newsad: We really wish more people were working in biosecurity. With measles outbreaks, avian flu in people without bird exposure, and long-term viral effects on neurodegeneration, it's clearly a longevity and national security issue. People are already using autonomous science to work with evolved pathogens. We’ve seen BARDA show some interest in pan-viral vaccine platforms. There’s room to position biodefense in ways this administration might support. I'm really excited to see more people build in this space.Charles: Awesome. Thanks so much for joining us, Shelby.Shelby Newsad: Exactly, yeah. Great. Appreciate you having me. Speak soon.
Apr 29, 2025

Sergei Kalinin on AI & Autonomous Microscopes

In this insightful discussion, Sergei V. Kalinin, chief scientist for AI in physical sciences and professor at the University of Tennessee, shares how AI is revolutionizing microscopy. He reveals how autonomous microscopes are evolving from static imaging to atomic-scale fabricators, enabling new manufacturing possibilities. Kalinin also highlights the balance between machine efficiency and human intuition in scientific research. Additionally, he discusses the exciting impact of digital twins and collaborative hackathons in the microscopy community.
