How AI Is Built

#052 Don't Build Models, Build Systems That Build Models

Jul 1, 2025
A dive into AI infrastructure: why teams should build adaptive systems that produce models rather than one-off fine-tuned models. Topics include the shift to serverless platforms, the role of task decomposition in model performance, why inference rather than training is where the money is, the trade-offs of GPU versus CPU processing, MLOps integration patterns, and more efficient data processing pipelines. The conversation wraps with thoughts on community engagement in the AI development landscape.
INSIGHT

Serverless Enables Ambitious Scale

  • Modal enables spinning up 100 GPUs quickly, which encourages ambitious scaling of AI workloads (see the sketch after this list).
  • Serverless infrastructure changes the economics: teams can rent compute for brief tasks without carrying standing overhead.
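
Below is a minimal sketch of the fan-out pattern described above, assuming Modal's Python client; the app name, GPU type, batch sizes, and embedding model are illustrative, not from the episode.

```python
import modal

app = modal.App("embed-corpus")
image = modal.Image.debian_slim().pip_install("sentence-transformers")

@app.function(gpu="A10G", image=image)
def embed_batch(texts: list[str]) -> list[list[float]]:
    # Each call runs in its own GPU container; loading the model per call
    # keeps the sketch simple (a real pipeline would cache it).
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts).tolist()

@app.local_entrypoint()
def main():
    batches = [[f"document {i}-{j}" for j in range(32)] for i in range(100)]
    # .map fans out one container per batch; Modal scales toward 100 GPU
    # containers in parallel and tears them all down when the run ends.
    for vectors in embed_batch.map(batches):
        print(len(vectors))
```

Run with `modal run script.py`; you pay only for the seconds each container is alive, which is the economic shift the snip points to.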
ADVICE

Monetize Model Inference

  • Focus monetization on model inference integrated into software, not on training models.
  • Deliver value by combining model outputs with applications and tools such as APIs and agent orchestration (a sketch follows below).
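
As a concrete illustration of this advice, here is a minimal sketch of wrapping inference inside an application API, assuming FastAPI; the endpoint, request schema, and `run_model` stand-in are hypothetical, not from the episode.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SummarizeRequest(BaseModel):
    document: str

def run_model(prompt: str) -> str:
    # Stand-in for a real inference call (hosted API or local weights).
    return prompt[:200]

@app.post("/summarize")
def summarize(req: SummarizeRequest) -> dict:
    # The sellable value lives in the layers around the model call:
    # prompt construction, post-processing, and metering for billing.
    summary = run_model(f"Summarize the following:\n{req.document}")
    return {"summary": summary, "billed_tokens": len(req.document.split())}
```

The product is the endpoint plus the layers around it (orchestration, metering, UX), not the model weights themselves.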
INSIGHT

Differentiation Via Distribution and Trust

  • Distribution, bundling, better UI, and access to unique data will differentiate AI offerings.
  • Trust issues limit immediate dominance by big tech despite their integrated stacks.