

From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra
Sep 28, 2025
Brijesh Tripathi, CEO of Flex AI, draws on his background in AI and HPC architecture to rethink AI infrastructure. He discusses the DevOps burdens that slow down small AI teams and explains Flex AI's workloads-as-a-service approach. Brijesh breaks down the challenges of accessing heterogeneous compute, the importance of a consistent Kubernetes layer, and how to smooth costs for spiky workloads. He also shares insights on handling real-time versus best-effort workloads, maximizing utilization, and letting AI teams focus on creativity instead of infrastructure complexity.
AI Snips
Supercomputer Handover Sparked Flex AI
- Brijesh Tripathi described building the Aurora supercomputer and the long handover to scientists.
- That experience inspired Flex AI to simplify access to compute for researchers and developers.
DevOps Friction Slows Small Teams
- Small teams waste time on infrastructure instead of product because DevOps complexity is high.
- Flex AI aims to remove that burden so teams can iterate on models faster.
Stabilize Kubernetes, Allow BYO Containers
- Standardize the Kubernetes layer so developers don't handle cloud-specific library drift.
- Offer bring-your-own-containers to capture edge cases while keeping a stable orchestration base.
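The snip above can be sketched as a minimal Kubernetes Job manifest: the platform owns and standardizes the orchestration layer (scheduling, GPU allocation, restarts), while the user supplies their own container image. This is an illustrative assumption, not Flex AI's actual configuration; the workload name, image registry URL, and GPU count are all hypothetical.

```yaml
# Illustrative sketch only: platform-managed orchestration, user-supplied image.
apiVersion: batch/v1
kind: Job
metadata:
  name: finetune-job          # hypothetical workload name
spec:
  template:
    spec:
      containers:
        - name: trainer
          # BYO container: the user builds this image with their own
          # frameworks and libraries, so cloud-specific driver and
          # library drift stays behind the stable Kubernetes layer.
          image: registry.example.com/team/trainer:latest
          resources:
            limits:
              nvidia.com/gpu: 8   # GPU request resolved by the managed layer
      restartPolicy: Never
```

The design choice mirrors the snip: developers never touch the cluster or cloud specifics, only the `image` field and resource requests, which keeps edge cases (custom dependencies, exotic frameworks) inside the container while the orchestration base stays uniform across clouds.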