Building Private GenAI stacks

Jul 23, 2025

Luke Marsden, CEO and Founder of HelixML, delves into the world of Private GenAI and its necessity for enterprises seeking regulatory compliance. He discusses the integration of AI into CI/CD pipelines and breaks down the layers of a Private AI stack. Marsden highlights the advantages of Retrieval Augmented Generation (RAG) over fine-tuning LLMs and explores the shift from traditional DevOps to MLOps. Listen in for insights on managing large language models securely and the importance of personalized AI workflows in regulated industries.

ANECDOTE

Manufacturing Firm Adopts AI Early

A US manufacturing company fine-tuned Llama 2 to prepare for AI's impact on its business, despite operating in a traditional industry.
This shows that forward-thinking boards recognize AI's transformative business potential early on.

INSIGHT

Private AI Stack Components

Running open source LLMs locally enhances control, privacy, and security, especially in regulated industries.
Private AI stacks combine infrastructure, GPU scheduling, control planes, models, knowledge, and API integrations.

ADVICE

Optimize GPU Use on Kubernetes

Use Kubernetes with GPU device plugins for managing private AI infrastructure effectively.
Consider advanced GPU schedulers for better memory packing and cost efficiency beyond native Kubernetes capabilities.
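
As a rough illustration of what "memory packing" means in the advice above, the sketch below places several models onto a small pool of GPUs by memory footprint, using a simple first-fit-decreasing heuristic. It is not how Kubernetes or any particular scheduler works: the `Gpu` class, the `pack_models` function, the model names, and the sizes are all hypothetical, and a real scheduler would also account for KV-cache growth, fragmentation, and eviction. The point is only that packing by memory can fit several models on one card, whereas Kubernetes device plugins typically hand out GPUs as whole units.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """A hypothetical GPU with a fixed memory capacity in GB."""
    name: str
    capacity_gb: float
    used_gb: float = 0.0
    models: list = field(default_factory=list)

    @property
    def free_gb(self) -> float:
        return self.capacity_gb - self.used_gb


def pack_models(model_sizes_gb: dict, gpus: list) -> list:
    """Place models on GPUs with first-fit decreasing.

    Models are sorted largest first, and each one goes onto the first GPU
    with enough free memory. Returns the names of models that fit nowhere.
    """
    unplaced = []
    by_size = sorted(model_sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for model, size in by_size:
        for gpu in gpus:
            if size <= gpu.free_gb:
                gpu.models.append(model)
                gpu.used_gb += size
                break
        else:
            # No GPU had room for this model.
            unplaced.append(model)
    return unplaced


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not measured).
    pool = [Gpu("gpu-0", 80), Gpu("gpu-1", 80)]
    sizes = {"chat-large": 40, "chat-small": 16, "embedder": 2,
             "vision": 24, "code-medium": 26}
    leftover = pack_models(sizes, pool)
    for gpu in pool:
        print(gpu.name, gpu.models, f"{gpu.used_gb}/{gpu.capacity_gb} GB")
    print("unplaced:", leftover)
```

With whole-GPU allocation, the five models in this example would need five GPUs; packed by memory, they fit on two.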

Luke Marsden (@lmarsden, CEO @HelixML) talks about Private GenAI. What is it? Why do you need it? We also discuss integration into CI/CD pipelines, the layers of a Private GenAI Stack, and why most organizations are opting for RAG over fine-tuning LLMs.

SHOW: 943

SHOW TRANSCRIPT: The Cloudcast #943 Transcript

SHOW VIDEO: https://youtube.com/@TheCloudcastNET

NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST: "CLOUDCAST BASICS"

SPONSORS:

[DoIT] Visit doit.com (that’s d-o-i-t.com) to unlock intent-aware FinOps at scale with DoiT Cloud Intelligence.
[FCTR] Try FCTR.io (that's F-C-T-R dot io) free for 60 days. Modern security demands modern solutions. Check out Fctr's Tako AI, the first AI agent for Okta, on their website.
[VASION] Vasion Print eliminates the need for print servers by enabling secure, cloud-based printing from any device, anywhere. Get a custom demo to see the difference for yourself.

SHOW NOTES:

Topic 1 - Welcome to the show, Luke. Give everyone a brief intro.

Topic 2 - Let’s start with Private GenAI. What is it? Why should organizations out there consider it? Why not just use OpenAI’s GPT models and fine-tune them?

Topic 2a Follow up - Regulatory Compliance - take, for instance, the forces in the EU that oppose using SaaS-based services based in the United States.

Topic 3 - Let’s break down the layers in a typical Private AI stack. I’ve seen various ways to represent this, such as infrastructure layer, MLOps layer, models, data layer (typically RAG), etc. How do you break up the stack into individual components?

Topic 4 - My mind immediately jumps to similarities in the DevOps space. Abstraction layers and components like Docker and containers come to mind, integration into CI/CD pipelines, etc. I feel like MLOps is its own thing with specific tools and workflows. Does this all come together, and if so, how?

Topic 5 - Also, what does this mean for versioning and lifecycle management of the models and the data?

Topic 6 - We are seeing more and more data pipelines backed by multiple models, sometimes in multiple locations. How do you handle this from both a scheduling and interface standpoint? Is everything hidden behind APIs, for instance?

FEEDBACK?

Email: show at the cloudcast dot net
Bluesky: @cloudcastpod.bsky.social
Twitter/X: @cloudcastpod
Instagram: @cloudcastpod
TikTok: @cloudcastpod