LLM Security and Privacy

Mar 27, 2024

Sean Falconer, Head of Marketing & Developer Relations @ Skyflow, talks about LLM security and privacy, including how to prevent PII leaks. The conversation covers the challenges involved, the fear of exposing customer PII, and the risk of leaking company IP. Discussions include the importance of data masking, governance, and compliance in ML lifecycle management. They also touch on data tokenization, API security, and de-identifying data for protection.

Chapters

Introduction

00:00 • 2min

Coding Challenge and Security of LLMs with Guest Sean Falconer

01:43 • 2min

Navigating the Fog of LLM Security and Privacy

03:43 • 6min

Ensuring Compliance in ML Lifecycle Management

09:40 • 8min

Data Tokenization and API Security

17:29 • 9min

Sean Falconer (@seanfalconer, Head of Dev Relations @SkyflowAPI, Host @software_daily) talks about the security and privacy of LLMs and how to prevent PII (personally identifiable information) from leaking out.

SHOW: 807

CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw

NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST - "CLOUDCAST BASICS"

SHOW SPONSORS:

Want to win a Tesla Cybertruck or $100,000? Enter the WSO2 Choreo Code Challenge (before August 30th)
WSO2 Choreo - Why build a platform? Just add developers instead
CloudZero provides immediate and ongoing savings with 100% visibility into your total cloud spend

SHOW NOTES:

Topic 1 - Our topic for today is the security and privacy of LLMs. What’s Sean’s origin story?

Topic 2 - Let’s dig into LLM security and privacy. We see this concern a lot on the podcast and we’ve touched on it with various past shows, but we haven’t dug in deep. First, let’s frame the problem. What are we talking about when we talk about LLM security and privacy?

Topic 3 - First, there is a fear that customer PII might leak out. Second, company IP or confidential info related to products or offerings might leak out. We’ve seen examples of both to date. This data could be exposed through integration into a model (query it for the answer) or in the fine-tuning or RAG stage. Either one could lead to compliance issues, lost revenue, etc. But that same at-risk data is the potential differentiation of the models. How do you mask the data and still take advantage of it?
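
One way to picture the "mask it but still use it" question is reversible tokenization: sensitive values are swapped for opaque tokens before text reaches a model, and the mapping stays in a separate store. The sketch below is only an illustration under loud assumptions. The names TokenVault, PII_PATTERNS, and mask are hypothetical, the two regexes catch only simple emails and US-style phone numbers, and an in-memory dict stands in for a real vault. It is not how Skyflow or any other product works.

```python
import re
import uuid

# Hypothetical patterns for illustration; real PII detection needs far more
# than two regular expressions.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


class TokenVault:
    """In-memory stand-in for a store that maps tokens to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, kind, value):
        # Reuse the same token for a repeated value so references stay consistent.
        if value not in self._value_to_token:
            token = f"<{kind}_{uuid.uuid4().hex[:8]}>"
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, text):
        # Swap every known token back to its original value.
        for token, value in self._token_to_value.items():
            text = text.replace(token, value)
        return text


def mask(text, vault):
    """Replace each matched value with a token before the text leaves your system."""
    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: vault.tokenize(k, m.group(0)), text)
    return text


if __name__ == "__main__":
    vault = TokenVault()
    prompt = "Contact jane@example.com or 555-123-4567 about the refund"
    safe_prompt = mask(prompt, vault)
    print(safe_prompt)                    # tokens appear in place of the email and phone
    print(vault.detokenize(safe_prompt))  # original text restored for authorized use
```

The model only ever sees the tokens, while an authorized caller can restore the original values afterward. Whether the masked text keeps enough signal for training or retrieval depends on the use case.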

Topic 4 - One thing I’ve noticed is that many orgs only think about privacy in relation to the fine-tuning stage, where they are taking a broad model and making it company-specific. It is about much more than that, though. Just like standard software development, we have different stages: how the data is collected and stored, how it is used for training and fine-tuning, how it is used after deployment and during the interaction stage, etc. How should security and privacy be handled across all phases?

Topic 5 - Let’s talk beyond LLMs for a bit. What about Data Lakes and Data Warehousing? I see this as a problem across all big data, correct?

Topic 6 - How does API security fit into this? Much of what we are talking about is at the storage and retrieval level. But, increasingly we see API issues exposing data. How does that fit in here?

Topic 7 - Let’s talk podcasts. We had Jeff, the previous host of Software Engineering Daily, on a few times. How are things over at Software Engineering Daily? Tell everyone a bit about the show.

FEEDBACK?

Email: show at the cloudcast dot net
Twitter: @cloudcastpod
Instagram: @cloudcastpod
TikTok: @cloudcastpod