This week, I'm joined by Katharine Jarmul, Principal Data Scientist at Thoughtworks & author of the forthcoming book, "Practical Data Privacy: Enhancing Privacy and Security in Data." Katharine began asking questions similar to those of today's ethical machine learning community as a university student working on her undergrad thesis during the war in Iraq. She focused that research on natural language processing and investigated the statistical differences between embedded & non-embedded reporters. In our conversation, we discuss ethical & secure machine learning approaches, threat modeling against adversarial attacks, the importance of distributed data setups, and what Katharine wants data scientists to know about privacy and ethical ML.
Katharine believes that we should never fall victim to a 'techno-solutionist' mindset, believing that tech alone can solve a deep societal problem. However, by solving issues around privacy & consent in data collection, we can more easily address the challenges of ethical ML. In fact, ML research is finally beginning to broaden to include the intersections of law, privacy, and ethics. Katharine anticipates that data scientists will embrace PETs that facilitate data sharing in a privacy-preserving way, and she advocates for making it no longer the norm to send ML data from one company to another.
Topics Covered:
- Katharine's motivation for writing a book on privacy for a data scientist audience and what she hopes readers will learn from it
- What areas must be addressed for ML to be considered ethical
- Overlapping AI/ML & Privacy goals
- Challenges with sharing data for analytics
- The need for data scientists to embrace PETs
- How PETs will likely mature across orgs over the next 2 years
- Katharine's & Debra's favorite PETs
- The importance of threat modeling ML models: discussing 'adversarial attacks' like 'model inversion' & 'membership inference' attacks
- Why companies that train LLMs must be accountable for the safety of their models
- New ethical approaches to data sharing
- Why scraping data off the Internet is the lazy, unethical way to train ML models
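To make the 'membership inference' attack mentioned above concrete, here is a minimal toy sketch (not from the episode; all names and numbers are hypothetical): an overfit model reports unusually high confidence on examples it memorized during training, and an attacker thresholds that confidence to guess whether a given record was in the training set.

```python
# Toy membership-inference sketch. A hypothetical overfit "model"
# memorizes its training examples and returns higher confidence on
# them; the attacker exploits that confidence gap.

def train_overfit_model(training_set):
    """Return a toy model that memorizes its training data:
    confidence 0.99 on seen examples, 0.6 on unseen ones."""
    memorized = set(training_set)

    def predict_confidence(example):
        return 0.99 if example in memorized else 0.6

    return predict_confidence

def membership_inference(model, example, threshold=0.9):
    """Attacker's guess: was `example` in the training set?
    Unusually high confidence suggests the model saw it before."""
    return model(example) > threshold

# Hypothetical training records containing personal data
train = ["alice@example.com", "bob@example.com"]
model = train_overfit_model(train)

print(membership_inference(model, "alice@example.com"))  # member -> True
print(membership_inference(model, "carol@example.com"))  # non-member -> False
```

Real attacks are statistical rather than exact-match, but the principle is the same, which is why threat modeling ML models matters: a deployed model can leak facts about the individuals whose data trained it.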
Resources Mentioned:
Guest Info:
Privado.ai: Privacy assurance at the speed of product development. Get instant visibility w/ privacy code scans.
Shifting Privacy Left Media: Where privacy engineers gather, share, & learn
Buzzsprout - Launch your podcast. Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you.
Copyright © 2022 - 2024 Principled LLC. All rights reserved.