Detecting Harmful Content at Scale // Matar Haller // #246
Jul 9, 2024
Matar Haller, VP of Data & AI at ActiveFence, discusses using AI to detect harmful content online, the challenges platforms face, content moderation APIs that flag harmful material, the importance of continuous model retraining, and how to move hate speech models from notebooks to production APIs efficiently.
ActiveFence uses AI for online safety, focusing on detecting harmful content at scale.
Content moderation faces challenges from evolving harmful content types and the need for continuous monitoring.
Deep dives
Using AI to Combat Online Harm
ActiveFence uses AI to combat hate speech and other harmful content online. The company builds AI safety technology to detect and remove unwanted content, making the online environment safer. By flagging content for platforms and attaching risk scores, it supports content moderation and helps prevent harmful material from spreading.
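As context for how a platform might consume such risk scores, here is a minimal, hypothetical sketch of calling a content-moderation API. The endpoint URL, payload fields, and `risk_score` response key are illustrative assumptions, not ActiveFence's actual API.

```python
# Hypothetical content-moderation API call returning a harm risk score.
# Endpoint, payload fields, and response shape are illustrative assumptions.
import requests

MODERATION_ENDPOINT = "https://api.example.com/v1/evaluate"  # placeholder URL

def score_content(text: str, api_key: str) -> float:
    """Submit a piece of user content and return a risk score in [0, 1]."""
    response = requests.post(
        MODERATION_ENDPOINT,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"content": text, "content_type": "text"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["risk_score"]

if __name__ == "__main__":
    # A platform might route anything above a tuned threshold to human review.
    if score_content("example user post", api_key="YOUR_KEY") > 0.8:
        print("Route to moderation queue")
```

The threshold itself is a policy decision each platform tunes to its own tolerance for false positives versus missed harm.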
Importance of Trust and Safety Online
Trust and safety online is crucial because online harm can translate into offline harm. The episode highlights how online misinformation can lead to real-world consequences, such as the events of January 6th. ActiveFence emphasizes the need for effective content moderation given users' increased exposure to hate speech and harmful content.
Challenges in Content Moderation
Content moderation must handle many media types (audio, video, images, and text) across different languages and cultural contexts. ActiveFence confronts the difficulty of detecting nuanced and constantly evolving forms of harmful content. Addressing this requires subject-matter expertise, cultural awareness, and continuous model training.
Ensuring Model Accuracy and Safety
ActiveFence conducts rigorous audits, functional tests, and false-positive-rate checks to ensure its models accurately detect harmful content. The company emphasizes continuous retraining and monitoring to prevent model drift, maintain a strong precision-recall balance, and improve the efficacy of content moderation.
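To make the functional-test idea concrete, here is a hedged sketch of a release gate: curated benign and known-harmful examples are scored, and a candidate model only ships if its false positive rate and recall stay within budget. The model interface, datasets, and thresholds below are illustrative assumptions, not ActiveFence's actual test suite.

```python
# Sketch of a functional test / false-positive-rate check for a
# harmful-content classifier. Interfaces and thresholds are assumptions.
from typing import Callable, List

FPR_BUDGET = 0.02   # max tolerable false positive rate on benign content
MIN_RECALL = 0.90   # min recall on curated known-harmful examples

def release_gate(
    predict: Callable[[str], int],   # returns 1 if content is flagged
    benign: List[str],
    harmful: List[str],
) -> None:
    """Raise if the candidate model regresses on the curated test sets."""
    fpr = sum(predict(t) for t in benign) / len(benign)
    recall = sum(predict(t) for t in harmful) / len(harmful)
    assert fpr <= FPR_BUDGET, f"False positive rate too high: {fpr:.3f}"
    assert recall >= MIN_RECALL, f"Recall too low: {recall:.3f}"
```

Running a gate like this after every retrain is one way to catch drift before a degraded model reaches production.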
Matar Haller is the VP of Data & AI at ActiveFence, where her teams own the end-to-end automated detection of harmful content at scale, regardless of the abuse area or media type. The work they do here is engaging, impactful, and tough, and Matar is grateful for the people she gets to do it with.
AI For Good - Detecting Harmful Content at Scale // MLOps Podcast #246 with Matar Haller, VP of Data & AI at ActiveFence.
// Abstract
One of the biggest challenges facing online platforms today is detecting harmful content and malicious behavior. Platform abuse poses brand and legal risks, harms the user experience, and often blurs the line between online and offline harm. So how can online platforms tackle abuse in a world where bad actors continuously change their tactics and develop new ways to avoid detection?
// Bio
Matar Haller leads the Data & AI Group at ActiveFence, where her teams are responsible for the data, algorithms, and infrastructure that fuel ActiveFence’s ability to ingest, detect, and analyze harmful activity and malicious content at scale in an ever-changing, complex online landscape. Matar holds a Ph.D. in Neuroscience from the University of California at Berkeley, where she recorded and analyzed signals from electrodes surgically implanted in human brains. Matar is passionate about expanding leadership opportunities for women in STEM fields and has three children who surprise and inspire her every day.
// MLOps Jobs board
https://mlops.pallet.xyz/jobs
// MLOps Swag/Merch
https://mlops-community.myshopify.com/
// Related Links
activefence.com
https://www.youtube.com/@ActiveFence
--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Matar on LinkedIn: https://www.linkedin.com/company/11682234/admin/feed/posts/
Timestamps:
[00:00] Matar's preferred coffee
[00:13] Takeaways
[01:39] The talk that stood out
[06:15] Online hate speech challenges
[08:13] Evaluate harmful media API
[09:58] Content moderation: AI models
[11:36] Optimizing speed and accuracy
[13:36] Cultural reference AI training
[15:55] Functional Tests
[20:05] Continuous adaptation of AI
[26:43] AI detection concerns
[29:12] Fine-Tuned vs Off-the-Shelf
[32:04] Monitoring Transformer Model Hallucinations
[34:08] Auditing process ensures accuracy
[38:38] Testing strategies for ML
[40:05] Modeling hate speech deployment
[42:19] Improving production code quality
[43:52] Finding balance in Moderation
[47:23] Model's expertise: Cultural Sensitivity
[50:26] Wrap up