Detecting and Steering Model Behavior

59sec Snip

00:00

Play full episode

Summary

Transcript

Episode notes

Recent approaches are challenging the idea that we have no insight into model functioning. Through visualizations and feature manipulation, such as setting drug-related features to zero, it is possible to detect and control model behavior at runtime, making it harder to manipulate models for undesired outcomes.

Our 170th episode with a summary and discussion of last week's big AI news!

With hosts Andrey Kurenkov (https://twitter.com/andrey_kurenkov) and Jeremie Harris (https://twitter.com/jeremiecharris)

Feel free to leave us feedback here.

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at contact@lastweekin.ai and/or hello@gladstone.ai

Timestamps + Links:

Tools & Apps
Applications & Business
- (00:19:40) OpenAI is restarting its robotics research group
- (00:25:01) Saudi fund invests in China effort to create rival to OpenAI
- (00:29:34) UAE seeks ‘marriage’ with US over artificial intelligence deals
- (00:33:01) Zoox to test self-driving cars in Austin and Miami
- (00:35:49) Microsoft Lays Off 1,500 Workers, Blames "AI Wave"
- (00:38:28) Avengers, assemble—Google, Intel, Microsoft, AMD and more team up to develop an interconnect standard to rival Nvidia's NVLink
Projects & Open Source
- (00:40:39) GLM-4-9B-Chat-1M
- (00:46:37) Hugging Face and Pollen Robotics show off first project: an open source robot that does chores
- (00:49:40) Zyphra debuts Zyda, a 1.3T language modeling dataset it claims outperforms Pile, C4, arxiv
- (00:51:59) Stability AI debuts new Stable Audio Open for sound design
Research & Advancements
Policy & Safety
- (01:20:11) Former OpenAI researcher foresees AGI reality in 2027
- (01:28:03) OpenAI Insiders Warn of a ‘Reckless’ Race for Dominance
- (01:33:52) Testing and mitigating elections-related risks
- (01:36:26) Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Synthetic Media & Art
- (01:43:23) The Uncanny Rise of the World's First AI Beauty Pageant
(01:46:25) Outro + AI Song

Get the Snipd
podcast app

Unlock the knowledge in podcasts with the podcast player of the future.

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

Save any
moment

Hear something you like? Tap your headphones to save it with AI-generated key takeaways

Share
& Export

Send highlights to Twitter, WhatsApp or export them to Notion, Readwise & more

AI-powered
podcast player

Listen to all your favourite podcasts with AI-powered features

Discover
highlights

Listen to the best highlights from the podcasts you love and dive into the full episode

#170 - new Sora rival, OpenAI robotics, understanding GPT4, AGI by 2027?

Last Week in AI

Detecting and Steering Model Behavior

59sec Snip

Get the Snipdpodcast app

AI-poweredpodcast player

Discoverhighlights

Save anymoment

Share& Export

AI-poweredpodcast player

Discoverhighlights

Get the Snipd
podcast app

AI-powered
podcast player

Discover
highlights

Save any
moment

Share
& Export

AI-powered
podcast player

Discover
highlights