Evaluating AI Agents and Regulatory Landscape

This chapter introduces AgentBoard, a framework for evaluating large language model (LLM) agents through fine-grained assessment of sub-task performance. It then examines the implications of the EU AI Act and ongoing research into AI's potential to aid the creation of biological threats, highlighting the need for regulation and responsible AI use. The chapter concludes with a discussion of recent leadership changes in AI policy amid concerns about the technology's impact on society.

Episode notes

Our 154th episode with a summary and discussion of last week's big AI news!

Read our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at contact@lastweekin.ai and/or hello@gladstone.ai

Correction: Andrey mentioned "State space machines"; he meant "State space models".

Timestamps + links:

(00:00:00) Intro / Banter
Tools & Apps
- (00:02:06) Google Releases Gemini, an A.I.-Driven Chatbot and Voice Assistant
- (00:05:56) Copilot gets a big redesign and a new way to edit your AI-generated images
- (00:09:40) Arc Search's AI responses launched as an unfettered experience with no guardrails
- (00:12:40) Brilliant Labs’s Frame glasses serve as multimodal AI assistant
- (00:15:30) Stability AI launches SVD 1.1, a diffusion model for more consistent AI videos
- (00:16:18) OpenAI launches ChatGPT app for Apple Vision Pro
Applications & Business
Projects & Open Source
- (00:37:23) Allen Institute for AI launches open and transparent OLMo large language model
- (00:42:46) Meet ‘Smaug-72B’: The new king of open-source AI
- (00:47:02) Introducing Qwen1.5
- (00:50:54) Hugging Face launches open source AI assistant maker to rival OpenAI’s custom GPTs
- (00:53:20) Apple releases ‘MGIE’, a revolutionary AI model for instruction-based image editing
Research & Advancements
- (00:54:30) Learning Universal Predictors
- (01:01:00) Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
- (01:04:40) MusicRL: Aligning Music Generation to Human Preferences
- (01:05:47) FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design
- (01:09:06) AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
- (01:12:36) Specialized Language Models with Cheap Inference from Limited Domain Data
Policy & Safety
- (01:13:22) EU’s AI Act passes last big hurdle on the way to adoption
- (01:17:04) Building an early warning system for LLM-aided biological threat creation
- (01:23:44) FCC votes to ban scam robocalls that use AI-generated voices
- (01:24:33) Biden administration names a director of the new AI Safety Institute
- (01:26:23) OpenAI's GPT-4 finally meets its match: Scots Gaelic smashes safety guardrails
Synthetic Media & Art
- (01:28:04) AI poisoning tool Nightshade received 250,000 downloads in 5 days: ‘beyond anything we imagined’
- (01:30:24) Labeling AI-Generated Images on Facebook, Instagram and Threads
- (01:33:06) OpenAI is adding new watermarks to DALL-E 3
- (01:34:38) Following lawsuit, rep admits “AI” George Carlin was human-written
(01:36:20) Outro

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
