Efficiency through Integration
Running an 8-billion-parameter model costs only about 10 cents per million tokens, illustrating how affordable large language model (LLM) inference has become. Sustaining high token throughput during inference, however, remains challenging. The hardware's efficiency comes from integrating logic and memory onto a single chip, which shortens the path between compute and data and speeds up transfers during inference. This approach contrasts with traditional high-bandwidth memory (HBM) systems and contributes to significantly more cost-effective processing.
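To put the quoted rate in concrete terms, here is a minimal cost sketch. It assumes the $0.10 per million tokens figure from the text; the workload sizes used in the example are hypothetical illustrations, not figures from the source.

```python
# Minimal cost sketch. The rate ($0.10 per million tokens for an
# 8B-parameter model) comes from the text; the daily workload below
# is a hypothetical illustration.

COST_PER_MILLION_TOKENS = 0.10  # USD, as quoted for an 8B model


def inference_cost(tokens: int,
                   rate_per_million: float = COST_PER_MILLION_TOKENS) -> float:
    """Return the USD cost of processing `tokens` tokens at the given rate."""
    return tokens / 1_000_000 * rate_per_million


# Hypothetical workload: 50 million tokens per day
daily_tokens = 50_000_000
print(f"Daily cost:   ${inference_cost(daily_tokens):.2f}")       # → $5.00
print(f"Monthly cost: ${inference_cost(daily_tokens * 30):.2f}")  # → $150.00
```

At this rate, even a fairly heavy daily workload stays in single-digit dollars, which is the affordability point the paragraph is making.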