

20VC: AI Scaling Myths: More Compute is not the Answer | The Core Bottlenecks in AI Today: Data, Algorithms and Compute | The Future of Models: Open vs Closed, Small vs Large with Arvind Narayanan, Professor of Computer Science @ Princeton
Aug 28, 2024
Arvind Narayanan, a Princeton professor and co-author of "AI Snake Oil," challenges the myth that simply adding more compute equates to better AI performance. He emphasizes that data quality, not just volume, is crucial for advancements in AI. The conversation dives into the future of AI models, debating whether we'll have a few large dominant models or many specialized ones. Narayanan critiques the pitfalls of current generative AI and stresses the importance of genuine user experiences over misleading benchmark scores. His insights offer a fresh perspective on AI's evolving landscape.
AI Snips
Compute Limits
- Arvind Narayanan disagrees with the notion that more compute automatically leads to better AI models.
- Data limitations and diminishing returns on compute are becoming significant bottlenecks, as the scaling-law sketch below illustrates.
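As a rough illustration of the "diminishing returns" point, the sketch below plugs increasing compute budgets into a Chinchilla-style scaling law, L(N, D) = E + A·N^(-α) + B·D^(-β) with compute ≈ 6·N·D (Hoffmann et al., 2022). The constants are the published fits and are used here only for illustration; they are not from the episode, and the compute-optimal split is a simplifying assumption.

```python
# Minimal sketch, assuming a Chinchilla-style scaling law
# L(N, D) = E + A*N**-alpha + B*D**-beta (Hoffmann et al., 2022).
# Constants are the published fits, used purely for illustration.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def optimal_loss(compute_flops):
    """Loss at the compute-optimal split of parameters N and tokens D,
    using the common approximation compute ~= 6 * N * D."""
    budget = compute_flops / 6.0  # budget = N * D
    # Closed-form minimizer of A*N**-alpha + B*D**-beta subject to N*D = budget.
    g = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    n_opt = g * budget ** (beta / (alpha + beta))
    d_opt = budget / n_opt
    return E + A * n_opt ** -alpha + B * d_opt ** -beta

prev = None
for c in [1e21, 1e22, 1e23, 1e24, 1e25]:
    loss = optimal_loss(c)
    gain = "" if prev is None else f"  (gain from 10x more compute: {prev - loss:.3f})"
    print(f"compute {c:.0e} FLOPs -> loss {loss:.3f}{gain}")
    prev = loss
```

Because both terms are power laws, each extra 10x of compute buys a smaller absolute drop in loss than the previous 10x did, which is the shape of the diminishing-returns argument.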
Data Bottleneck
- YouTube's video data, while vast, is not significantly larger than current text data used to train large models.
- This limits the potential for new emergent capabilities in text-based models.
Synthetic Data Limits
- Synthetic data helps improve existing data quality, not quantity.
- Generating massive synthetic datasets is like a "snake eating its own tail," not adding new knowledge.