Jitendra Malik, a distinguished professor at UC Berkeley and a pioneer in computer vision, shares his insights on the complexities of replicating human visual perception. He discusses the challenges of Tesla's Autopilot, emphasizing the gap between human and computer processing. Malik explores how integrated approaches and knowledge schemas can enhance action recognition. He critiques current evaluation methods, advocating for measures that reflect true understanding. Additionally, he highlights the importance of interdisciplinary research and of drawing on how children learn when developing AI.
01:42:04
INSIGHT
Vision's Deceptive Simplicity
Computer vision is difficult because we do it subconsciously, making it seem simpler than it is.
Early AI researchers underestimated its complexity due to this effortless nature of human vision.
INSIGHT
Fallacy of the First Step
The "fallacy of the successful first step" describes how initial progress in vision can be misleadingly easy.
Achieving high accuracy becomes exponentially harder, requiring significantly more effort over time.
ANECDOTE
Tesla's Autopilot
Tesla's Autopilot, using a vision-based system with eight cameras and a neural network, tackles autonomous driving.
Jitendra Malik believes freeway driving is solvable, but full autonomy requires more than just vision.
Written by Fyodor Dostoyevsky between 1867 and 1869, 'The Idiot' follows the story of Prince Lev Nikolayevich Myshkin, a young man with a pure and innocent heart, often mistaken for an 'idiot' due to his simplicity and goodness. The novel examines how this 'positively beautiful man' navigates a world filled with corruption, moral decay, and complex human relationships. Myshkin's interactions with characters like Nastasya Filippovna and Aglaia Epanchina highlight themes of love, suffering, sacrifice, and the clash between idealistic values and the harsh realities of society. The novel ultimately leads to Myshkin's mental breakdown and his inability to cope with the world around him[2][3][5].
Jitendra Malik is a professor at Berkeley and one of the seminal figures in the field of computer vision, both before the deep learning revolution and after it. He has been cited over 180,000 times and has mentored many world-class researchers in computer science.
Here’s the outline of the episode. On some podcast players you should be able to click the timestamp to jump to that time.
OUTLINE:
00:00 – Introduction
03:17 – Computer vision is hard
10:05 – Tesla Autopilot
21:20 – Human brain vs computers
23:14 – The general problem of computer vision
29:09 – Images vs video in computer vision
37:47 – Benchmarks in computer vision
40:06 – Active learning
45:34 – From pixels to semantics
52:47 – Semantic segmentation
57:05 – The three R’s of computer vision
1:02:52 – End-to-end learning in computer vision
1:04:24 – 6 lessons we can learn from children
1:08:36 – Vision and language
1:12:30 – Turing test
1:16:17 – Open problems in computer vision
1:24:49 – AGI
1:35:47 – Pick the right problem