
Data Science Decoded: Data Science #33 - The Backpropagation Method, Paul Werbos (1980)
Nov 3, 2025
Delve into the groundbreaking work of Paul Werbos and the efficient derivative computations that made large-scale modeling practical. Discover the impact of backpropagation on neural networks and its relevance to modern data science. The hosts unpack sensitivity analysis, comparing forward and backward differentiation and explaining why the backward method wins when a model has many inputs but few outputs. They also introduce Generalized Dynamic Heuristic Programming, linking it to reinforcement learning and adaptive AI. With insights on second-order derivatives and future implications, it's a thought-provoking discussion on the evolution of data-driven modeling.
AI Snips
Backward Differentiation Beats Perturbation
- Paul Werbos showed that backward differentiation (backpropagation) computes the needed derivatives far more efficiently than naive forward perturbation for many-input problems.
- This efficiency makes sensitivity analysis and large-scale optimization feasible for models with many inputs or parameters but few outputs, such as a neural network trained against a scalar loss (see the sketch after this list).
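A minimal sketch of the cost difference, assuming a toy scalar-output model (the function, dimensions, and names below are illustrative, not from the episode or Werbos's paper): forward perturbation needs roughly one extra function evaluation per input, while a single backward sweep of the chain rule recovers every partial derivative at once.

```python
# Illustrative sketch only: toy model and sizes are assumed, not from the source.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 1000, 50          # many inputs, one scalar output
W = rng.standard_normal((n_hidden, n_in))

def f(x):
    """Scalar-valued toy model: sum of tanh of a linear layer."""
    return np.sum(np.tanh(W @ x))

def grad_forward_perturbation(x, eps=1e-6):
    """Forward perturbation: one extra evaluation of f per input,
    so about n_in function evaluations in total."""
    base = f(x)
    g = np.empty_like(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        g[i] = (f(xp) - base) / eps
    return g

def grad_backward(x):
    """Reverse mode: one forward pass plus one backward sweep of the
    chain rule yields all n_in partial derivatives at once."""
    h = W @ x                       # forward pass
    dh = 1.0 - np.tanh(h) ** 2      # d f / d h, since f = sum(tanh(h))
    return W.T @ dh                 # chain rule back to the inputs

x = rng.standard_normal(n_in)
g_fd = grad_forward_perturbation(x)
g_bp = grad_backward(x)
print(np.max(np.abs(g_fd - g_bp)))  # agree up to finite-difference error
```

For 1,000 inputs the perturbation loop runs the model about 1,000 times, while the backward pass runs it once; that asymmetry is what makes gradient-based training of large networks tractable.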
Chain Rule For Layers And Time
- Werbos generalized the chain rule to multi-layer, time-dependent systems, laying the groundwork for backpropagation through layers or through time steps (see the sketch after this list).
- The method remains efficient even when implemented on parallel processors, foreshadowing modern GPU and parallel training.
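As an illustration of that layered chain rule, here is a hypothetical sketch (layer widths, weights, and names are assumed for the example, not taken from the episode): the same backward recursion applies whether the stages are stacked layers or unrolled time steps with shared weights.

```python
# Illustrative sketch only: dimensions and notation are assumed.
import numpy as np

rng = np.random.default_rng(1)
dims = [8, 6, 4, 1]                          # made-up stage widths
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]

def forward(x, Ws):
    """Forward pass h_l = tanh(W_l h_{l-1}); keep activations for the backward sweep."""
    hs = [x]
    for W in Ws:
        hs.append(np.tanh(W @ hs[-1]))
    return hs

def backward(hs, Ws):
    """Apply the chain rule stage by stage from the output back to the input.
    The loss is taken as the sum of the final activations."""
    delta = np.ones_like(hs[-1])             # dL/dh_L
    grads = []
    for W, h_prev, h in zip(reversed(Ws), reversed(hs[:-1]), reversed(hs[1:])):
        da = delta * (1.0 - h ** 2)          # back through tanh
        grads.append(np.outer(da, h_prev))   # dL/dW for this stage
        delta = W.T @ da                     # propagate to the earlier stage
    return list(reversed(grads))

x = rng.standard_normal(dims[0])
hs = forward(x, Ws)
dWs = backward(hs, Ws)
print([g.shape for g in dWs])                # one gradient per layer (or time step)
```

The backward loop visits each stage exactly once, so the cost stays proportional to a single forward pass regardless of depth, and the stages can just as well be unrolled time steps sharing one weight matrix.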
Dense Nine-Page PhD Review
- Mike notes that Werbos's PhD review is concise yet dense: nine pages summarizing a major thesis contribution.
- He contrasts that with today's much longer reviews and jokes that LLMs could shorten them to a single sentence.

