
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Explaining a Math Magic Trick, published by Robert AIZI on May 5, 2024 on LessWrong.
Introduction
A recent popular tweet did a "math magic trick", and I want to explain why it works and use that as an excuse to talk about cool math (functional analysis). The tweet in question:
This is a cute magic trick, and like any good trick they nonchalantly gloss over the most important step. Did you spot it? Did you notice your confusion?
Here's the key question: Why did they switch from a differential equation to an integral equation? If you can use $(1-x)^{-1}=1+x+x^2+\dots$ when $x=\int$, why not use it when $x=d/dx$?
Well, let's try it, writing $D$ for the derivative:
$f'=f \implies (1-D)f=0 \implies f=(1+D+D^2+\dots)0 \implies f=0+0+0+\dots \implies f=0$
So now you may be disappointed, but relieved: yes, this version fails, but at least it fails safe, giving you the trivial solution, right?
But no, actually $(1-D)^{-1}=1+D+D^2+\dots$ can fail catastrophically, which we can see if we try a nonhomogeneous equation like $f'=f+e^x$ (which you may recall has solution $xe^x$):
$f'=f+e^x \implies (1-D)f=-e^x \implies f=(1+D+D^2+\dots)(-e^x) \implies f=-(e^x+e^x+e^x+\dots) \implies f=?$
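A quick numerical sanity check of both claims, as a minimal sketch. Rearranging $f'=f+e^x$ gives $(1-D)f=-e^x$, so every term of the formal series is $-e^x$. The central finite difference and the helper names below are illustrative choices, not part of the original argument.

```python
import math

def candidate(x):
    """The claimed solution f(x) = x * e^x."""
    return x * math.exp(x)

def derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def residual(x):
    """f'(x) - (f(x) + e^x); close to 0 if candidate solves the equation."""
    return derivative(candidate, x) - (candidate(x) + math.exp(x))

def d_series_partial_sum(x, n_terms):
    """Partial sum of (1 + D + D^2 + ...) applied to -e^x.

    Every derivative of e^x is e^x, so each of the n_terms terms is -e^x.
    """
    return sum(-math.exp(x) for _ in range(n_terms))

if __name__ == "__main__":
    for x in (-1.0, 0.0, 2.0):
        print(f"x={x}: residual={residual(x):.2e}")
    for n in (1, 10, 100):
        print(f"{n} terms at x=0: {d_series_partial_sum(0.0, n)}")
```

The residual stays near zero, while the partial sums grow without bound in magnitude as more terms are added, so the series assigns no value to $f$.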
However, the integral version still works. To formalize the original approach: we define the function $I$ (for integral) to take in a function $f(x)$ and produce the function $If$ defined by $If(x)=\int_0^x f(t)\,dt$. This rigorizes the original trick, elegantly incorporates the initial conditions of the differential equation, and fully generalizes to solving nonhomogeneous versions like $f'=f+e^x$ (left as an exercise to the reader, of course).
So why does $(1-D)^{-1}=1+D+D^2+\dots$ fail, while $(1-I)^{-1}=1+I+I^2+\dots$ works robustly? The answer is functional analysis!
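To see the integral series converge in the homogeneous case, here is a minimal sketch. Integrating $f'=f$ from $0$ to $x$ gives $f-f(0)=If$, that is, $(1-I)f=f(0)$. Assuming the initial condition $f(0)=1$ (an assumption made for illustration), the series $(1+I+I^2+\dots)1$ should reproduce the Taylor series of $e^x$. Polynomials are stored as coefficient lists, and all function names are hypothetical.

```python
from fractions import Fraction
import math

def integrate(coeffs):
    """Apply I to a polynomial: (I p)(x) = integral of p(t) dt from 0 to x.

    coeffs[k] is the coefficient of x**k; the result has zero constant term.
    """
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]

def add(p, q):
    """Add two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def neumann_partial_sum(g, n_terms):
    """Coefficients of (1 + I + I^2 + ... + I^(n_terms - 1)) g."""
    total = []
    term = [Fraction(c) for c in g]
    for _ in range(n_terms):
        total = add(total, term)
        term = integrate(term)
    return total

def evaluate(coeffs, x):
    """Evaluate the polynomial at x in floating point."""
    return sum(float(c) * x**k for k, c in enumerate(coeffs))

if __name__ == "__main__":
    approx = neumann_partial_sum([1], 20)
    print(approx[:4])  # leading coefficients 1, 1, 1/2, 1/6
    print(evaluate(approx, 1.0), math.exp(1.0))
```

Each application of $I$ turns $x^n/n!$ into $x^{n+1}/(n+1)!$, so the terms shrink factorially instead of staying the same size as they did with $D$.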
Functional Analysis
Savvy readers may already be screaming that the trick $(1-x)^{-1}=1+x+x^2+\dots$ for numbers only holds true for $|x|<1$, and this is indeed the key to explaining what happens with $D$ and $I$! But how can we define the "absolute value" of "the derivative function" or "the integral function"?
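For plain numbers, the $|x|<1$ restriction is easy to watch in action. The following is a small sketch (function names are hypothetical) comparing partial sums of $1+x+x^2+\dots$ with $1/(1-x)$ inside and outside the unit interval.

```python
def geometric_partial_sum(x, n_terms):
    """Sum of x**k for k = 0, ..., n_terms - 1."""
    return sum(x**k for k in range(n_terms))

def closed_form(x):
    """The value (1 - x)^(-1) that the series is supposed to equal."""
    return 1 / (1 - x)

if __name__ == "__main__":
    for x in (0.5, -0.9, 2):
        sums = [geometric_partial_sum(x, n) for n in (5, 20, 80)]
        print(f"x={x}: partial sums {sums}, closed form {closed_form(x)}")
```

For $x=0.5$ the partial sums approach $2$; for $x=2$ they keep growing while the closed form gives $-1$, so the identity fails there.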
What we're looking for is a norm, a function that generalizes absolute values. A norm is a function $x\mapsto\|x\|$ satisfying these properties:
1. $\|x\|\ge 0$ for all $x$ (positivity), and $\|x\|=0$ if and only if $x=0$ (positive-definite)
2. $\|x+y\|\le\|x\|+\|y\|$ for all $x$ and $y$ (triangle inequality)
3. $\|cx\|=|c|\,\|x\|$ for all $x$ and real numbers $c$, where $|c|$ denotes the usual absolute value (absolute homogeneity)
Here's an important example of a norm: fix some compact subset of $\mathbb{R}$, say $X=[-10,10]$, and for a continuous function $f:X\to\mathbb{R}$ define $\|f\|_\infty=\max_{x\in X}|f(x)|$, which would commonly be called the $L^\infty$-norm of $f$. (We may use a maximum here due to the Extreme Value Theorem. In general you would use a supremum instead.) Again I shall leave it to the reader to check that this is a norm.
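As a rough illustration of this max norm, the sketch below approximates $\max_{x\in X}|f(x)|$ on $X=[-10,10]$ by sampling an evenly spaced grid. The grid only approximates the true maximum, and the function name and grid size are assumptions made for the example.

```python
import math

def sup_norm(f, a=-10.0, b=10.0, n_points=20001):
    """Approximate max |f(x)| on [a, b] by sampling an evenly spaced grid."""
    step = (b - a) / (n_points - 1)
    return max(abs(f(a + i * step)) for i in range(n_points))

if __name__ == "__main__":
    f, g = math.sin, math.cos
    print("||sin||     ~", sup_norm(f))
    print("||x||       ~", sup_norm(lambda x: x))
    # Spot-check the triangle inequality on one pair of functions.
    lhs = sup_norm(lambda x: f(x) + g(x))
    print("||sin+cos|| ~", lhs, "<=", sup_norm(f) + sup_norm(g))
```

A spot check like this is not a proof of the norm properties, but it gives a feel for what the norm measures: the largest absolute value the function takes on $X$.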
This example takes us halfway to our goal: we can now talk about the "absolute value" of a continuous function that takes in a real number and spits out a real number, but $D$ and $I$ take in functions and spit out functions (what we usually call an operator), so what we need is an operator norm.
Put another way, the $L^\infty$-norm is "the largest output of the function", and this will serve as the inspiration for our operator norm. Making the minimal changes possible, we might try to define $\|I\|=\max_{f\text{ continuous}}\|If\|$. There are two problems with this:
1. First, since $I$ is linear, you can make $\|If\|$ arbitrarily large by scaling $f$ by 10x, or 100x, etc. We can fix this by restricting the set of valid $f$ for these purposes, just as in the $L^\infty$ example we restricted the inputs of $f$ to the compact set $X=[-10,10]$. Unsurprisingly, a nice choice of set to restrict to is the "unit ball" of functions, the set of functions with $\|f\|\le 1$.
2. Second, we must bid tearful farewell to the innocent childhood of maxima, and enter the liberating adulthood of suprema. This is necessary since f ranges over the infinite-dimensional vector space of continuous functions, so the Heine-Borel theorem no longer guarant...
