Modes of Convergence

Almost sure, in probability, in distribution, in $L^p$ — the same sample paths, four different lenses.

A sequence of random variables $X_1, X_2, \ldots$ can approach a limit $X$ in several non-equivalent senses. Textbooks usually present the lattice of implications and a few hand-picked counterexamples in passing. Here the counterexamples are the centerpiece: each one is a recipe for sample paths $X_n(\omega)$, and the same paths are read four different ways.

Start by dragging the step $n$ — directly on the canvas, or with the slider — and watch the readouts. Then switch presets — each one is the canonical example of one mode of convergence failing while another succeeds.

1. Convergence of random variables

Each colored trace is one sample path $X_n(\omega_k)$ — same $\omega_k$ across all $n$, different $\omega_k$ per path. The horizontal axis is $n$; the shaded band is the $\varepsilon$-tube around the limit $X$. Four readouts measure the four modes:

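Almost sure ($X_n \to^{\text{a.s.}} X$). Fraction of paths that leave the tube at some step $m \geq n$ visible on the canvas, a finite-horizon proxy for $\mathbb{P}(\sup_{m \geq n} |X_m - X| > \varepsilon)$. It should shrink to $0$.
In probability ($X_n \to^p X$). Fraction of paths currently outside the tube, an estimate of $\mathbb{P}(|X_n - X| > \varepsilon)$. It should shrink to $0$.
In $L^p$ ($X_n \to^{L^p} X$). Sample average of $|X_n - X|^p$ for $p \in \{1, 2\}$, an estimate of $\mathbb{E}|X_n - X|^p$. It should shrink to $0$.
In distribution ($X_n \to^d X$). Lévy distance between the law of $X_n$ and the law of $X$. Unlike the Kolmogorov distance, it tends to $0$ under convergence in distribution even when the limit law has atoms, for example when $X$ is degenerate (constant).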
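
The first three readouts can be mimicked offline. The sketch below is an illustration, not the page's implementation: it assumes a stand-in sequence (the running mean of fair $\pm 1$ flips, with limit $X = 0$), and the helper names `running_means` and `readouts` are hypothetical. The Lévy distance is left out.

```python
import random

def running_means(n_steps, rng):
    """One sample path X_1..X_N: running mean of fair +/-1 flips (limit X = 0)."""
    total, path = 0, []
    for n in range(1, n_steps + 1):
        total += rng.choice((-1, 1))
        path.append(total / n)
    return path

def readouts(paths, n, eps):
    """Empirical readouts at step n (1-indexed), measured against the limit X = 0."""
    k = len(paths)
    # In probability: fraction of paths outside the tube right now.
    outside_now = sum(abs(p[n - 1]) > eps for p in paths) / k
    # Almost sure (finite-horizon proxy): fraction outside at some step m >= n.
    escapes_later = sum(any(abs(x) > eps for x in p[n - 1:]) for p in paths) / k
    # L^p: sample averages of |X_n - X|^p for p = 1, 2.
    l1 = sum(abs(p[n - 1]) for p in paths) / k
    l2 = sum(p[n - 1] ** 2 for p in paths) / k
    return {"in_prob": outside_now, "almost_sure": escapes_later, "L1": l1, "L2": l2}

rng = random.Random(0)  # fixed seed so the run is repeatable
paths = [running_means(400, rng) for _ in range(200)]
for n in (1, 32, 400):
    print(n, readouts(paths, n, eps=0.2))
```

By construction the almost-sure readout is never smaller than the in-probability readout at the same $n$, since a path that is outside now counts as a future escape with $m = n$.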

Figure 1 · Sample paths, four lenses

Legend: sample paths $X_n(\omega_k)$; $\varepsilon$-tube around $X$; paths currently outside the tube; future escapes ($m \geq n$); largest deviation ($L^p$ driver).

Controls: step $n$ (shown at 32), tube radius $\varepsilon$ (0.2), visible paths (12).

2. The implication lattice

Five strict implications hold; nothing else does. Click any node to load the preset where that mode is the headline example; click any dashed (non-)arrow to load the canonical counterexample that breaks it. Each node is tinted by the loaded preset — green where that mode converges, red where it fails.

Figure 2 · Modes of convergence — what implies what

Hover or click an edge to see what it claims (or what counterexample disproves the converse). Node tint shows which modes hold for the loaded preset.

Reading the diagram. A solid arrow $A \Rightarrow B$ means every sequence that converges in mode $A$ also converges in mode $B$. A dashed arrow $A \not\Rightarrow B$ means a counterexample exists — clicking it loads that counterexample into Figure 1. The lattice has a sharp asymmetry: almost-sure and $L^p$ are both stronger than convergence in probability, but neither implies the other (the typewriter sits in one gap, the spike sits in the other).

3. Why the modes shrink: Markov, Chebyshev, Chernoff

The escape rate $\mathbb{P}(|X_n - X| \geq \varepsilon)$ is the central object behind convergence in probability. Three inequalities bound it from above, each using progressively more information:

Markov: $\mathbb{P}(|X_n - X| \geq \varepsilon) \leq \mathbb{E}|X_n - X| / \varepsilon$. Uses the first moment.
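Chebyshev: $\mathbb{P}(|X_n - X| \geq \varepsilon) \leq \mathbb{E}|X_n - X|^2/\varepsilon^2$, which equals $\mathrm{Var}(X_n - X)/\varepsilon^2$ when $\mathbb{E}(X_n - X) = 0$. Uses the second moment.
Chernoff: $\mathbb{P}(S_n - n\mu \geq n t) \leq e^{-n \cdot I(t)}$ for a sum $S_n$ of $n$ iid terms with mean $\mu$ and $t > 0$, where $I$ is the rate function (the Legendre transform of the log-MGF of a centered summand). Uses the whole MGF.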
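
For a concrete feel of the three bounds, here is a small sketch for the mean of $n$ Bernoulli($p$) draws, where the exact tail is a binomial sum. The function names are illustrative. The Chernoff exponent uses the Bernoulli rate function $I(t) = \mathrm{KL}(p + t \,\|\, p)$, a standard closed form, and the sketch assumes $0 < t < 1 - p$.

```python
import math

def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def exact_upper_tail(n, p, t):
    """P(S_n / n - p >= t) for S_n ~ Binomial(n, p)."""
    # Small guard so that n * (p + t) landing just above an integer is not rounded up.
    k_min = math.ceil(n * (p + t) - 1e-12)
    return sum(binom_pmf(n, k, p) for k in range(k_min, n + 1))

def bounds(n, p, t):
    """Markov, Chebyshev and Chernoff upper bounds on the tail above."""
    # Markov on |S_n/n - p|, with the first absolute moment computed exactly.
    mean_abs_dev = sum(binom_pmf(n, k, p) * abs(k / n - p) for k in range(n + 1))
    markov = min(1.0, mean_abs_dev / t)
    # Chebyshev: Var(S_n / n) = p(1 - p) / n.
    chebyshev = min(1.0, p * (1 - p) / (n * t * t))
    # Chernoff: exp(-n * KL(p + t || p)).
    a = p + t
    rate = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))
    chernoff = math.exp(-n * rate)
    return markov, chebyshev, chernoff

for n in (10, 50, 200):
    print(n, exact_upper_tail(n, 0.5, 0.15), bounds(n, 0.5, 0.15))
```

With $p = 0.5$, $t = 0.15$ and $n = 200$, the Chebyshev bound is about $0.056$ while the Chernoff bound is roughly $10^{-4}$, which is the gap a log-scale plot makes visible.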

Figure 3 overlays the Markov and Chebyshev bounds on the empirical escape rate from Figure 1; they should sit above the dots, sometimes tightly, sometimes wastefully. The Cauchy preset is the dramatic failure case: the first moment does not exist and the second moment is infinite, so both bounds are vacuous, and the LLN itself fails.
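
The Cauchy failure is easy to reproduce. The sketch below is an assumption-laden stand-in for the preset, not its actual code: it draws standard Cauchy variables by the inverse-CDF formula $\tan(\pi(U - 1/2))$ and compares the escape rate of the running mean with a standard Gaussian control. The helper names are hypothetical.

```python
import math
import random

def cauchy(rng):
    """Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def gaussian(rng):
    """Standard normal draw, used as a well-behaved control."""
    return rng.gauss(0.0, 1.0)

def escape_rate(draw, n, eps, n_paths, rng):
    """Fraction of paths whose mean of n draws has absolute value >= eps."""
    outside = 0
    for _ in range(n_paths):
        mean = sum(draw(rng) for _ in range(n)) / n
        outside += abs(mean) >= eps
    return outside / n_paths

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (1, 20, 200):
    print(n,
          escape_rate(cauchy, n, 0.2, 500, rng),
          escape_rate(gaussian, n, 0.2, 500, rng))
```

The Gaussian column should fall toward $0$ as $n$ grows, while the Cauchy column should hover near $\mathbb{P}(|C| \geq 0.2) = 1 - \tfrac{2}{\pi}\arctan(0.2) \approx 0.87$ for every $n$, because the mean of $n$ iid standard Cauchy variables is again standard Cauchy.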

Figure 3 · Empirical escape rate vs. Markov & Chebyshev bounds

Legend: empirical $\mathbb{P}(|X_n - X|\geq\varepsilon)$; Markov bound; Chebyshev bound.

Controls: $\varepsilon$ (shown at 0.2), paths (600).

Markov bounds are usually loose because they only see the first moment. Chebyshev tightens them by squaring, but the next level, Chernoff, buys an exponential drop by exponentiating before taking expectation. Figure 4 shows the tail $\mathbb{P}(\bar S_n - \mu \geq t)$ of the sample mean $\bar S_n = S_n/n$ on a log scale, with all three bounds and the empirical rate. The Chernoff curve falls away as a straight line in $n$; Chebyshev sags slowly like $1/n$.

Figure 4 · Concentration of $\bar S_n$ on a log scale

Legend: empirical tail; Markov; Chebyshev; Chernoff.

Controls: distribution (shown at Bernoulli(p)), parameter (0.5), deviation $t$ (0.15), max $n$ (200), paths (2000).

A subtler use of Chernoff: pair the bound with the Borel–Cantelli lemma ($\sum_n \mathbb{P}(A_n) < \infty \Rightarrow \mathbb{P}(A_n \text{ i.o.}) = 0$). Exponential decay of $\mathbb{P}(|\bar S_n - \mu| \geq \varepsilon)$ is more than summable, so the strong law follows. Chebyshev's $1/n$ rate is not summable, which is why proving the strong LLN from Chebyshev alone requires the extra subsequence trick.
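
Written out, the summability contrast looks like this. The display is a sketch under two assumptions not spelled out above: a two-sided Chernoff bound $\mathbb{P}(|\bar S_n - \mu| \geq \varepsilon) \leq 2e^{-n I_*}$ with some rate $I_* > 0$, and a finite variance $\sigma^2$ per summand for the Chebyshev bound $\sigma^2/(n\varepsilon^2)$.

$$
\sum_{n \geq 1} 2e^{-n I_*} = \frac{2e^{-I_*}}{1 - e^{-I_*}} < \infty,
\qquad
\sum_{n \geq 1} \frac{\sigma^2}{n\varepsilon^2} = \infty,
\qquad
\sum_{k \geq 1} \frac{\sigma^2}{k^2\varepsilon^2} = \frac{\pi^2\sigma^2}{6\varepsilon^2} < \infty.
$$

The third sum is the subsequence trick: along $n = k^2$ the Chebyshev bounds are summable, so Borel–Cantelli gives $\bar S_{k^2} \to \mu$ almost surely, and a separate argument then controls the steps between consecutive squares.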

4. When can you swap limit and expectation?

Almost-sure convergence does not imply $L^1$ convergence — the spike preset shows $X_n \to 0$ a.s. while $\mathbb{E} X_n = 1$ for every $n$. Three theorems give sufficient conditions for $\mathbb{E} X_n \to \mathbb{E} X$:

Monotone convergence (MCT). If $0 \leq X_n \uparrow X$, then $\mathbb{E} X_n \uparrow \mathbb{E} X$. No domination needed.
Dominated convergence (DCT). If $X_n \to X$ a.s. and $|X_n| \leq g$ for some integrable $g$, then $\mathbb{E} X_n \to \mathbb{E} X$.
Fatou. For $X_n \geq 0$: $\mathbb{E}[\liminf X_n] \leq \liminf \mathbb{E} X_n$. The inequality can be strict: the spike preset has $\liminf \mathbb{E} X_n = 1$ versus $\mathbb{E}[\liminf X_n] = 0$.

Figure 5 lets you propose a dominating function $g(\omega)$ for the spike. The canvas plots all the $X_n(\omega) = n\cdot\mathbf{1}_{[0,1/n]}(\omega)$ on the same axes; you adjust $g$ as a power-law envelope $g(\omega) = c \cdot \omega^{-\alpha}$ and the readout reports $\int_0^1 g$. The point is to see why no integrable $g$ dominates: $g$ must satisfy $g(\omega) \geq n$ on $[0, 1/n]$ for every $n$, which forces $g(\omega) \geq \lfloor 1/\omega \rfloor > 1/\omega - 1$ on $(0, 1]$, and $\int_0^1 (1/\omega - 1) \,d\omega = \infty$.
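
The trade-off for the power-law family can be checked directly. This is a minimal sketch with hypothetical helper names; it relies on the fact that for $\alpha \geq 0$ the envelope is non-increasing, so $g \geq n$ on $(0, 1/n]$ exactly when it holds at the right endpoint $\omega = 1/n$.

```python
def dominates(c, alpha, n):
    """True if g(w) = c * w**(-alpha) is >= n on (0, 1/n], assuming alpha >= 0.

    g is non-increasing, so the binding point is w = 1/n, where g = c * n**alpha.
    """
    return c * n**alpha >= n

def envelope_integral(c, alpha):
    """Integral of g over (0, 1]: c / (1 - alpha) if alpha < 1, otherwise infinite."""
    return c / (1 - alpha) if alpha < 1 else float("inf")

def first_failure(c, alpha, n_max=10_000):
    """Smallest n <= n_max whose spike pokes through the envelope, or None."""
    for n in range(1, n_max + 1):
        if not dominates(c, alpha, n):
            return n
    return None

# Integrable envelope (alpha < 1): finite area, but some spike escapes.
print(envelope_integral(2, 0.6), first_failure(2, 0.6))
# Dominating envelope (alpha = 1, c = 1): every spike is covered, but the area is infinite.
print(envelope_integral(1, 1.0), first_failure(1, 1.0))
```

With $\alpha = 0.6$ and $c = 2$ the envelope has area $5$ and covers the spikes only up to $n = 5$; the spike at $n = 6$ is the first to escape. Raising $c$ only postpones the failure, since $c\,n^{\alpha} \geq n$ breaks once $n > c^{1/(1-\alpha)}$.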

Figure 5 · Can you dominate the spike?

Legend: $X_n(\omega) = n\cdot\mathbf{1}_{[0,1/n]}$; candidate envelope $g(\omega) = c\,\omega^{-\alpha}$.

Controls: exponent $\alpha$ (shown at 0.6), scale $c$ (2), show up to $n =$ (12).

The MCT story is gentler. Figure 6 picks a nonnegative integrable $f$ on $[0,1]$ and lets $X_n = f \cdot \mathbf{1}_{[0, 1 - 1/n]}$, a sequence that fills in toward $f$ from the left. Because $0 \leq X_n \uparrow f$ at every $\omega < 1$, hence almost surely, MCT gives $\mathbb{E} X_n \uparrow \mathbb{E} f$ with no other hypothesis. Flip the monotonicity off and the guarantee disappears.
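
A numerical version of the fill-up, as a rough sketch: the source only names the preset "hump", so $f(\omega) = \sin(\pi\omega)$ is an assumed stand-in, and `expect_truncated` is a hypothetical helper that estimates $\mathbb{E} X_n$ by the midpoint rule with $\omega$ uniform on $[0,1]$.

```python
import math

def f(w):
    """Assumed 'hump': a nonnegative integrable function on [0, 1]."""
    return math.sin(math.pi * w)

def expect_truncated(n, grid=20_000):
    """Midpoint-rule estimate of E[X_n] for X_n = f * 1_[0, 1 - 1/n]."""
    cutoff = 1 - 1 / n
    h = 1 / grid
    # Sum f over the midpoints that fall inside the window [0, cutoff].
    total = sum(f((i + 0.5) * h) for i in range(grid) if (i + 0.5) * h <= cutoff)
    return total * h

steps = (1, 2, 4, 8, 16, 64)
values = [expect_truncated(n) for n in steps]
for n, v in zip(steps, values):
    print(n, round(v, 5))
print("E f =", 2 / math.pi)
```

For this $f$ the exact values are $\mathbb{E} X_n = (1 + \cos(\pi/n))/\pi$, which climb from $0$ at $n = 1$ toward $\mathbb{E} f = 2/\pi$, as MCT promises.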

Figure 6 · MCT: monotone fill-up

Legend: limit $f(\omega)$; current $X_n$.

Controls: step $n$ (shown at 6), $f$ (hump), toggle "enforce monotone fill (else: random window)".

5. A guided tour through the lattice

The four counterexamples occupy specific positions in the lattice. Working them out by hand once is rewarding; clicking through them quickly is the next best thing.

Spike: a.s. and in probability, but not in $L^1$. The sequence $X_n(\omega) = n\cdot\mathbf{1}_{[0,1/n]}(\omega)$ tends to $0$ for every $\omega > 0$, yet $\mathbb{E} X_n = 1$ for all $n$. (Breaks: a.s. $\Rightarrow L^1$, in prob $\Rightarrow L^1$.)
Typewriter: in probability and in $L^p$, but not almost surely. With $k = \lfloor \log_2 n \rfloor$, the sliding indicators $X_n = \mathbf{1}_{[(n-2^k)/2^k, (n-2^k+1)/2^k]}$ have $\mathbb{E}|X_n|^p = 2^{-k} \to 0$, yet for each $\omega \in [0,1]$ infinitely many of them light up. (Breaks: in prob $\Rightarrow$ a.s., $L^p \Rightarrow$ a.s.)
Sign flip: in distribution, but not in probability. With $X$ a fair $\pm 1$ and $X_n = (-1)^n X$, every $X_n$ has the same distribution as $X$, so $X_n \overset{d}{\to} X$, but $|X_n - X| = 2$ for every odd $n$, so $\mathbb{P}(|X_n - X| \geq \varepsilon)$ alternates between $0$ and $1$ for any $0 < \varepsilon \leq 2$. (Breaks: in dist $\Rightarrow$ in prob.)
Cauchy mean: fails the LLN entirely. The running mean of iid Cauchy is itself Cauchy with the same scale, so it never contracts. Neither Markov nor Chebyshev can help: the mean does not exist and the second moment is infinite.
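
The typewriter is the least intuitive of the four, so here is a minimal sketch of it. The function names are illustrative; the indexing follows the formula in the list, with $k = \lfloor \log_2 n \rfloor$ computed from the bit length of $n$.

```python
def typewriter(n, w):
    """X_n(w): indicator of [(n - 2^k) / 2^k, (n - 2^k + 1) / 2^k], k = floor(log2 n)."""
    k = n.bit_length() - 1      # floor(log2 n) for n >= 1
    j = n - 2**k                # position within the k-th sweep, 0 <= j < 2^k
    return 1 if j / 2**k <= w <= (j + 1) / 2**k else 0

def interval_length(n):
    """P(X_n = 1) for w ~ Uniform[0, 1]; also E|X_n|^p for every p > 0."""
    return 2.0 ** -(n.bit_length() - 1)

w = 0.3
hits = [n for n in range(1, 64) if typewriter(n, w)]
print(hits)                     # one hit per sweep k = 0..5
print([interval_length(n) for n in hits])
```

For $\omega = 0.3$ the path lights up at $n = 1, 2, 5, 10, 20, 41$, once in every sweep, so $X_n(\omega)$ returns to $1$ infinitely often and cannot converge to $0$. Meanwhile the interval lengths $1, 1/2, 1/4, \ldots$ shrink to $0$, which is the convergence in probability and in $L^p$.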

What this page is not. It is not a proof gallery — the inequalities and implications are quoted, not derived. It is also not the CLT story; the CLT lives more comfortably with the named distributions and Berry–Esseen, which is itself a refinement of convergence in distribution rather than a separate mode.

What next

Named Distributions (Distributions). The limit objects: Gaussian (CLT), Cauchy (LLN-violator), and the relationships.

Measure Theory (Foundations). Where convergence almost-everywhere and $L^p$ live; sets up DCT and Fatou rigorously.

Sufficient Statistics (Statistics). Concentration of $T(X)$ around its mean, another consequence of the inequalities used here.