Markov Chains

Transition matrices, stationary distributions, mixing, recurrence, periodicity, and continuous time.

A Markov chain is a stochastic process whose next state depends on the present state, not on the whole path that led there. In discrete time, a transition matrix $P$ contains one probability distribution per row:

$$ P_{ij} = \Pr(X_{t+1}=j\mid X_t=i), \qquad \sum_j P_{ij}=1. $$

The same object can be read as a simulator, a graph, a linear operator on distributions, and a long-run averaging machine. The figures below keep those views side by side.

Connection. The Chord Progressions page uses Markov chains audibly over diatonic functions. In MCMC, the question is inverted: design a chain whose stationary distribution is the distribution you want to sample.

1. States, transition matrix, and trajectories

Each row of $P$ is the outgoing distribution from one state. A sample path repeatedly draws from the current row. When the chain is irreducible and aperiodic, its empirical occupation histogram settles toward the stationary distribution $\pi=\pi P$.

Figure 1 · Editable transition graph and occupation mass
transition edges current state visit histogram

2. Matrix powers and stationary distribution

The entry $(P^n)_{ij}$ is the probability of being in state $j$ after $n$ steps, given that you started in state $i$. For a regular finite chain, the rows of $P^n$ converge to the same row: the stationary distribution $\pi$.

Figure 2 · Powers of P collapse toward 1πᵀ

3. Mixing, spectral gap, and detailed balance

Different starting distributions can approach stationarity at different speeds, but the slowest visible decay is controlled by the subdominant eigenvalue. The plot uses total variation distance $\|\mu P^n-\pi\|_{\mathrm{TV}}=\frac12\sum_i|\mu_iP^n-\pi_i|$.

Figure 3 · Three initial distributions racing toward stationarity
start at A start at B start uniform

Figure 3 shows the race; Figure 3b shows the speed limit. Total-variation distance to $\pi$ decays at worst geometrically, hugging the envelope $|\lambda_2|^n$ set by the subdominant eigenvalue. The spectral gap $1-|\lambda_2|$ is the mixing budget: a wider gap means the chain forgets its starting point sooner.

Figure 3b · Spectral gap controls the slow mixing envelope
TV distance from A $|\lambda_2|^n$ envelope spectral gap $1-|\lambda_2|$

Detailed balance and reversibility

Stationarity needs only global balance: at every state, total probability flowing in equals total flowing out. Detailed balance is the stronger, pairwise condition — for every pair of states the two directed flows match:

$$ \pi_i\,P_{ij} \;=\; \pi_j\,P_{ji}. $$

A chain whose stationary distribution satisfies this is reversible: the sequence of states looks the same run forwards or backwards. Detailed balance implies stationarity — sum both sides over $i$ and use the row-sum $\sum_i P_{ji}=1$ to get $\sum_i \pi_i P_{ij} = \pi_j$, which is exactly $\pi=\pi P$. The converse fails: a chain can hold $\pi$ fixed while probability steadily circulates.

Figure 3c draws each directed flow $\pi_i P_{ij}$ as a moving stream of probability. For a three-state chain at stationarity the net flow on every edge is forced to one shared value $J$, the circulation. Symmetrize flow drives $J$ to zero: opposing streams match, every edge is a wash, the chain is reversible. Add circulation builds a chain that is still stationary — inflow still equals outflow at every node — yet probability cycles $A\to B\to C\to A$ forever. Global balance holds in both cases; detailed balance is the extra structure the second chain lacks.

That extra structure is what makes MCMC work. Detailed balance is sufficient but not necessary for stationarity, and it is local: you can engineer a transition rule one edge at a time from $\pi$ alone, never solving $\pi=\pi P$ globally. Every Metropolis-style sampler on the Monte Carlo & Bayesian Sampling page is constructed exactly this way.

Figure 3c · Probability flow — detailed balance versus circulation
probability flow $\pi_iP_{ij}$ net circulation $J$, zero under detailed balance

4. Recurrence, transience, and absorption

In an absorbing chain, some states trap mass forever. Gambler's ruin is the clean example: a fortune walks up or down until it hits 0 or $N$, and both endpoints are absorbing.

Figure 4 · Gambler's ruin: mass flows into absorbing states
current distribution absorbing states

5. Periodicity

An irreducible chain can fail to converge if it is periodic. A two-state chain that always alternates has period 2: $P^n$ flips forever. Cesaro averages $\frac1N\sum_{n\lt N}P^n$ still converge to stationarity.

Figure 5 · Period 2 oscillation versus averaged convergence
ordinary Pⁿ Cesaro average

6. Continuous-time chains

A continuous-time Markov chain uses a rate matrix $Q$. Off-diagonal entries $Q_{ij}$ are jump rates, rows sum to zero, and the holding time in state $i$ is exponential with rate $-Q_{ii}=\sum_{j\neq i}Q_{ij}$. Conditional on leaving, the embedded discrete jump chooses $j$ with probability $Q_{ij}/(-Q_{ii})$.

Figure 6 · Rate matrix Q, holding times, and embedded chain

What to remember

Transition matrixEach row is the next-state distribution from one state.
Stationary distributionA distribution that is unchanged by one step: $\pi=\pi P$.
MixingFor irreducible aperiodic finite chains, starting conditions wash out.
Detailed balanceReversible chains match flow edge by edge, $\pi_iP_{ij}=\pi_jP_{ji}$; MCMC is engineered to satisfy it.
ClassificationAbsorbing, transient, recurrent, and periodic states explain failure modes.
Continuous timeRates set exponential holding times; embedded jumps form a discrete chain.