Markov Chains
A Markov chain is a stochastic process whose next state depends on the present state, not on the whole path that led there. In discrete time, a transition matrix $P$ contains one probability distribution per row:
$$ P_{ij} = \Pr(X_{t+1}=j\mid X_t=i), \qquad \sum_j P_{ij}=1. $$The same object can be read as a simulator, a graph, a linear operator on distributions, and a long-run averaging machine. The figures below keep those views side by side.
1. States, transition matrix, and trajectories
Each row of $P$ is the outgoing distribution from one state. A sample path repeatedly draws from the current row. When the chain is irreducible and aperiodic, its empirical occupation histogram settles toward the stationary distribution $\pi=\pi P$.
2. Matrix powers and stationary distribution
The entry $(P^n)_{ij}$ is the probability of being in state $j$ after $n$ steps, given that you started in state $i$. For a regular finite chain, the rows of $P^n$ converge to the same row: the stationary distribution $\pi$.
3. Mixing, spectral gap, and detailed balance
Different starting distributions can approach stationarity at different speeds, but the slowest visible decay is controlled by the subdominant eigenvalue. The plot uses total variation distance $\|\mu P^n-\pi\|_{\mathrm{TV}}=\frac12\sum_i|\mu_iP^n-\pi_i|$.
Figure 3 shows the race; Figure 3b shows the speed limit. Total-variation distance to $\pi$ decays at worst geometrically, hugging the envelope $|\lambda_2|^n$ set by the subdominant eigenvalue. The spectral gap $1-|\lambda_2|$ is the mixing budget: a wider gap means the chain forgets its starting point sooner.
Detailed balance and reversibility
Stationarity needs only global balance: at every state, total probability flowing in equals total flowing out. Detailed balance is the stronger, pairwise condition — for every pair of states the two directed flows match:
$$ \pi_i\,P_{ij} \;=\; \pi_j\,P_{ji}. $$A chain whose stationary distribution satisfies this is reversible: the sequence of states looks the same run forwards or backwards. Detailed balance implies stationarity — sum both sides over $i$ and use the row-sum $\sum_i P_{ji}=1$ to get $\sum_i \pi_i P_{ij} = \pi_j$, which is exactly $\pi=\pi P$. The converse fails: a chain can hold $\pi$ fixed while probability steadily circulates.
Figure 3c draws each directed flow $\pi_i P_{ij}$ as a moving stream of probability. For a three-state chain at stationarity the net flow on every edge is forced to one shared value $J$, the circulation. Symmetrize flow drives $J$ to zero: opposing streams match, every edge is a wash, the chain is reversible. Add circulation builds a chain that is still stationary — inflow still equals outflow at every node — yet probability cycles $A\to B\to C\to A$ forever. Global balance holds in both cases; detailed balance is the extra structure the second chain lacks.
That extra structure is what makes MCMC work. Detailed balance is sufficient but not necessary for stationarity, and it is local: you can engineer a transition rule one edge at a time from $\pi$ alone, never solving $\pi=\pi P$ globally. Every Metropolis-style sampler on the Monte Carlo & Bayesian Sampling page is constructed exactly this way.
4. Recurrence, transience, and absorption
In an absorbing chain, some states trap mass forever. Gambler's ruin is the clean example: a fortune walks up or down until it hits 0 or $N$, and both endpoints are absorbing.
5. Periodicity
An irreducible chain can fail to converge if it is periodic. A two-state chain that always alternates has period 2: $P^n$ flips forever. Cesaro averages $\frac1N\sum_{n\lt N}P^n$ still converge to stationarity.
6. Continuous-time chains
A continuous-time Markov chain uses a rate matrix $Q$. Off-diagonal entries $Q_{ij}$ are jump rates, rows sum to zero, and the holding time in state $i$ is exponential with rate $-Q_{ii}=\sum_{j\neq i}Q_{ij}$. Conditional on leaving, the embedded discrete jump chooses $j$ with probability $Q_{ij}/(-Q_{ii})$.