Hidden Markov Models

Filtering, smoothing, and decoding as three different ways to read the same dynamic Bayesian network.

A hidden Markov model has a latent state $z_t$, an observation $y_t$, a transition matrix $A = p(z_t\mid z_{t-1})$, and an emission matrix $C = p(y_t\mid z_t)$. As a Bayesian network the graph is just the static $z\to y$ block repeated across time: the same local structure unrolled.

Figure 0 · HMM as an unrolled Bayesian network
latent state $z_t$ observation $y_t$

1. Sample from the model

Start with the generative story. High self-transition probability makes state runs long; high sensor accuracy makes observations reliable. The same sampled observation sequence is used by the inference widgets below.

Figure 1 · HMM sampler
hidden state observation mismatch

2. Forward-backward messages

Filtering uses observations up to $t$: $\alpha_t(i) \propto p(y_{1:t},z_t=i)$. Smoothing multiplies by the backward message: $\gamma_t(i) \propto \alpha_t(i)\beta_t(i) = p(z_t=i\mid y_{1:T})$. The diff view plots $\gamma_t - \alpha_t$; where it lights up, future evidence changed the belief about $z_t$. Click a column to move the time cursor without using the slider.

Figure 2 · Filter, smoother, and the diff between them
$\alpha$ filter / low value $\gamma$ smoother / high value positive correction negative correction

Why work in the log domain? The unnormalized joint $p(y_{1:t}, z_t=i)$ shrinks exponentially with $t$ because every observation multiplies in another factor below one. Figure 2b plots its magnitude two ways. The linear curve crashes through floating-point underflow long before the sequence ends; the log curve is a clean (negative) line.

Figure 2b · Why log-domain: unnormalized message magnitude over time
linear (underflows) log-domain double precision floor $\approx 10^{-308}$

3. Viterbi is not marginal smoothing

The smoother chooses the most probable state at each time independently. Viterbi chooses the most probable complete path. Those can disagree because locally-best marginals need not form the best joint sequence. Click an "obs" cell below to cycle that observation; on the sampler in Figure 1, clicking the "state" or "observed" cells works the same way.

Figure 3 · Marginal MAP path versus Viterbi path
marginal MAP Viterbi joint MAP disagreement

What next

HMMs are dynamic Bayesian networks, and their inference algorithms are special cases of message passing.