Hidden Markov Models

Filtering, smoothing, and decoding as three different ways to read the same dynamic Bayesian network.

A hidden Markov model has a latent state $z_t$, an observation $y_t$, a transition matrix $A = p(z_t\mid z_{t-1})$, and an emission matrix $C = p(y_t\mid z_t)$. As a Bayesian network the graph is just the static $z\to y$ block repeated across time: the same local structure unrolled.

Figure 0 · HMM as an unrolled Bayesian network

latent state $z_t$ observation $y_t$

1. Sample from the model

Start with the generative story. High self-transition probability makes state runs long; high sensor accuracy makes observations reliable. The same sampled observation sequence is used by the inference widgets below.

Figure 1 · HMM sampler

hidden state observation mismatch

state persistence 0.82

sensor accuracy 0.86

sequence length 24

2. Forward-backward messages

Filtering uses observations up to $t$: $\alpha_t(i) \propto p(y_{1:t},z_t=i)$. Smoothing multiplies by the backward message: $\gamma_t(i) \propto \alpha_t(i)\beta_t(i) = p(z_t=i\mid y_{1:T})$. The diff view plots $\gamma_t - \alpha_t$; where it lights up, future evidence changed the belief about $z_t$. Click a column to move the time cursor without using the slider.

Figure 2 · Filter, smoother, and the diff between them

$\alpha$ filter / low value $\gamma$ smoother / high value positive correction negative correction

current time k 10

Why work in the log domain? The unnormalized joint $p(y_{1:t}, z_t=i)$ shrinks exponentially with $t$ because every observation multiplies in another factor below one. Figure 2b plots its magnitude two ways. The linear curve crashes through floating-point underflow long before the sequence ends; the log curve is a clean (negative) line.

Figure 2b · Why log-domain: unnormalized message magnitude over time

linear (underflows) log-domain double precision floor $\approx 10^{-308}$

sequence length 240

3. Viterbi is not marginal smoothing

The smoother chooses the most probable state at each time independently. Viterbi chooses the most probable complete path. Those can disagree because locally-best marginals need not form the best joint sequence. Click an "obs" cell below to cycle that observation; on the sampler in Figure 1, clicking the "state" or "observed" cells works the same way.

Figure 3 · Marginal MAP path versus Viterbi path

marginal MAP Viterbi joint MAP disagreement

What next

HMMs are dynamic Bayesian networks, and their inference algorithms are special cases of message passing.

Graphs

Bayesian Graphical Models

See the static DAG ideas that HMMs repeat across time.

Computation

Monte Carlo & MCMC

When exact HMM-style messages fail, sampling moves probability mass approximately.

Processes

Markov Chains

Separate the latent transition dynamics from the observation model.