Hidden Markov Models
A hidden Markov model has a latent state $z_t$, an observation $y_t$, a transition matrix $A = p(z_t\mid z_{t-1})$, and an emission matrix $C = p(y_t\mid z_t)$. As a Bayesian network the graph is just the static $z\to y$ block repeated across time: the same local structure unrolled.
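The unrolled graph corresponds to a factorization of the joint distribution. The form below is a sketch under two assumptions the text does not state: an initial distribution $p(z_1)$, and the index convention $A_{ij} = p(z_t=j\mid z_{t-1}=i)$, $C_{ik} = p(y_t=k\mid z_t=i)$.

$$p(z_{1:T}, y_{1:T}) = p(z_1)\,p(y_1\mid z_1)\prod_{t=2}^{T} p(z_t\mid z_{t-1})\,p(y_t\mid z_t)$$

Each factor touches only a node and its parent, which is why the same $z\to y$ block (plus one $z_{t-1}\to z_t$ edge) describes every time step.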
1. Sample from the model
Start with the generative story. High self-transition probability makes state runs long; high sensor accuracy makes observations reliable. The same sampled observation sequence is used by the inference widgets below.
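The generative story can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the widget: the two-state setup, the parameter names `stay_prob` and `sensor_acc`, their default values, and the uniform initial state are all assumptions made for the example.

```python
import numpy as np

def sample_hmm(T, stay_prob=0.9, sensor_acc=0.8, seed=0):
    """Sample states z and observations y from a two-state HMM.

    stay_prob, sensor_acc, and the uniform initial state are
    illustrative assumptions, not values taken from the article.
    """
    rng = np.random.default_rng(seed)
    # A[i, j] = p(z_t = j | z_{t-1} = i)
    A = np.array([[stay_prob, 1 - stay_prob],
                  [1 - stay_prob, stay_prob]])
    # C[i, k] = p(y_t = k | z_t = i)
    C = np.array([[sensor_acc, 1 - sensor_acc],
                  [1 - sensor_acc, sensor_acc]])
    z = np.empty(T, dtype=int)
    y = np.empty(T, dtype=int)
    z[0] = rng.integers(2)               # uniform initial state (assumption)
    y[0] = rng.choice(2, p=C[z[0]])
    for t in range(1, T):
        z[t] = rng.choice(2, p=A[z[t - 1]])  # transition from previous state
        y[t] = rng.choice(2, p=C[z[t]])      # emit from current state
    return z, y, A, C

z, y, A, C = sample_hmm(T=30)
print("states:  ", "".join(map(str, z)))
print("observed:", "".join(map(str, y)))
```

Raising `stay_prob` toward 1 should lengthen the runs in the state string, and raising `sensor_acc` toward 1 should make the observed string match it more closely.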
2. Forward-backward messages
Filtering uses observations up to $t$: $\alpha_t(i) \propto p(y_{1:t},z_t=i)$, which normalizes to the filtered belief $p(z_t=i\mid y_{1:t})$. Smoothing multiplies by the backward message $\beta_t(i) = p(y_{t+1:T}\mid z_t=i)$: $\gamma_t(i) = p(z_t=i\mid y_{1:T}) \propto \alpha_t(i)\beta_t(i)$. The diff view plots $\gamma_t - \alpha_t$ (smoothed minus filtered belief); where it lights up, future evidence changed the belief about $z_t$. Click a column to move the time cursor without using the slider.
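The two passes can be sketched as follows. This is one possible implementation, assuming a row convention `A[i, j] = p(z_t=j | z_{t-1}=i)`, `C[i, k] = p(y_t=k | z_t=i)`, and an initial distribution `pi`; the parameter values and the observation sequence are made up for illustration. Messages are renormalized at every step, which leaves the marginals unchanged.

```python
import numpy as np

def forward_backward(y, A, C, pi):
    """Return filtered (alpha) and smoothed (gamma) marginals, rows normalized."""
    T, K = len(y), A.shape[0]
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))           # beta at the last step is all ones

    # Forward pass: predict with A, then weight by the emission likelihood.
    a = pi * C[:, y[0]]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ A) * C[:, y[t]]
        alpha[t] = a / a.sum()

    # Backward pass: fold in evidence from y[t+1:] one step at a time.
    for t in range(T - 2, -1, -1):
        b = A @ (C[:, y[t + 1]] * beta[t + 1])
        beta[t] = b / b.sum()

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return alpha, gamma

# Illustrative parameters (assumptions, not the article's settings).
A = np.array([[0.9, 0.1], [0.1, 0.9]])
C = np.array([[0.8, 0.2], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
y = np.array([0, 0, 1, 0, 0, 1, 1, 1])

alpha, gamma = forward_backward(y, A, C, pi)
diff = gamma - alpha                 # what future evidence changed
print(np.round(diff[:, 1], 3))
```

The last row of `diff` is zero because there is no future evidence at $t = T$, so filtering and smoothing coincide there.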
Why work in the log domain? The unnormalized joint $p(y_{1:t}, z_t=i)$ shrinks exponentially with $t$ because every time step multiplies in a transition probability and an emission probability, neither larger than one. Figure 2b plots its magnitude two ways. The linear curve crashes through floating-point underflow long before the sequence ends; the log curve stays finite and descends along a roughly straight line.
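The underflow is easy to reproduce. The sketch below runs the unnormalized forward recursion twice, once with plain products and once with log-sum-exp, on a random observation sequence of length 5000. The parameters, the sequence length, and the helper names are assumptions chosen for the demonstration.

```python
import numpy as np

def logsumexp(v, axis=0):
    """Numerically stable log(sum(exp(v))) along one axis."""
    m = v.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(v - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def forward_linear(y, A, C, pi):
    """Unnormalized forward pass; returns p(y_{1:T}) as a plain float."""
    a = pi * C[:, y[0]]
    for t in range(1, len(y)):
        a = (a @ A) * C[:, y[t]]
    return a.sum()

def forward_log(y, A, C, pi):
    """Same recursion in the log domain; returns log p(y_{1:T})."""
    logA, logC = np.log(A), np.log(C)
    la = np.log(pi) + logC[:, y[0]]
    for t in range(1, len(y)):
        # la[i] + logA[i, j], summed over i in probability space
        la = logsumexp(la[:, None] + logA, axis=0) + logC[:, y[t]]
    return logsumexp(la, axis=0)

# Illustrative parameters (assumptions, not the article's settings).
A = np.array([[0.9, 0.1], [0.1, 0.9]])
C = np.array([[0.8, 0.2], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
y_long = np.random.default_rng(0).integers(2, size=5000)

lin = forward_linear(y_long, A, C, pi)
ll = float(forward_log(y_long, A, C, pi))
print(lin)   # 0.0: the product has underflowed
print(ll)    # a finite, large negative log-likelihood
```

Each step scales the total mass by at most 0.8 here, so after 5000 steps the true value is below $0.8^{5000}$, far smaller than the smallest positive double. Per-step normalization is the other common remedy; the log domain has the advantage of carrying over directly to max-product recursions.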
3. Viterbi is not marginal smoothing
The smoother chooses the most probable state at each time independently. Viterbi chooses the most probable complete path. The two can disagree because locally best marginals need not form the best joint sequence: stringing the per-step winners together can produce a path that relies on an unlikely transition. Click an "obs" cell below to cycle that observation; on the sampler in Figure 1, clicking the "state" or "observed" cells works the same way.
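A small, deliberately contrived example makes the disagreement concrete. The sketch below assumes a two-step sequence, uninformative emissions, and made-up values for `pi` and `A`; none of these come from the widget. It compares the per-step argmax of the smoothed marginals with a log-domain Viterbi path.

```python
import numpy as np

def smoothed_marginals(y, A, C, pi):
    """Unnormalized forward-backward; fine for short sequences only."""
    T, K = len(y), A.shape[0]
    alpha, beta = np.zeros((T, K)), np.ones((T, K))
    alpha[0] = pi * C[:, y[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * C[:, y[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (C[:, y[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def viterbi(y, A, C, pi):
    """Most probable complete state path, computed in the log domain."""
    T, K = len(y), A.shape[0]
    logA, logC = np.log(A), np.log(C)
    delta = np.log(pi) + logC[:, y[0]]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + logA       # scores[i, j]: best path to i, then i -> j
        back[t] = scores.argmax(axis=0)      # best predecessor of each state j
        delta = scores.max(axis=0) + logC[:, y[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):            # follow back-pointers
        path[t - 1] = back[t, path[t]]
    return path

def path_log_prob(z, y, A, C, pi):
    """Joint log-probability of a state path and the observations."""
    lp = np.log(pi[z[0]]) + np.log(C[z[0], y[0]])
    for t in range(1, len(y)):
        lp += np.log(A[z[t - 1], z[t]]) + np.log(C[z[t], y[t]])
    return lp

# Contrived parameters chosen so the two answers differ (assumptions).
pi = np.array([0.6, 0.4])
A = np.array([[0.5, 0.5], [0.1, 0.9]])
C = np.array([[0.5, 0.5], [0.5, 0.5]])       # emissions carry no information
y = np.array([0, 0])

marg_path = smoothed_marginals(y, A, C, pi).argmax(axis=1)
vit_path = viterbi(y, A, C, pi)
print(marg_path.tolist())   # [0, 1]
print(vit_path.tolist())    # [1, 1]
```

Here state 0 is the more likely start (marginal 0.6) and state 1 the more likely end (marginal 0.66), but the path 0→1 has joint probability 0.075 while 1→1 has 0.09, so Viterbi prefers the path that starts in the less likely state.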
What next
HMMs are dynamic Bayesian networks, and their inference algorithms are special cases of message passing.