Exact Posterior vs Mean-Field VI

Add data points and watch a factorized variational posterior become overconfident.

Mean-field variational inference chooses a tractable family that factorizes across parameters. That is a computational bargain, not a claim about the posterior. In Bayesian linear regression the true posterior over intercept and slope is Gaussian, so we can compare it exactly against the best axis-aligned mean-field Gaussian.

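The failure mode: when intercept and slope are correlated, reverse-KL mean-field VI cannot rotate its ellipse. For a Gaussian posterior, its factor variances are never larger than the exact marginal variances, and they shrink further as the correlation grows, because reverse KL penalizes putting mass in low-posterior-density corners.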
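To make the shrinkage concrete, here is a short worked calculation for the two-parameter Gaussian case. The symbols $\Sigma$ (exact posterior covariance), $\Lambda=\Sigma^{-1}$ (posterior precision), $s_\alpha$ and $s_\beta$ (marginal standard deviations), and $\rho$ (posterior correlation) are notation introduced for this sketch rather than taken from the page.

$$
\Sigma = \begin{pmatrix} s_\alpha^2 & \rho\, s_\alpha s_\beta \\ \rho\, s_\alpha s_\beta & s_\beta^2 \end{pmatrix},
\qquad
\Lambda = \Sigma^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} 1/s_\alpha^2 & -\rho/(s_\alpha s_\beta) \\ -\rho/(s_\alpha s_\beta) & 1/s_\beta^2 \end{pmatrix}
$$

$$
\operatorname{Var}_q(\alpha) = \frac{1}{\Lambda_{\alpha\alpha}} = s_\alpha^2\,(1-\rho^2),
\qquad
\operatorname{Var}_q(\beta) = \frac{1}{\Lambda_{\beta\beta}} = s_\beta^2\,(1-\rho^2)
$$

Each mean-field variance is the exact marginal variance scaled by $1-\rho^2$: the two agree when $\rho=0$, and the approximation collapses toward zero width as $|\rho|\to 1$. These are also the conditional variances of each parameter given the other, which is one way to read the overconfidence.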

1. Add data, watch the posterior tilt

Click in the data panel to add points, drag a point to move it, or alt-click a point to remove it. The model is $y_i = \alpha + \beta x_i + \epsilon_i$, with Gaussian noise $\epsilon_i \sim \mathcal{N}(0,\sigma^2)$ of known standard deviation $\sigma$ and a Gaussian prior on $(\alpha,\beta)$. The exact posterior is Gaussian, drawn as a covariance ellipse in parameter space. The mean-field approximation is constrained to $q(\alpha,\beta)=q(\alpha)q(\beta)$, so its covariance ellipse must stay aligned with the axes.
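The exact posterior described above has a closed form, which the following minimal sketch computes. It assumes a zero-mean isotropic prior with the same standard deviation on both parameters; the page only says "prior sd", so that choice is an assumption, as are the function name `exact_posterior` and the toy data.

```python
import numpy as np

def exact_posterior(x, y, noise_sd=0.35, prior_sd=2.0):
    """Posterior over (alpha, beta) for y = alpha + beta * x + Gaussian noise.

    Assumes (hypothetically) a zero-mean isotropic Gaussian prior with
    standard deviation prior_sd on both parameters.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then x for the slope.
    X = np.column_stack([np.ones_like(x), x])
    # Posterior precision = prior precision + data precision.
    precision = np.eye(2) / prior_sd**2 + X.T @ X / noise_sd**2
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_sd**2
    return mean, cov, precision

# Toy data with all x >= 0, so intercept and slope trade off against each other.
x = [0.0, 1.0, 2.0]
y = [1.0, 2.0, 3.0]
mean, cov, precision = exact_posterior(x, y)
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print("posterior mean:", mean)
print("posterior correlation:", corr)
```

The off-diagonal entry of `cov` is what tilts the ellipse; a factorized $q$ has no parameter that can represent it.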

Figure 1 · Bayesian linear regression posterior and mean-field approximation

Legend: exact posterior · mean-field VI · posterior predictive

Controls: noise $\sigma$ = 0.35 · prior sd = 2.0

2. Coordinate updates as projections

For a Gaussian posterior, the reverse-KL mean-field optimum has the same mean as the exact posterior, but each factor variance is the reciprocal of the corresponding diagonal entry of the posterior precision matrix, $1/\Lambda_{ii}$, rather than the marginal variance $\Sigma_{ii}$. Click the update button to alternate between the intercept factor and the slope factor. The red ellipse snaps toward the coordinate-wise optimum while the ELBO rises. You can also click anywhere in the $(\alpha,\beta)$ panel to drop $q$ at a different starting position before running coordinate ascent variational inference (CAVI).
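The coordinate updates can be sketched for a generic Gaussian target as follows. This is an illustration under stated assumptions, not the page's own implementation: the names `neg_kl` and `cavi`, the example numbers, and the starting point are hypothetical. The trace records $-\mathrm{KL}(q\,\|\,p)$, which equals the ELBO up to the constant log evidence.

```python
import numpy as np

def neg_kl(m, s2, mu, Lam):
    """-KL(q || p) for q = N(m, diag(s2)) and p = N(mu, inv(Lam)).

    Equals the ELBO minus the (constant) log evidence.
    """
    d = m - mu
    kl = 0.5 * (np.sum(np.diag(Lam) * s2) + d @ Lam @ d - len(mu)
                - np.linalg.slogdet(Lam)[1] - np.sum(np.log(s2)))
    return -kl

def cavi(mu, Lam, m0, s2_0, sweeps=50):
    """Coordinate ascent over the factors of a mean-field Gaussian q."""
    m = np.array(m0, dtype=float)
    s2 = np.array(s2_0, dtype=float)
    trace = [neg_kl(m, s2, mu, Lam)]
    for _ in range(sweeps):
        for i in range(len(mu)):
            others = [j for j in range(len(mu)) if j != i]
            # Optimal factor i given the rest: variance 1 / Lam_ii, and a
            # mean shifted by the other factors' current offsets from mu.
            s2[i] = 1.0 / Lam[i, i]
            m[i] = mu[i] - Lam[i, others] @ (m[others] - mu[others]) / Lam[i, i]
            trace.append(neg_kl(m, s2, mu, Lam))
    return m, s2, np.array(trace)

# Hypothetical correlated Gaussian target and a deliberately poor start.
mu = np.array([1.0, 1.0])
Lam = np.array([[24.74, 24.49],
                [24.49, 41.07]])
m, s2, trace = cavi(mu, Lam, m0=[0.0, 0.0], s2_0=[1.0, 1.0])
print("q mean:", m)
print("q variances:", s2)
print("first and last trace values:", trace[0], trace[-1])
```

In this Gaussian case the variances reach their optimum on the first visit to each factor; later sweeps only move the means, and they converge more slowly the stronger the coupling between the two parameters.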

Figure 2 · Mean-field coordinate updates and ELBO

Legend: exact posterior · current $q(\alpha)q(\beta)$ · ELBO trace

Reading the picture

When the $x$ values are concentrated on one side, many intercept-slope pairs explain the data nearly equally well. The exact posterior tilts along that tradeoff. The factorized approximation cannot represent the tilt, so reverse KL chooses a smaller axis-aligned ellipse. This is the common "VI underestimates uncertainty" picture, but here it emerges from data you place yourself.
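The effect of where the $x$ values sit can be checked numerically. The sketch below compares a centered design with a one-sided one, again assuming a zero-mean isotropic prior; the function name `tilt_and_shrinkage` and both sets of $x$ values are made up for illustration. The posterior covariance does not depend on $y$, so no responses are needed.

```python
import numpy as np

def tilt_and_shrinkage(x, noise_sd=0.35, prior_sd=2.0):
    """Posterior correlation and mean-field / exact variance ratios.

    Assumes (hypothetically) a zero-mean isotropic Gaussian prior.
    """
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    precision = np.eye(2) / prior_sd**2 + X.T @ X / noise_sd**2
    cov = np.linalg.inv(precision)
    corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    # Mean-field variance 1 / Lam_ii divided by the exact marginal variance.
    ratio = (1.0 / np.diag(precision)) / np.diag(cov)
    return corr, ratio

corr_centered, ratio_centered = tilt_and_shrinkage([-1.0, 0.0, 1.0])
corr_one_sided, ratio_one_sided = tilt_and_shrinkage([2.0, 2.5, 3.0])
print("centered:  corr =", corr_centered, " variance ratio =", ratio_centered)
print("one-sided: corr =", corr_one_sided, " variance ratio =", ratio_one_sided)
```

With $x$ centered on zero the posterior is already axis-aligned and the factorized family loses nothing; shifting the same spacing to one side produces a strongly negative correlation and a variance ratio well below one.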

What next

This page is the concrete mean-field example behind the broader variational-inference identity.

ELBO · Free Energy & Variational Inference: See the identity that makes these coordinate updates an evidence-bound optimization.

KL Divergence: Compare forward and reverse KL, including the mode-seeking behavior used here.

Variational methods · Calculus of Variations: Connect stationarity over distributions to stationarity over curves.