Gaussian Processes for Regression
A Gaussian process (GP) is the infinite-dimensional version of a multivariate normal: a distribution over functions. Pick any finite set of inputs $x_1,\ldots,x_n$, and the corresponding random function values are jointly Gaussian:
$$ f(x_{1:n}) \sim \mathcal{N}(m(x_{1:n}), K), \qquad K_{ij}=k(x_i,x_j). $$
The mean function says where functions live before data. The kernel says which inputs move together. Regression is just conditioning that joint Gaussian on observed values.
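To make the finite-dimensional view concrete, here is a minimal sketch that builds $K$ on a grid and draws function values from $\mathcal{N}(0, K)$. It assumes a zero mean function and a squared-exponential kernel, neither of which the text fixes at this point; the names `rbf_kernel` and `sample_prior` are illustrative only.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel for 1-D inputs (an assumed choice of k)."""
    d = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def sample_prior(x, n_samples=3, jitter=1e-6, seed=0):
    """Draw f(x_{1:n}) ~ N(0, K) using a Cholesky factor of K."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(x, x)
    # A small diagonal jitter keeps the factorization numerically stable.
    L = np.linalg.cholesky(K + jitter * np.eye(len(x)))
    z = rng.standard_normal((len(x), n_samples))
    return (L @ z).T  # shape (n_samples, n)

x = np.linspace(0.0, 5.0, 50)
samples = sample_prior(x)
print(samples.shape)
```

Each row of `samples` is one draw of the function at the 50 grid points. Plotting the rows against `x` gives curves of the kind usually shown as prior samples.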
1. Kernel zoo
The kernel encodes smoothness, length-scale, signal variance, periodicity, and stationarity. The heatmap in each mini-panel is the covariance matrix over fixed input locations; the curves are prior samples from that covariance.
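A few common kernels can be sketched in code to show how those properties appear in the covariance matrix. The four kernels and their parameter names (`ell`, `sf2`, `period`) are standard textbook forms chosen for illustration; they are not necessarily the ones drawn in the panels.

```python
import numpy as np

def _abs_diff(xa, xb):
    return np.abs(xa[:, None] - xb[None, :])

def rbf(xa, xb, ell=1.0, sf2=1.0):
    """Squared exponential: very smooth, stationary."""
    return sf2 * np.exp(-0.5 * (_abs_diff(xa, xb) / ell) ** 2)

def matern32(xa, xb, ell=1.0, sf2=1.0):
    """Matern 3/2: rougher samples than the squared exponential, stationary."""
    a = np.sqrt(3.0) * _abs_diff(xa, xb) / ell
    return sf2 * (1.0 + a) * np.exp(-a)

def periodic(xa, xb, ell=1.0, period=2.0, sf2=1.0):
    """Periodic: inputs one period apart are perfectly correlated."""
    s = np.sin(np.pi * _abs_diff(xa, xb) / period)
    return sf2 * np.exp(-2.0 * s ** 2 / ell ** 2)

def linear(xa, xb, sf2=1.0):
    """Linear: non-stationary, the variance grows with |x|."""
    return sf2 * xa[:, None] * xb[None, :]

x = np.linspace(-2.0, 2.0, 5)
for name, k in [("rbf", rbf), ("matern32", matern32),
                ("periodic", periodic), ("linear", linear)]:
    K = k(x, x)
    # The diagonal is the prior variance at each input location.
    print(name, np.round(np.diag(K), 2))
```

For the three stationary kernels the diagonal is constant and equal to the signal variance. For the linear kernel it changes with the input, which is one way to read non-stationarity directly off the matrix.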
2. Prior to posterior by conditioning
For noisy observations $y=f(X)+\epsilon$, $\epsilon\sim\mathcal{N}(0,\sigma_n^2I)$, the posterior at a test point $x_*$ is Gaussian with mean and variance
$$ \mu_*(x_*) = k_*^T(K+\sigma_n^2I)^{-1}y,\qquad \sigma_*^2(x_*) = k(x_*,x_*) - k_*^T(K+\sigma_n^2I)^{-1}k_*, $$
where $k_* = [k(x_1,x_*),\ldots,k(x_n,x_*)]^T$ and the prior mean is taken to be zero. Click the plot to add an observation. Drag existing observations. The band shows roughly 95% posterior uncertainty, about $\mu_* \pm 2\sigma_*$.
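The two posterior formulas translate almost line by line into code. The sketch below assumes a zero prior mean and a squared-exponential kernel, and it replaces the explicit inverse with a Cholesky solve, which is a common numerical choice rather than anything the formulas require. The function name `gp_posterior` and the toy data are made up for illustration.

```python
import numpy as np

def rbf(xa, xb, ell=1.0, sf2=1.0):
    d = xa[:, None] - xb[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.01, ell=1.0, sf2=1.0):
    """Posterior mean and variance of the latent f at x_test (zero prior mean)."""
    Ky = rbf(x_train, x_train, ell, sf2) + noise_var * np.eye(len(x_train))
    K_s = rbf(x_train, x_test, ell, sf2)  # column j is k_* for test point j
    L = np.linalg.cholesky(Ky)
    # alpha = (K + sigma_n^2 I)^{-1} y, computed with two triangular systems
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # k(x_*, x_*) equals sf2 for this kernel
    var = sf2 - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

x_train = np.array([-1.5, 0.0, 1.0])
y_train = np.sin(x_train)  # toy observations
x_test = np.linspace(-3.0, 3.0, 61)
mean, var = gp_posterior(x_train, y_train, x_test)
lower = mean - 1.96 * np.sqrt(var)
upper = mean + 1.96 * np.sqrt(var)
print(mean.shape, float(var.min()), float(var.max()))
```

The `lower` and `upper` arrays trace an approximate 95% band for the latent function. Adding `noise_var` to `var` would give the band for a new noisy observation instead.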
3. Length-scale as model complexity
Short length-scales let nearby observations vary independently; long length-scales force the function to move as a broad sheet. The same data can look overfit, reasonable, or underfit depending on $\ell$.
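The effect is easy to reproduce on a handful of points. The sketch below fits the same made-up zig-zag data with three values of $\ell$ and reports the training error together with the posterior mean halfway between observations. The data, the noise level, and the squared-exponential kernel are all assumptions for illustration.

```python
import numpy as np

def rbf(xa, xb, ell):
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def posterior_mean(x_train, y_train, x_test, ell, noise_var=1e-2):
    """Zero-mean GP posterior mean with a squared-exponential kernel."""
    Ky = rbf(x_train, x_train, ell) + noise_var * np.eye(len(x_train))
    return rbf(x_test, x_train, ell) @ np.linalg.solve(Ky, y_train)

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.8, -0.6, 0.9, -0.7, 0.8])  # made-up zig-zag data
x_mid = x_train[:-1] + 0.5                       # halfway between observations

results = {}
for ell in (0.1, 1.0, 10.0):
    fit = posterior_mean(x_train, y_train, x_train, ell)
    mid = posterior_mean(x_train, y_train, x_mid, ell)
    rmse = np.sqrt(np.mean((fit - y_train) ** 2))
    results[ell] = (rmse, mid)
    print(f"ell={ell:5.1f}  train RMSE={rmse:.3f}  mean at midpoints={np.round(mid, 2)}")
```

With the shortest length-scale the fit passes almost exactly through the data but falls back to the prior mean of zero between points. With the longest, the mean is close to a flat line and the training error is large.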
4. Marginal likelihood landscape
The log marginal likelihood scores hyperparameters by integrating out the latent function:
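$$ \log p(y\mid X,\theta)= -\frac12 y^T(K_\theta+\sigma_n^2I)^{-1}y -\frac12\log|K_\theta+\sigma_n^2I| -\frac n2\log(2\pi). $$
The first term rewards data fit, the log-determinant term penalizes model complexity, and the last term is a constant. Click the landscape to update the posterior below it. The optimum balances fit, uncertainty, and complexity.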
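A landscape of this kind can be computed by evaluating the formula on a grid of hyperparameters. The sketch below does so over length-scale and noise variance for synthetic data, with the signal variance held at 1 and a squared-exponential kernel; all of those choices, and the function name `log_marginal_likelihood`, are assumptions for illustration.

```python
import numpy as np

def rbf(xa, xb, ell=1.0, sf2=1.0):
    d = xa[:, None] - xb[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

def log_marginal_likelihood(x, y, ell, sf2, noise_var):
    """log p(y | X, theta) for a zero-mean GP, via a Cholesky factor."""
    n = len(x)
    Ky = rbf(x, x, ell, sf2) + noise_var * np.eye(n)
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = -0.5 * y @ alpha
    complexity = -np.sum(np.log(np.diag(L)))  # equals -0.5 * log|Ky|
    constant = -0.5 * n * np.log(2.0 * np.pi)
    return data_fit + complexity + constant

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 20)
y = np.sin(x) + 0.1 * rng.standard_normal(20)  # synthetic data

ells = np.logspace(-1, 1, 25)
noise_vars = np.logspace(-3, 0, 25)
grid = np.array([[log_marginal_likelihood(x, y, ell, 1.0, nv) for ell in ells]
                 for nv in noise_vars])
i, j = np.unravel_index(np.argmax(grid), grid.shape)
print("best length-scale on grid:", ells[j], "best noise variance on grid:", noise_vars[i])
```

The printed pair is only the grid maximizer for this synthetic data. In practice the same function is usually handed to a gradient-based optimizer rather than searched on a grid.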
5. Two-dimensional regression toy
The same conditioning formula works over any input space. In two dimensions, the posterior mean becomes a surface. The heatmap below draws that mean; opacity fades where posterior uncertainty is high. Drag training points to reshape the surface.
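Only the kernel needs to change to handle vector inputs. The sketch below evaluates the posterior on a 2-D grid and derives an opacity from the posterior standard deviation. The opacity mapping, the training points, and the helper names are guesses for illustration, not the page's actual rendering code.

```python
import numpy as np

def rbf_nd(Xa, Xb, ell=1.0, sf2=1.0):
    """Squared-exponential kernel for inputs of shape (n, d)."""
    sq = (np.sum(Xa ** 2, axis=1)[:, None]
          + np.sum(Xb ** 2, axis=1)[None, :]
          - 2.0 * Xa @ Xb.T)
    return sf2 * np.exp(-0.5 * np.maximum(sq, 0.0) / ell ** 2)

def gp_posterior(X, y, X_test, noise_var=1e-2, ell=1.0, sf2=1.0):
    """Same conditioning formulas as in 1-D, zero prior mean assumed."""
    Ky = rbf_nd(X, X, ell, sf2) + noise_var * np.eye(len(X))
    K_s = rbf_nd(X, X_test, ell, sf2)
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.maximum(sf2 - np.sum(v ** 2, axis=0), 0.0)
    return mean, var

# Made-up training points in the plane
X = np.array([[-1.0, -1.0], [1.0, -1.0], [0.0, 1.0], [0.5, 0.0]])
y = np.array([0.5, -0.3, 1.0, 0.2])

g = np.linspace(-2.0, 2.0, 30)
G1, G2 = np.meshgrid(g, g)
X_test = np.column_stack([G1.ravel(), G2.ravel()])

mean, var = gp_posterior(X, y, X_test)
mean_surface = mean.reshape(G1.shape)
# Assumed mapping: fully opaque at zero uncertainty, transparent at the prior level (sf2 = 1)
opacity = np.clip(1.0 - np.sqrt(var), 0.0, 1.0).reshape(G1.shape)
print(mean_surface.shape, float(opacity.min()), float(opacity.max()))
```

`mean_surface` and `opacity` can be passed to any image-plotting routine as color and alpha channels.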
6. Acquisition teaser
Bayesian optimization uses the GP posterior to choose where to evaluate next. Expected improvement is high where the mean is promising, the uncertainty is large, or both.
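For a Gaussian posterior, expected improvement has a closed form. The sketch below assumes the goal is maximization, with `best` the highest value observed so far and `xi` an optional exploration margin; these names and the maximization convention are assumptions, since the text does not fix them. With $z = (\mu - f_{\text{best}} - \xi)/\sigma$, the value is $(\mu - f_{\text{best}} - \xi)\,\Phi(z) + \sigma\,\phi(z)$.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise erf without extra dependencies

def expected_improvement(mean, std, best, xi=0.0):
    """Closed-form EI for maximization under a Gaussian posterior."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    improvement = mean - best - xi
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    z = improvement / safe_std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    ei = improvement * cdf + std * pdf
    # With no uncertainty, EI reduces to the plain improvement (if positive)
    return np.where(std > 0, ei, np.maximum(improvement, 0.0))

# Four hypothetical candidates: low/high mean crossed with low/high uncertainty
mean = np.array([0.0, 1.0, 0.0, 1.0])
std = np.array([0.1, 0.1, 1.0, 1.0])
ei = expected_improvement(mean, std, best=0.5)
print(np.round(ei, 4))
```

The candidate with both a high mean and a large standard deviation scores highest, and the one with a low mean and little uncertainty scores close to zero.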