Named Distributions

A family tree of the usual distributions: limits, sums, transformations, ratios, tails, and conjugacy.

Named distributions are easier to remember when they are connected by operations. Bernoulli trials add into binomials; rare binomials limit to Poisson; Gaussians stay Gaussian under sums; squared Gaussians make $\chi^2$; ratios make Cauchy, $t$, and $F$ laws. This page shows those connections.

1. Opening figure: the family map

Click a node to jump to its card. The edge labels are the operations that turn one distribution into another: sums, limits, transformations, ratios, and conjugate updates.

Figure 1 · Family map of named distributions

Legend: node type is discrete or continuous (italic = multivariate); support is bounded [a, b], half-bounded [0, ∞), or unbounded ℝ; edge type is sum / transform / ratio, conjugate prior, special case, or limit.

2. Discrete distributions

discrete · Bernoulli → Binomial

One trial becomes a count.

A Bernoulli variable is a single yes/no event. A binomial is the sum of $n$ independent Bernoulli trials with the same success probability $p$.

Drag $n$ and $p$: the bars are $P(X=k)$ for $X\sim\mathrm{Binomial}(n,p)$. The center tracks the mean $np$, and the spread tracks the standard deviation $\sqrt{np(1-p)}$ (variance $np(1-p)$).

Law	PMF	CDF	CF	Mean	Var
Bernoulli	$p^x(1-p)^{1-x}$	$0$ for $x<0$; $1-p$ for $0\le x<1$; $1$ for $x\ge1$	$1-p+pe^{it}$	$p$	$p(1-p)$
Binomial	$\binom{n}{k}p^k(1-p)^{n-k}$	$\sum_{j\le k}\binom{n}{j}p^j(1-p)^{n-j}$	$(1-p+pe^{it})^n$	$np$	$np(1-p)$
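The mean and variance columns can be checked directly from the PMF. The following is a minimal sketch using only the standard library; the helper name `binomial_pmf` is made up for illustration, and the values of `n` and `p` are the ones shown on this card.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(X=0), ..., P(X=n)] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

n, p = 12, 0.35
pmf = binomial_pmf(n, p)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(mean, n * p)            # the two numbers agree up to rounding
print(var, n * p * (1 - p))   # the two numbers agree up to rounding
```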

Controls: $n = 12$, $p = 0.35$.

discrete · Categorical → Multinomial

From one $K$-sided draw to a vector of counts.

A categorical variable is a single draw over $K$ outcomes with probabilities $(p_1,\dots,p_K)$. The multinomial is the vector of category counts after $n$ independent categorical draws. With $K=2$ these reduce to Bernoulli and binomial; each marginal count is $\mathrm{Binomial}(n, p_k)$, but counts share negative covariance because the total must equal $n$.

Bars show the expected counts $np_k$ with $\pm 2$ standard-deviation whiskers from the binomial marginals.

Law	PMF	Mean	Var / Cov
Categorical	$\prod_k p_k^{\mathbb 1_{x=k}}$	$p_k$	$p_k(1-p_k)$; $-p_jp_k$
Multinomial	$\frac{n!}{\prod_k n_k!}\prod_k p_k^{n_k}$	$np_k$	$np_k(1-p_k)$; $-np_jp_k$

Controls: $n = 20$, $p_1 = 0.5$, $p_2 = 0.3$.

discrete · Geometric

Waiting for the first success.

The geometric distribution counts trials until the first success. After any number of failures, the remaining wait has the same distribution as the original wait: this is the discrete memoryless property.

The expectation follows from the tail sum:

\[\mathbb{E}X=\sum_{k\ge0}P(X>k)=\sum_{k\ge0}(1-p)^k=1/p.\]

PMF	$p(1-p)^{k-1}$ for $k\ge1$
CDF	$1-(1-p)^k$
CF	$\frac{pe^{it}}{1-(1-p)e^{it}}$
Mean / variance	$1/p$, $(1-p)/p^2$
Fact	Discrete memoryless waiting time.
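Both the memoryless property and the tail-sum formula can be checked numerically from $P(X>k)=(1-p)^k$. This is a small sketch; the function name `geometric_tail` and the values of `s` and `t` are chosen here for illustration.

```python
def geometric_tail(k, p):
    """P(X > k) for the number of trials X >= 1 until the first success."""
    return (1 - p) ** k

p, s, t = 0.25, 3, 5
conditional = geometric_tail(s + t, p) / geometric_tail(s, p)  # P(X > s+t | X > s)
unconditional = geometric_tail(t, p)                           # P(X > t)

# Tail-sum formula for the mean, truncated where the tail is negligible.
mean = sum(geometric_tail(k, p) for k in range(2000))
print(conditional, unconditional, mean)
```

The first two printed values coincide, and the third is close to $1/p$.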

Controls: $p = 0.25$.

discrete · Poisson as a rare-event limit

Many tiny chances become a rate.

If $X_n\sim\mathrm{Binomial}(n,\lambda/n)$, then as $n\to\infty$ the mass approaches $\mathrm{Poisson}(\lambda)$. This is the count model for many independent rare opportunities.

Poisson processes are the process-level version: counts over time windows plus exponential waiting times.

The plot overlays the binomial bars (blue, wider) with the limiting Poisson distribution (red, narrower).

PMF	$e^{-\lambda}\lambda^k/k!$
CDF	$\sum_{j\le k}e^{-\lambda}\lambda^j/j!$
CF	$\exp(\lambda(e^{it}-1))$
Mean / variance	$\lambda$, $\lambda$
Fact	Independent Poisson counts add by adding rates.
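One way to watch the rare-event limit is to measure the total variation distance between $\mathrm{Binomial}(n,\lambda/n)$ and $\mathrm{Poisson}(\lambda)$ as $n$ grows. The sketch below assumes a truncation at `kmax = 60`, which is ample for $\lambda = 4$; the helper names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def tv_distance(n, lam, kmax=60):
    """Total variation distance between Binomial(n, lam/n) and Poisson(lam),
    summed over k = 0..kmax (mass beyond kmax is negligible for small lam)."""
    p = lam / n
    return 0.5 * sum(
        abs((binomial_pmf(k, n, p) if k <= n else 0.0) - poisson_pmf(k, lam))
        for k in range(kmax + 1)
    )

lam = 4
distances = {n: tv_distance(n, lam) for n in (10, 40, 160, 640)}
for n, d in distances.items():
    print(n, d)   # the distance shrinks as n grows
```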

Controls: $\lambda = 4$, $n = 40$.

discrete · Negative Binomial

Waiting for several successes, or overdispersed counts.

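A negative binomial can be seen as a sum of $r$ independent geometric waits, each counting the failures before a success (the Geometric card counts trials instead, which shifts each wait by one). As a count model, it is what you reach for when the Poisson's variance, which must equal its mean, is too small for the data.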

The gray curve is a Poisson with the same mean. The wider bars show the extra dispersion.

PMF	$\binom{k+r-1}{k}p^r(1-p)^k$
CDF	$\sum_{j\le k}\binom{j+r-1}{j}p^r(1-p)^j$
CF	$\left(\frac{p}{1-(1-p)e^{it}}\right)^r$
Mean / variance	$r(1-p)/p$, $r(1-p)/p^2$
Fact	Variance-to-mean ratio is $1/p$, so it exceeds Poisson when $p<1$.
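The variance-to-mean ratio can be recovered from the PMF itself. This is a sketch under one assumption: the sum is truncated at 400 failures, where the remaining mass is negligible for these parameters.

```python
from math import comb

def negbin_pmf(k, r, p):
    """P(K = k): k failures before the r-th success."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

r, p = 5, 0.35
ks = range(400)  # truncation point is an assumption; the tail beyond it is tiny
mean = sum(k * negbin_pmf(k, r, p) for k in ks)
var = sum((k - mean) ** 2 * negbin_pmf(k, r, p) for k in ks)
dispersion = var / mean  # 1/p for the negative binomial, 1 for a Poisson
print(mean, var, dispersion)
```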

Controls: $r = 5$ successes, $p = 0.35$.

3. Continuous distributions

continuous · Uniform and inverse-CDF sampling

The random-number source behind the others.

If $U\sim\mathrm{Uniform}(0,1)$, then $F^{-1}(U)$ has CDF $F$. This is the inverse-CDF method behind exact one-dimensional sampling.

The left axis is uniform probability. The curve is $F^{-1}(u)$ for the chosen family, turning evenly spaced $u$ values into nonuniform samples.

The Monte Carlo & MCMC page builds on this with rejection and importance sampling.

PDF	$1/(b-a)$ on $[a,b]$
CDF	$(x-a)/(b-a)$ on $[a,b]$
CF	$\frac{e^{itb}-e^{ita}}{it(b-a)}$
Mean / variance	$(a+b)/2$, $(b-a)^2/12$
Fact	The source law behind inverse-CDF sampling.
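For an exponential target the inverse CDF has a closed form, $F^{-1}(u)=-\ln(1-u)/\lambda$, so the method fits in a few lines. This is a minimal sketch; the seed and sample size are arbitrary choices, and `sample_exponential` is an illustrative name.

```python
import math
import random

def sample_exponential(rate, rng):
    """Inverse-CDF draw: solve u = 1 - exp(-rate * x) for x."""
    u = rng.random()                 # Uniform[0, 1)
    return -math.log(1.0 - u) / rate

rng = random.Random(0)               # fixed seed so the run is repeatable
rate = 1.4
draws = [sample_exponential(rate, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean, 1 / rate)         # the sample mean should be near 1 / rate
```

The same pattern works for any law whose CDF can be inverted, analytically or numerically.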

Controls: target = Exponential, shape/rate = 1.4.

continuous · Exponential

Memoryless waiting in continuous time.

The exponential distribution is the continuous waiting time with no aging:

\[P(T>s+t\mid T>s)=P(T>t).\]

It is also the interarrival-time distribution in a Poisson process.

The minimum of independent exponentials is exponential again: $\min(T_1,T_2)\sim\mathrm{Exp}(\lambda_1+\lambda_2)$. Competing clocks add rates.
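The competing-clocks rule follows from multiplying survival functions, assuming $T_1\sim\mathrm{Exp}(\lambda_1)$ and $T_2\sim\mathrm{Exp}(\lambda_2)$ are independent:

\[P(\min(T_1,T_2)>t)=P(T_1>t)\,P(T_2>t)=e^{-\lambda_1 t}e^{-\lambda_2 t}=e^{-(\lambda_1+\lambda_2)t}.\]

For example, with $\lambda_1=1.1$ and $\lambda_2=0.8$ the minimum has rate $1.9$ and mean $1/1.9\approx0.53$.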

PDF	$\lambda e^{-\lambda x}$ for $x\ge0$
CDF	$1-e^{-\lambda x}$
CF	$\lambda/(\lambda-it)$
Mean / variance	$1/\lambda$, $1/\lambda^2$
Fact	The only continuous memoryless distribution.

Controls: $\lambda_1 = 1.1$, $\lambda_2 = 0.8$.

continuous · Gaussian

The shape that normalized sums of finite-variance variables converge to.

Gaussian distributions are closed under sums: independent normals add to a normal. More broadly, normalized sums of many finite-variance variables drift toward a bell curve.

The bars show the sum of $m$ centered uniform variables rescaled to variance $\sigma^2$. The curve is the matching normal approximation.

PDF	$\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}$
CDF	$\Phi((x-\mu)/\sigma)$
CF	$e^{i\mu t-\sigma^2t^2/2}$
Mean / variance	$\mu$, $\sigma^2$
Fact	Closed under independent sums.

Controls: $\sigma = 1$, summands $m = 4$.

Figure 3b · Galton board: binomial paths become a bell curve

Legend: falling balls, normal approximation, empirical histogram. Controls: rows = 12, balls = 600.

continuous · Cauchy

A ratio distribution with no mean.

If $Z_1,Z_2$ are independent standard normals, then $Z_1/Z_2$ is Cauchy. Its tails are so heavy that the mean does not exist.

The plot shows the theoretical Cauchy density against a normal density. The Cauchy peak is lower and its tails decay much more slowly.

PDF	$\frac{1}{\pi\gamma[1+((x-x_0)/\gamma)^2]}$
CDF	$\frac{1}{\pi}\arctan\frac{x-x_0}{\gamma}+\frac{1}{2}$
CF	$e^{ix_0t-\gamma|t|}$
Mean / variance	Undefined; undefined.
Fact	$Z_1/Z_2$ for independent standard normals.

Controls: scale $\gamma = 1$.

continuous · Laplace

The double exponential: a sharp peak with exponential tails.

If $E_1,E_2$ are independent $\mathrm{Exp}(1/b)$, then $E_1-E_2\sim\mathrm{Laplace}(0,b)$. The density is the symmetric exponential $\frac{1}{2b}e^{-|x-\mu|/b}$, so it has a cusp at the mean and decays linearly on a log scale — heavier tails than a Gaussian, but far lighter than a Cauchy.

It is the maximum-entropy distribution on $\mathbb{R}$ for a fixed mean absolute deviation $\mathbb E|X-\mu|=b$, the same way the Gaussian is max-entropy for fixed variance. As a noise model, $-\log p(x\mid\mu,b)=|x-\mu|/b+\log(2b)$, so maximum likelihood for the location under Laplace noise minimizes absolute error (L1 loss, i.e. median regression); as a prior on regression coefficients, its MAP estimate is the lasso.

PDF	$\frac{1}{2b}e^{-|x-\mu|/b}$
CDF	$\tfrac12+\tfrac12\,\mathrm{sgn}(x-\mu)(1-e^{-|x-\mu|/b})$
CF	$e^{i\mu t}/(1+b^2t^2)$
Mean / variance	$\mu$, $2b^2$
Fact	Difference of two iid $\mathrm{Exp}(1/b)$; max-entropy on $\mathbb{R}$ with fixed $\mathbb E|X-\mu|$.

Controls: scale $b = 1$.

4. Operations that make new distributions

4a. The central limit theorem: any shape becomes Gaussian

Pick any base distribution with finite variance. Standardize its sample mean:

$$ Z_n \;=\; \frac{\bar X_n - \mu}{\sigma/\sqrt n} \;=\; \frac{\sqrt n\,(\bar X_n - \mu)}{\sigma}. $$

Then $Z_n \Rightarrow \mathcal N(0,1)$ as $n \to \infty$. The base shape can be skewed, bimodal, or discrete — the standardized sum is still Gaussian in the limit. Pick a base below and slide $n$.

Figure 4a · The standardized sample mean converges to N(0, 1)

Legend: base distribution sample, standardized sum (Monte Carlo), $\mathcal N(0, 1)$ target. Controls: base distribution selector, summands $n = 2$.

Things to notice:

At $n = 1$, the bottom panel is the top panel, standardized but otherwise unchanged. The Exponential is right-skewed; the bimodal mixture is two-humped; Bernoulli is two spikes.
By $n = 5$ to $10$, all four bases produce a bell-shaped bottom panel that visibly tracks the $\mathcal N(0,1)$ curve. The mean and variance of $Z_n$ are exactly $0$ and $1$ for every $n$ by construction, so the empirical values in the readout sit near $0$ and $1$ throughout; what changes with $n$ is the shape.
The Bernoulli base needs the most summands: for small $n$ the standardized sum is still discrete, so the histogram is jagged. The CLT gives convergence in distribution (of CDFs), not of the probability mass function itself: the discrete histogram smooths into the continuous bell as $n$ grows.
This is why the Gaussian shows up so often: averaging many independent finite-variance contributions produces it.
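The change in shape can be tracked with a single number. For an $\mathrm{Exp}(1)$ base, the skewness of $Z_n$ is $2/\sqrt n$, so it should shrink toward the Gaussian value of $0$. The sketch below is a Monte Carlo check with an arbitrary seed and replication count; it is not the page's own simulation code.

```python
import math
import random

def standardized_means(n, reps, rng):
    """Draw `reps` values of Z_n = sqrt(n) * (mean - mu) / sigma
    for an Exp(1) base, where mu = sigma = 1."""
    out = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append(math.sqrt(n) * (xbar - 1.0))
    return out

def skewness(zs):
    m = sum(zs) / len(zs)
    s2 = sum((z - m) ** 2 for z in zs) / len(zs)
    return sum((z - m) ** 3 for z in zs) / len(zs) / s2 ** 1.5

rng = random.Random(1)
skew_by_n = {n: skewness(standardized_means(n, 20_000, rng)) for n in (1, 5, 30)}
for n, s in skew_by_n.items():
    print(n, s, 2 / math.sqrt(n))   # empirical skewness next to the theoretical value
```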

4b. Other operations

operation · Sums and convolution

Adding independent variables convolves their densities.

The density of $X+Y$ is $f_X*f_Y$. Some families are closed under addition: Gaussian plus Gaussian is Gaussian; Poisson plus Poisson is Poisson; Cauchy plus Cauchy is Cauchy.
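The discrete analogue is easy to verify: convolving two Poisson PMFs should reproduce the Poisson PMF with the summed rate. This is a sketch with illustrative rates 1.5 and 2.5; the comparison is exact for $k \le$ `kmax` because every pair $(i, j)$ with $i + j = k$ is included.

```python
from math import exp, factorial

def poisson_pmf(lam, kmax):
    return [exp(-lam) * lam**k / factorial(k) for k in range(kmax + 1)]

def convolve(f, g):
    """PMF of X + Y for independent X ~ f and Y ~ g on {0, 1, 2, ...}."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

kmax = 40
summed = convolve(poisson_pmf(1.5, kmax), poisson_pmf(2.5, kmax))
direct = poisson_pmf(4.0, kmax)
max_gap = max(abs(a - b) for a, b in zip(summed, direct))  # compares k = 0..kmax
print(max_gap)   # at the level of floating-point rounding
```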

The plot shows two draggable family choices and their sum.

Controls: $X$ = Normal, $Y$ = Normal.

operation · Max and min of i.i.d. samples

Extremes act on the CDF, not the density.

If $M_n=\max(X_1,\dots,X_n)$, then:

\[P(M_n\le x)=F(x)^n.\]

If $m_n=\min(X_1,\dots,X_n)$, then:

\[P(m_n\le x)=1-(1-F(x))^n.\]

The plot uses a $\mathrm{Uniform}(0,1)$ base distribution and shows how increasing $n$ pushes mass toward the right edge for maxima and the left edge for minima.
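For the uniform base the two formulas become $x^n$ and $1-(1-x)^n$. The sketch below compares the first against a simulation; the seed, replication count, and evaluation point $x = 0.9$ are arbitrary choices.

```python
import random

def cdf_max(x, n):
    """P(max of n iid Uniform(0,1) <= x) = x**n for 0 <= x <= 1."""
    return x ** n

def cdf_min(x, n):
    """P(min of n iid Uniform(0,1) <= x) = 1 - (1 - x)**n."""
    return 1 - (1 - x) ** n

rng = random.Random(2)
n, reps, x = 8, 50_000, 0.9
maxima = [max(rng.random() for _ in range(n)) for _ in range(reps)]
empirical = sum(m <= x for m in maxima) / reps
print(empirical, cdf_max(x, n))   # the simulated fraction should be near x**n
```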

Controls: $n = 8$, show = max.

operation · Transformations and Jacobians

Changing variables reshapes density by the derivative.

For a monotone transform:

\[f_Y(y)=f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|.\]

Non-monotone transforms sum over preimages.

Two canonical cases: $Y=X^2$ turns $N(0,1)$ into $\chi^2_1$; $Y=e^X$ turns a normal into a log-normal.
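As a worked instance of the sum over preimages, take $X\sim N(0,1)$ with density $\varphi$ (notation introduced here) and $Y=X^2$. Each $y>0$ has the two preimages $\pm\sqrt y$, and $\left|\frac{d}{dy}(\pm\sqrt y)\right|=\frac{1}{2\sqrt y}$, so

\[f_Y(y)=\frac{\varphi(\sqrt y)+\varphi(-\sqrt y)}{2\sqrt y}=\frac{1}{\sqrt{2\pi y}}\,e^{-y/2},\qquad y>0,\]

which is the $\chi^2$ density with one degree of freedom, using $\Gamma(1/2)=\sqrt\pi$.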

Controls: transform $Y = X^2$, $\sigma = 1$.

operation · Ratios

Division creates heavy tails.

The Cauchy distribution is what you get when you divide one standard normal by another. Near-zero denominators create enormous ratios.

This is the ratio story behind the $t$ and $F$ sampling distributions too: normalize by an estimated scale, and tail weight appears.

Controls: denominator scale = 1.

5. Sampling distributions from Gaussians

continuous · $\chi^2$

Sum squared standard normals.

If $Z_i\sim N(0,1)$ independently, then $\sum_{i=1}^k Z_i^2\sim\chi^2_k$. It is the distribution of squared Gaussian length in $k$ dimensions.

PDF	$x^{k/2-1}e^{-x/2}/(2^{k/2}\Gamma(k/2))$
CDF	$P(k/2,x/2)$
CF	$(1-2it)^{-k/2}$
Mean / variance	$k$, $2k$
Fact	Gamma with shape $k/2$ and rate $1/2$.

Controls: $k = 4$.

continuous · Student's t

A normal divided by estimated scale.

If $Z\sim N(0,1)$ and $V\sim\chi^2_\nu$, then $T=Z/\sqrt{V/\nu}$ has Student's $t_\nu$ distribution. As $\nu$ grows, the denominator stabilizes and $t$ becomes Gaussian.

PDF	$\frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\Gamma(\nu/2)}(1+x^2/\nu)^{-(\nu+1)/2}$
CDF	Standard $t_\nu$ special-function CDF.
CF	Special-function form; no compact elementary expression.
Mean / variance	$0$ for $\nu>1$; $\nu/(\nu-2)$ for $\nu>2$
Fact	Heavy-tailed because the scale is estimated.

Controls: degrees of freedom $\nu = 6$.

continuous · F distribution

Ratio of scaled chi-squares.

If $U\sim\chi^2_{d_1}$ and $V\sim\chi^2_{d_2}$ independently, then $(U/d_1)/(V/d_2)$ has an $F_{d_1,d_2}$ distribution. It appears in variance comparisons and ANOVA-style ratios.

PDF	$\frac{(d_1/d_2)^{d_1/2}x^{d_1/2-1}}{B(d_1/2,d_2/2)(1+d_1x/d_2)^{(d_1+d_2)/2}}$
CDF	$I_{d_1x/(d_1x+d_2)}(d_1/2,d_2/2)$
CF	Special-function form; no compact elementary expression.
Mean / variance	$d_2/(d_2-2)$ for $d_2>2$; $\frac{2d_2^2(d_1+d_2-2)}{d_1(d_2-2)^2(d_2-4)}$ for $d_2>4$
Fact	Ratio of independent variance estimates.

Controls: $d_1 = 5$, $d_2 = 12$.

continuous · Inverse $\chi^2$ and scaled inverse $\chi^2$

One over a chi-square; the conjugate prior for normal variance.

If $X\sim\chi^2_\nu$ then $Y=1/X$ has the inverse chi-squared distribution with $\nu$ degrees of freedom. The scaled inverse chi-squared, $\mathrm{Scale}$-$\mathrm{Inv}\chi^2(\nu,\tau^2)$, is the law of $\nu\tau^2Y$ and is the standard conjugate prior for the variance $\sigma^2$ of a normal with known mean $\mu$: $n$ observations with mean squared deviation $s^2=\frac1n\sum_i(x_i-\mu)^2$ update the posterior to $\mathrm{Scale}$-$\mathrm{Inv}\chi^2(\nu+n,\,(\nu\tau^2+n s^2)/(\nu+n))$.

For both unknown mean and unknown variance, the conjugate prior is Normal–scaled-Inv-$\chi^2$ (equivalently Normal–Gamma on the precision, via the $1/X$ edge). Marginalising out the variance turns the Gaussian posterior predictive into a Student's $t$ with degrees of freedom equal to the posterior $\nu_n$: heavy tails when $n$ is small, Gaussian as $n\to\infty$. This is the single derivation that ties $\chi^2$, Inv-$\chi^2$, and Student's $t$ together in Bayesian inference.

PDF	$\frac{2^{-\nu/2}}{\Gamma(\nu/2)}x^{-\nu/2-1}e^{-1/(2x)}$ for $x>0$
CDF	$Q(\nu/2, 1/(2x))$ (regularized upper incomplete gamma function)
Mean / variance	$1/(\nu-2)$ for $\nu>2$; $2/((\nu-2)^2(\nu-4))$ for $\nu>4$
Fact	Conjugate prior for the variance of a normal with known mean.

Controls: degrees of freedom $\nu = 6$.

6. Conjugate prior–likelihood pairs

Conjugacy means the posterior stays in the same family as the prior. These pairs are exact-update shortcuts, and reference points for variational inference when exact updating is not available.

Why conjugacy exists. The pairs below share a single structural property: the likelihood is an exponential family $p(x\mid\theta) = h(x)\exp\bigl(\eta(\theta)\cdot T(x) - A(\theta)\bigr)$, and the prior has the same exponential-family shape in $\theta$, namely $p(\theta\mid\xi,n_0)\propto\exp\bigl(\eta(\theta)\cdot\xi-n_0A(\theta)\bigr)$. Multiplying the prior by the likelihood of $n$ observations gives a posterior of the same form with $\xi\to\xi+\sum_i T(x_i)$ and $n_0\to n_0+n$. The data influence the posterior only through the fixed-size summary $\sum_i T(x_i)$, which is why the update is a simple shift of the prior's hyperparameters. Under regularity conditions (in particular, a support that does not depend on $\theta$), exponential families are exactly the families with finite-dimensional sufficient statistics. Conjugacy covers the cases where the posterior stays inside a finite-dimensional family; when it does not, you reach for variational inference.

Family	Sufficient statistic $T(x)$	Conjugate prior and hyperparameter update
Bernoulli / Binomial	$x$ (count of successes)	Beta: $\alpha\to\alpha+s,\;\beta\to\beta+f$
Poisson	$x$ (count)	Gamma: $\alpha\to\alpha+\sum y_i,\;\beta\to\beta+t$
Normal (known $\sigma^2$)	$x$	Normal: precision adds; mean is precision-weighted
Categorical / Multinomial	$(\mathbb 1_{x=k})_k$	Dirichlet: $\alpha_k\to\alpha_k+n_k$

Named distributions as maximum-entropy answers. Most of the distributions on this page are not arbitrary mathematical objects — they are the unique distributions that maximize entropy given a particular support and moment constraint:

Support	Constraints (beyond normalization)	Max-entropy law
$[a, b]$	none	Uniform$(a,b)$
$[0, \infty)$	fixed mean $1/\lambda$	Exponential$(\lambda)$
$\mathbb{R}$	fixed mean $\mu$, variance $\sigma^2$	Normal$(\mu, \sigma^2)$
$\mathbb{R}$	fixed mean $\mu$, mean abs deviation $\mathbb E|X-\mu|=b$	Laplace$(\mu, b)$
$\{0,1,\dots,N\}$	fixed mean	Truncated geometric, $p_k\propto e^{\lambda k}$
$\{0,1,2,\dots\}$	fixed mean $\mu$	Geometric, $P(X=k)=(1-q)q^k$ with $q = \mu/(\mu+1)$
$\{0,1,2,\dots\}$	fixed mean & variance	Discrete Gaussian-like law, $p_k\propto e^{\lambda_1k+\lambda_2k^2}$
$\mathbb{R}^d$	fixed mean & covariance	Multivariate Normal
$(K-1)$-simplex	fixed $\mathbb E[\log x_k]$ for each $k$	Dirichlet$(\alpha_1,\dots,\alpha_K)$

The pattern is the same in every row: write down the Lagrangian $-\int q\log q + \sum_i \lambda_i(\int T_i q - c_i)$ (plus a multiplier for normalization), take the variation, and the stationary $q^*\propto\exp(\sum_i\lambda_i T_i)$ is exactly an exponential family with $T_i$ as sufficient statistics. See the max-entropy interactive on the Fisher-information page to step through the constraints and watch the family member emerge, and the Legendre-duality section for why this construction is forced by the geometry of $\log\sum e^{\eta T}$.

bayes · Beta-Binomial

Prior over a Bernoulli/binomial probability. Successes add to $\alpha$; failures add to $\beta$.

Beta PDF	$x^{\alpha-1}(1-x)^{\beta-1}/B(\alpha,\beta)$
CDF	$I_x(\alpha,\beta)$
CF	Special-function form.
Mean / variance	$\alpha/(\alpha+\beta)$; $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
Update	$\alpha'=\alpha+s$, $\beta'=\beta+f$
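The update rule is a two-line function. In the sketch below the $\mathrm{Beta}(1,1)$ prior is an assumption made for illustration (the card does not state the prior); the counts are the ones shown on the card.

```python
def beta_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing the given counts."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

# Assumed prior for illustration: Beta(1, 1), i.e. uniform on [0, 1].
alpha_post, beta_post = beta_update(1.0, 1.0, successes=8, failures=5)
posterior_mean = beta_mean(alpha_post, beta_post)   # 9 / 15
print(alpha_post, beta_post, posterior_mean)
```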

Controls: successes $s = 8$, failures $f = 5$.

bayes · Gamma-Poisson

Prior over a Poisson rate. Counts add to shape; exposure adds to rate.

Gamma PDF	$\beta^\alpha x^{\alpha-1}e^{-\beta x}/\Gamma(\alpha)$
CDF	$P(\alpha,\beta x)$
CF	$(1-it/\beta)^{-\alpha}$
Mean / variance	$\alpha/\beta$, $\alpha/\beta^2$
Update	$\alpha'=\alpha+\sum y_i$, $\beta'=\beta+t$
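A minimal sketch of the same update in code, assuming a $\mathrm{Gamma}(2, 1)$ prior in the shape/rate parameterization (the prior is an illustrative choice, not taken from the card):

```python
def gamma_poisson_update(alpha, beta, count_sum, exposure):
    """Posterior Gamma(shape, rate) for a Poisson rate."""
    return alpha + count_sum, beta + exposure

# Assumed prior for illustration: Gamma(shape=2, rate=1).
alpha_post, beta_post = gamma_poisson_update(2.0, 1.0, count_sum=12, exposure=6)
posterior_mean = alpha_post / beta_post        # 14 / 7
posterior_var = alpha_post / beta_post ** 2    # 14 / 49
print(alpha_post, beta_post, posterior_mean, posterior_var)
```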

Controls: count sum $\sum y_i = 12$, exposure $t = 6$.

bayes · Normal-Normal

Gaussian observations with known variance update a Gaussian prior on the mean by precision-weighted averaging.

Normal PDF	$\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}$
CDF / CF	$\Phi((x-\mu)/\sigma)$; $e^{i\mu t-\sigma^2t^2/2}$
Mean / variance	$\mu$, $\sigma^2$
Update	Posterior precision is prior precision plus data precision.
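Precision-weighted averaging can be sketched as follows. The $N(0,1)$ prior and unit noise variance are assumptions made for illustration; the function and argument names are not from the page.

```python
def normal_update(prior_mean, prior_var, sample_mean, n, noise_var):
    """Posterior mean and variance for a Gaussian mean with known noise variance."""
    prior_precision = 1.0 / prior_var
    data_precision = n / noise_var
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * sample_mean) / post_precision
    return post_mean, 1.0 / post_precision

# Assumed for illustration: prior N(0, 1) and unit noise variance.
post_mean, post_var = normal_update(0.0, 1.0, sample_mean=1.0, n=10, noise_var=1.0)
print(post_mean, post_var)   # 10/11 and 1/11 under these assumptions
```

With ten observations the data precision is ten times the prior precision, so the posterior mean sits ten-elevenths of the way from the prior mean to the sample mean.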

Controls: sample mean = 1, $n = 10$.

bayes · Scaled Inv-$\chi^2$: Normal variance

Prior over the variance $\sigma^2$ of a normal with known mean. The sample size adds to the degrees of freedom, and the sample variance pulls the scale toward the data.

Prior	$\sigma^2\sim\mathrm{Scale}\text{-}\mathrm{Inv}\chi^2(\nu_0,\tau_0^2)$
Mean / variance	$\nu_0\tau_0^2/(\nu_0-2)$ for $\nu_0>2$; finite var for $\nu_0>4$
Update	$\nu'=\nu_0+n,\;\tau'^2=(\nu_0\tau_0^2+ns^2)/\nu'$
Limit	Reference (Jeffreys) prior $p(\sigma^2)\propto 1/\sigma^2$ as $\nu_0\to 0$.
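The update row translates directly into code. In this sketch the prior values $\nu_0 = 4$ and $\tau_0^2 = 2$ are illustrative assumptions, and `s_sq` is taken to be the mean squared deviation of the observations from the known mean.

```python
def scaled_inv_chi2_update(nu0, tau0_sq, n, s_sq):
    """Posterior (nu, tau^2) for a normal variance with known mean.
    s_sq is the mean squared deviation of the n observations from that mean."""
    nu_post = nu0 + n
    tau_post_sq = (nu0 * tau0_sq + n * s_sq) / nu_post
    return nu_post, tau_post_sq

# Assumed prior for illustration: nu0 = 4, tau0^2 = 2.
nu_post, tau_post_sq = scaled_inv_chi2_update(4.0, 2.0, n=10, s_sq=1.0)
posterior_mean = nu_post * tau_post_sq / (nu_post - 2)   # valid because nu_post > 2
print(nu_post, tau_post_sq, posterior_mean)
```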

Controls: $n = 10$, $s^2 = 1$.

bayes · Dirichlet-Multinomial

Prior over category probabilities. Each observed category increments its corresponding concentration.

Dirichlet PDF	$\frac{1}{B(\alpha)}\prod_i x_i^{\alpha_i-1}$
CDF / CF	No compact elementary form.
Mean	$\mathbb E[X_i]=\alpha_i/\alpha_0$, where $\alpha_0=\sum_i\alpha_i$
Variance	$\alpha_i(\alpha_0-\alpha_i)/(\alpha_0^2(\alpha_0+1))$
Update	$\alpha_i'=\alpha_i+n_i$
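A short sketch of the increment rule, assuming a symmetric $\mathrm{Dirichlet}(1,1,1)$ prior for illustration and using the three counts shown on the card:

```python
def dirichlet_update(alphas, counts):
    """Add each category's observed count to its concentration parameter."""
    return [a + c for a, c in zip(alphas, counts)]

def dirichlet_mean(alphas):
    total = sum(alphas)
    return [a / total for a in alphas]

# Assumed prior for illustration: symmetric Dirichlet(1, 1, 1).
posterior = dirichlet_update([1.0, 1.0, 1.0], [10, 6, 3])
means = dirichlet_mean(posterior)   # 11/22, 7/22, 4/22
print(posterior, means)
```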

Controls: A count = 10, B count = 6, C count = 3.

7. Heavy tails

tail · Power law / Pareto

Slow tail decay changes which moments exist.

The Pareto tail has $P(X>x)\propto x^{-\alpha}$. On log-log axes it becomes a straight line, while exponential and Gaussian tails curve downward much faster. Smaller $\alpha$ means heavier tails and fewer finite moments.

Cauchy has no mean or variance. Pareto has finite moments only for orders below $\alpha$: mean needs $\alpha>1$, variance needs $\alpha>2$.

PDF	$\alpha x_m^\alpha/x^{\alpha+1}$ for $x\ge x_m$
CDF	$1-(x_m/x)^\alpha$
CF	Special-function form; no compact elementary expression.
Mean	$\alpha x_m/(\alpha-1)$ for $\alpha>1$
Variance	$\alpha x_m^2/((\alpha-1)^2(\alpha-2))$ for $\alpha>2$
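The straight line on log-log axes and the inverse-CDF sampler both come from the CDF row. The sketch below assumes $x_m = 1$ and uses an arbitrary seed; the function names are illustrative.

```python
import math
import random

def sample_pareto(alpha, x_m, rng):
    """Inverse-CDF draw: solve u = 1 - (x_m / x)**alpha for x."""
    u = rng.random()
    return x_m / (1.0 - u) ** (1.0 / alpha)

def survival(x, alpha, x_m):
    return (x_m / x) ** alpha if x >= x_m else 1.0

alpha, x_m = 1.6, 1.0
# Slope of log P(X > x) against log x between two points on the tail.
slope = (math.log(survival(100.0, alpha, x_m))
         - math.log(survival(10.0, alpha, x_m))) / (math.log(100.0) - math.log(10.0))

rng = random.Random(3)
draws = [sample_pareto(alpha, x_m, rng) for _ in range(50_000)]
frac_above_10 = sum(d > 10.0 for d in draws) / len(draws)
print(slope, frac_above_10, 10 ** -alpha)   # slope is -alpha; the fractions should be close
```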

Controls: $\alpha = 1.6$, axis = log-log.

8. Decision table

Start from what you are modeling; pick the distribution whose construction matches that story.

What you are modeling	Reach for	Why
Single yes/no event	Bernoulli	One trial with success probability $p$.
Number of successes in fixed trials	Binomial	Sum of independent Bernoulli trials.
Single draw over $K$ outcomes	Categorical	One $K$-sided die roll; generalizes Bernoulli.
Vector of category counts in fixed trials	Multinomial	Sum of independent categorical draws; generalizes binomial.
Rare event counts in fixed exposure	Poisson	Limit of many tiny independent chances; one rate parameter.
Counts with variance larger than the mean	Negative Binomial	Poisson-like count with extra dispersion.
Waiting time until an event	Geometric / Exponential	Discrete or continuous memoryless waiting.
Measurement noise from many small effects	Gaussian	Stable under sums and central-limit behavior.
Ratio of two noisy measurements	Cauchy / Student's t	Near-zero denominators create heavy tails.
Symmetric noise with a sharp peak and exponential tails	Laplace	Max-entropy on $\mathbb{R}$ with fixed mean absolute deviation; MLE gives L1 / median regression, prior gives lasso.
Squared Gaussian length or variance component	Chi-square	Sum of squared standard normals.
Prior over a normal variance	Scaled inverse chi-square	Conjugate prior for $\sigma^2$ with known mean.
Ratio of variance estimates	F	Ratio of scaled chi-square variables.
Bounded proportion or probability	Beta (see Beta-Binomial)	Flexible distribution on $[0,1]$; conjugate to binomial data.
Probability vector over $K$ categories	Dirichlet	Multivariate Beta on the simplex; conjugate to categorical / multinomial data.
Positive rate or scale	Gamma	Positive support; conjugate to Poisson rates.
Extreme values or tail risk	Pareto / extreme-value family	Tail behavior is the main object, not the center.

What next

Sampling · Monte Carlo & MCMC: Inverse-CDF sampling here is the exact 1-D baseline before rejection, importance sampling, and MCMC.

Processes · Poisson Processes: Turn the Poisson count and exponential wait distributions into one continuous-time arrival model.

Bayes · Free Energy & Variational Inference: Conjugate pairs show exact posterior updates; VI handles cases where those updates leave the named family.

Foundations · Measure Theory & Random Variables: Transformations and densities here are the concrete version of pushforwards and Radon-Nikodym derivatives.