Named Distributions

A family tree of the usual distributions: limits, sums, transformations, ratios, tails, and conjugacy.

Named distributions are easier to remember when they are connected by operations. Bernoulli trials add into binomials; rare binomials limit to Poisson; Gaussians stay Gaussian under sums; squared Gaussians make $\chi^2$; ratios make Cauchy, $t$, and $F$ laws. This page shows those connections.

1. Opening figure: the family map

Click a node to jump to its card. The edge labels are the operations that turn one distribution into another: sums, limits, transformations, ratios, and conjugate updates.

Figure 1 · A small family map of named distributions

2. Discrete starters

discrete · Bernoulli → Binomial

One trial becomes a count.

A Bernoulli variable is a single yes/no event. A binomial is the sum of $n$ independent Bernoulli trials with the same success probability $p$.

Drag $n$ and $p$: the bars are $P(X=k)$ for $X\sim\mathrm{Binomial}(n,p)$. The center tracks the mean $np$ and the spread tracks the standard deviation $\sqrt{np(1-p)}$.

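| Law | PMF | CDF | CF | Mean | Var |
| --- | --- | --- | --- | --- | --- |
| Bernoulli | $p^x(1-p)^{1-x}$ | $0$ for $x<0$; $1-p$ for $0\le x<1$; $1$ for $x\ge1$ | $1-p+pe^{it}$ | $p$ | $p(1-p)$ |
| Binomial | $\binom{n}{k}p^k(1-p)^{n-k}$ | $\sum_{j\le k}\binom{n}{j}p^j(1-p)^{n-j}$ | $(1-p+pe^{it})^n$ | $np$ | $np(1-p)$ |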
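
The "sum of Bernoulli trials" statement can be checked without any sampling. The following is a minimal sketch, assuming only the Python standard library; the helper names (`convolve`, `binomial_by_summing`) are illustrative and not part of the page. It builds the distribution of the sum by repeated convolution and compares it with the closed-form PMF.

```python
from math import comb

def bernoulli_pmf(p):
    """PMF of one Bernoulli(p) trial, indexed by outcome 0 or 1."""
    return [1.0 - p, p]

def convolve(a, b):
    """PMF of the sum of two independent nonnegative-integer variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def binomial_by_summing(n, p):
    """Add n independent Bernoulli(p) trials by repeated convolution."""
    pmf = [1.0]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        pmf = convolve(pmf, bernoulli_pmf(p))
    return pmf

def binomial_pmf(n, p):
    """Closed-form Binomial(n, p) PMF for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

n, p = 12, 0.3
summed = binomial_by_summing(n, p)
closed = binomial_pmf(n, p)
max_gap = max(abs(s - c) for s, c in zip(summed, closed))
mean = sum(k * q for k, q in enumerate(closed))
var = sum((k - mean) ** 2 * q for k, q in enumerate(closed))
print(max_gap, mean, var)  # compare mean with n*p and var with n*p*(1-p)
```

The two PMFs should agree up to floating-point rounding, and the computed mean and variance should match $np$ and $np(1-p)$.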

discrete · Geometric

Waiting for the first success.

The geometric distribution counts trials up to and including the first success. After any number of failures, the remaining wait has the same distribution as the original wait: this is the discrete memoryless waiting time.

The expectation follows from the tail sum:

\[\mathbb{E}X=\sum_{k\ge0}P(X>k)=\sum_{k\ge0}(1-p)^k=1/p.\]

PMF: $p(1-p)^{k-1}$ for $k\ge1$
CDF: $1-(1-p)^k$
CF: $\frac{pe^{it}}{1-(1-p)e^{it}}$
Mean / variance: $1/p$, $(1-p)/p^2$
Fact: Discrete memoryless waiting time.
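
Both the tail-sum expectation and the memoryless property can be verified numerically. This is a small sketch under the card's convention that $X$ counts trials, so $P(X>k)=(1-p)^k$; the function names and the truncation length are assumptions made for illustration.

```python
def geometric_tail(k, p):
    """P(X > k) when X counts trials up to and including the first success."""
    return (1.0 - p) ** k

def mean_by_tail_sum(p, terms=2_000):
    """Approximate E[X] = sum_{k>=0} P(X > k) with a truncated sum."""
    return sum(geometric_tail(k, p) for k in range(terms))

p = 0.2
tail_mean = mean_by_tail_sum(p)

# Memorylessness: P(X > s + t | X > s) should equal P(X > t).
s, t = 4, 7
conditional = geometric_tail(s + t, p) / geometric_tail(s, p)
unconditional = geometric_tail(t, p)
print(tail_mean, conditional, unconditional)
```

The truncated sum should sit very close to $1/p$, and the conditional and unconditional tails should coincide up to rounding.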

discrete · Poisson as a rare-event limit

Many tiny chances become a rate.

If $X_n\sim\mathrm{Binomial}(n,\lambda/n)$, then as $n\to\infty$ each probability $P(X_n=k)$ converges to the $\mathrm{Poisson}(\lambda)$ value $e^{-\lambda}\lambda^k/k!$. This is the count model for many independent rare opportunities.

Poisson processes are the process-level version: counts over time windows plus exponential waiting times.

The plot overlays the binomial bars with the limiting Poisson curve.

PMF: $e^{-\lambda}\lambda^k/k!$
CDF: $\sum_{j\le k}e^{-\lambda}\lambda^j/j!$
CF: $\exp(\lambda(e^{it}-1))$
Mean / variance: $\lambda$, $\lambda$
Fact: Independent Poisson counts add by adding rates.
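
To see the rare-event limit numerically, one option is to measure how far $\mathrm{Binomial}(n,\lambda/n)$ is from $\mathrm{Poisson}(\lambda)$ as $n$ grows. The sketch below uses total variation distance truncated at a hypothetical cutoff `kmax`; the distance measure, cutoff, and function names are choices made for this illustration, not something the page prescribes.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); zero outside 0..n."""
    if k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def total_variation(n, lam, kmax=60):
    """Half the L1 gap between Binomial(n, lam/n) and Poisson(lam) on 0..kmax."""
    p = lam / n
    return 0.5 * sum(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(kmax + 1)
    )

lam = 3.0
gaps = [total_variation(n, lam) for n in (10, 100, 1000)]
print(gaps)  # the gap should shrink as n grows
```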

discrete · Negative Binomial

Waiting for several successes, or overdispersed counts.

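A negative binomial counts the failures seen before the $r$-th success, so it is a sum of $r$ independent geometric waits, each measured in failures rather than trials. As a count model, it is what you reach for when the Poisson variance, which must equal the mean, is too small for the data.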

The gray curve is a Poisson with the same mean. The wider bars show the extra dispersion.

PMF: $\binom{k+r-1}{k}p^r(1-p)^k$ for $k\ge0$
CDF: $\sum_{j\le k}\binom{j+r-1}{j}p^r(1-p)^j$
CF: $\left(\frac{p}{1-(1-p)e^{it}}\right)^r$
Mean / variance: $r(1-p)/p$, $r(1-p)/p^2$
Fact: Variance-to-mean ratio is $1/p$, so it exceeds the Poisson ratio of $1$ whenever $p<1$.
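
The dispersion claim can be checked directly from the PMF. The following sketch assumes an integer $r$ and truncates the infinite sum at a hypothetical `kmax`, which is more than enough for the parameters used here.

```python
from math import comb

def negbin_pmf(k, r, p):
    """P(k failures before the r-th success), for integer r."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k

def negbin_moments(r, p, kmax=500):
    """Mean and variance from a truncated sum over k = 0..kmax."""
    probs = [negbin_pmf(k, r, p) for k in range(kmax + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var

r, p = 4, 0.4
mean, var = negbin_moments(r, p)
dispersion = var / mean
print(mean, var, dispersion)  # compare dispersion with 1/p
```

A Poisson with the same mean would have a dispersion of exactly $1$, so any ratio above that is the extra spread the card describes.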

3. Continuous starters

continuous · Uniform and inverse-CDF sampling

The random-number source behind the others.

If $U\sim\mathrm{Uniform}(0,1)$, then $F^{-1}(U)$ has CDF $F$. This is the inverse-CDF method behind exact one-dimensional sampling.

The left axis is uniform probability. The curve is $F^{-1}(u)$ for the chosen family, turning evenly spaced $u$ values into nonuniform samples.

The Sampling methods page builds on this with rejection and importance sampling.

PDF: $1/(b-a)$ on $[a,b]$
CDF: $(x-a)/(b-a)$ on $[a,b]$
CF: $\frac{e^{itb}-e^{ita}}{it(b-a)}$
Mean / variance: $(a+b)/2$, $(b-a)^2/12$
Fact: The source law behind inverse-CDF sampling.
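
As an illustration of the inverse-CDF method, the sketch below pushes uniform draws through the inverse of the exponential CDF $F(x)=1-e^{-\lambda x}$, which inverts to $F^{-1}(u)=-\log(1-u)/\lambda$. The target family, seed, and function names are assumptions chosen for the example.

```python
import math
import random

def exponential_inverse_cdf(u, rate):
    """F^{-1}(u) for F(x) = 1 - exp(-rate * x), with u in [0, 1)."""
    return -math.log(1.0 - u) / rate

def sample_by_inverse_cdf(inverse_cdf, n, rng):
    """Push n Uniform(0, 1) draws through an inverse CDF."""
    return [inverse_cdf(rng.random()) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
rate = 2.0
draws = sample_by_inverse_cdf(lambda u: exponential_inverse_cdf(u, rate), 100_000, rng)

sample_mean = sum(draws) / len(draws)
x = 0.5
empirical_cdf = sum(d <= x for d in draws) / len(draws)
exact_cdf = 1.0 - math.exp(-rate * x)
print(sample_mean, empirical_cdf, exact_cdf)
```

The sample mean should land near $1/\lambda$ and the empirical CDF near the exact one, up to Monte Carlo error.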

continuous · Exponential

Memoryless waiting in continuous time.

The exponential distribution is the continuous waiting time with no aging:

\[P(T>s+t\mid T>s)=P(T>t).\]

It is also the interarrival-time distribution in a Poisson process.

The minimum of independent exponentials is exponential again: $\min(T_1,T_2)\sim\mathrm{Exp}(\lambda_1+\lambda_2)$. Competing clocks add rates.

PDF: $\lambda e^{-\lambda x}$ for $x\ge0$
CDF: $1-e^{-\lambda x}$
CF: $\lambda/(\lambda-it)$
Mean / variance: $1/\lambda$, $1/\lambda^2$
Fact: The only continuous memoryless distribution.
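
The "competing clocks add rates" rule is easy to test by simulation. This sketch uses `random.expovariate` from the standard library; the rates, seed, and sample size are arbitrary choices for illustration.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
rate1, rate2 = 1.5, 0.5
n = 100_000

# Minimum of two independent exponential clocks.
minima = [min(rng.expovariate(rate1), rng.expovariate(rate2)) for _ in range(n)]

mean_min = sum(minima) / n
tail_at_one = sum(m > 1.0 for m in minima) / n

# If the minimum is Exp(rate1 + rate2), these are the matching exact values.
expected_mean = 1.0 / (rate1 + rate2)
expected_tail = math.exp(-(rate1 + rate2) * 1.0)
print(mean_min, expected_mean, tail_at_one, expected_tail)
```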

continuous · Gaussian

The shape that normalized sums with finite variance converge to.

Gaussian distributions are closed under sums: independent normals add to a normal. More broadly, normalized sums of many finite-variance variables drift toward a bell curve.

The bars show the sum of $m$ centered uniform variables rescaled to variance $\sigma^2$. The curve is the matching normal approximation.

PDF: $\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}$
CDF: $\Phi((x-\mu)/\sigma)$
CF: $e^{i\mu t-\sigma^2t^2/2}$
Mean / variance: $\mu$, $\sigma^2$
Fact: Closed under independent sums.
Figure 3b · Galton board: binomial paths become a bell curve (legend: falling balls, normal approximation, empirical histogram)
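
The rescaled sum of uniforms described in this card can be reproduced in a few lines. The sketch below is an approximation check rather than an exact identity: it compares the share of draws within one $\sigma$ of zero with the normal value. The choices of $m$, $\sigma$, seed, and helper names are assumptions for the example.

```python
import math
import random

def rescaled_uniform_sum(m, sigma, rng):
    """Sum m centered Uniform(-1/2, 1/2) draws, rescaled to variance sigma^2."""
    total = sum(rng.random() - 0.5 for _ in range(m))
    # Each centered uniform has variance 1/12, so the raw sum has variance m/12.
    return sigma * total / math.sqrt(m / 12.0)

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Phi((x - mu) / sigma) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

rng = random.Random(2)  # fixed seed so the run is repeatable
m, sigma, n = 30, 2.0, 50_000
draws = [rescaled_uniform_sum(m, sigma, rng) for _ in range(n)]

inside = sum(abs(d) <= sigma for d in draws) / n
target = normal_cdf(sigma, 0.0, sigma) - normal_cdf(-sigma, 0.0, sigma)
print(inside, target)  # both should be close to 0.68
```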

continuous · Cauchy

A ratio distribution with no mean.

If $Z_1,Z_2$ are independent standard normals, then $Z_1/Z_2$ is Cauchy. Its tails are so heavy that the mean does not exist.

The plot shows the theoretical Cauchy density against a normal density. The Cauchy peak is lower and its tails decay much more slowly.

PDF: $\frac{1}{\pi\gamma[1+((x-x_0)/\gamma)^2]}$
CDF: $\frac{1}{\pi}\arctan\frac{x-x_0}{\gamma}+\frac{1}{2}$
CF: $e^{ix_0t-\gamma|t|}$
Mean / variance: Undefined; undefined.
Fact: $Z_1/Z_2$ for independent standard normals.
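
Because the mean does not exist, a simulation check of the ratio construction is better done on the CDF than on sample averages. The sketch below compares the empirical CDF of $Z_1/Z_2$ with the standard Cauchy CDF ($x_0=0$, $\gamma=1$) at a few points; the evaluation points and seed are arbitrary.

```python
import math
import random

def cauchy_cdf(x, x0=0.0, gamma=1.0):
    """Cauchy CDF: arctan((x - x0) / gamma) / pi + 1/2."""
    return math.atan((x - x0) / gamma) / math.pi + 0.5

rng = random.Random(4)  # fixed seed so the run is repeatable
n = 100_000
ratios = []
while len(ratios) < n:
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    if z2 != 0.0:  # guard against an exact zero denominator
        ratios.append(z1 / z2)

points = (-3.0, -1.0, 0.0, 1.0, 3.0)
empirical = {x: sum(r <= x for r in ratios) / n for x in points}
max_gap = max(abs(empirical[x] - cauchy_cdf(x)) for x in points)
print(empirical, max_gap)
```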

4. Operations that make new distributions

operation · Sums and convolution

Adding independent variables convolves their densities.

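If $X$ and $Y$ are independent with densities $f_X$ and $f_Y$, the density of $X+Y$ is the convolution $f_X*f_Y$ (for discrete variables the same holds for PMFs). Some families are closed under addition of independent members: Gaussian plus Gaussian is Gaussian; Poisson plus Poisson is Poisson; Cauchy plus Cauchy is Cauchy.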

The plot shows two draggable family choices and their sum. A future MGF/CF page can explain the algebra behind this closure.
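
As a brief sketch of that algebra, using only the characteristic functions listed in the cards on this page: for independent $X$ and $Y$, $\varphi_{X+Y}(t)=\varphi_X(t)\,\varphi_Y(t)$, so closure under sums amounts to the product staying in the same functional form. The subscripts 1 and 2 below are labels for the two summands.

\[
\begin{aligned}
\text{Gaussian:}\quad & e^{i\mu_1t-\sigma_1^2t^2/2}\,e^{i\mu_2t-\sigma_2^2t^2/2}=e^{i(\mu_1+\mu_2)t-(\sigma_1^2+\sigma_2^2)t^2/2},\\
\text{Poisson:}\quad & \exp(\lambda_1(e^{it}-1))\,\exp(\lambda_2(e^{it}-1))=\exp((\lambda_1+\lambda_2)(e^{it}-1)),\\
\text{Cauchy:}\quad & e^{ix_1t-\gamma_1|t|}\,e^{ix_2t-\gamma_2|t|}=e^{i(x_1+x_2)t-(\gamma_1+\gamma_2)|t|}.
\end{aligned}
\]

In each case the parameters simply add: means and variances for the Gaussian, rates for the Poisson, locations and scales for the Cauchy.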

operation · Max and min of i.i.d. samples

Extremes act on the CDF, not the density.

If $M_n=\max(X_1,\dots,X_n)$, then:

\[P(M_n\le x)=F(x)^n.\]

If $m_n=\min(X_1,\dots,X_n)$, then:

\[P(m_n\le x)=1-(1-F(x))^n.\]

The plot uses a $\mathrm{Uniform}(0,1)$ base distribution and shows how increasing $n$ pushes mass toward the right edge for maxima and the left edge for minima.
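
The two CDF formulas can be compared with simulated extremes. The sketch below uses the same $\mathrm{Uniform}(0,1)$ base, for which $F(x)=x$ on $[0,1]$; the sample size, seed, and helper names are assumptions made for illustration.

```python
import random

def cdf_of_max(F_x, n):
    """P(max of n i.i.d. draws <= x), given F_x = F(x)."""
    return F_x**n

def cdf_of_min(F_x, n):
    """P(min of n i.i.d. draws <= x), given F_x = F(x)."""
    return 1.0 - (1.0 - F_x) ** n

rng = random.Random(3)  # fixed seed so the run is repeatable
n, trials, x = 5, 100_000, 0.7
max_hits = 0
min_hits = 0
for _ in range(trials):
    sample = [rng.random() for _ in range(n)]
    max_hits += max(sample) <= x
    min_hits += min(sample) <= x

empirical_max = max_hits / trials
empirical_min = min_hits / trials
print(empirical_max, cdf_of_max(x, n), empirical_min, cdf_of_min(x, n))
```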

operation · Transformations and Jacobians

Changing variables reshapes density by the derivative.

For a monotone transform:

\[f_Y(y)=f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right|.\]

Non-monotone transforms sum over preimages.

Two canonical cases: $Y=X^2$ turns $N(0,1)$ into $\chi^2_1$; $Y=e^X$ turns a normal into a log-normal.
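
The first case is worth working through, since $x\mapsto x^2$ is not monotone and the preimage sum is needed. For $y>0$ the preimages are $\pm\sqrt{y}$, each with $\left|\frac{d}{dy}(\pm\sqrt{y})\right|=\frac{1}{2\sqrt{y}}$. Writing $\phi$ for the standard normal density (a symbol introduced here for the derivation):

\[
f_Y(y)=\frac{\phi(\sqrt{y})+\phi(-\sqrt{y})}{2\sqrt{y}}
=\frac{1}{\sqrt{y}}\cdot\frac{1}{\sqrt{2\pi}}e^{-y/2}
=\frac{y^{-1/2}e^{-y/2}}{2^{1/2}\,\Gamma(1/2)},\qquad y>0,
\]

using $\Gamma(1/2)=\sqrt{\pi}$. This is the $\chi^2_k$ density $x^{k/2-1}e^{-x/2}/(2^{k/2}\Gamma(k/2))$ with $k=1$.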

operation · Ratios

Division creates heavy tails.

The Cauchy distribution is what you get when you divide one standard normal by another. Near-zero denominators create enormous ratios.

This is the ratio story behind the $t$ and $F$ sampling distributions too: normalize by an estimated scale, and tail weight appears.

5. Sampling distributions from Gaussians

continuous · $\chi^2$

Sum squared standard normals.

If $Z_i\sim N(0,1)$ independently, then $\sum_{i=1}^k Z_i^2\sim\chi^2_k$. It is the distribution of squared Gaussian length in $k$ dimensions.

PDF: $x^{k/2-1}e^{-x/2}/(2^{k/2}\Gamma(k/2))$ for $x>0$
CDF: $P(k/2,x/2)$, the regularized lower incomplete gamma function
CF: $(1-2it)^{-k/2}$
Mean / variance: $k$, $2k$
Fact: Gamma with shape $k/2$ and rate $1/2$.
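
A short simulation makes the construction concrete: draw $k$ standard normals, square and add them, and compare the sample mean and variance with $k$ and $2k$. The values of $k$, the seed, and the sample size below are arbitrary.

```python
import random

def chi_square_draw(k, rng):
    """One chi-square_k draw as a sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(5)  # fixed seed so the run is repeatable
k, n = 5, 50_000
draws = [chi_square_draw(k, rng) for _ in range(n)]

sample_mean = sum(draws) / n
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (n - 1)
print(sample_mean, sample_var)  # compare with k and 2k
```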

continuous · Student's t

A normal divided by estimated scale.

If $Z\sim N(0,1)$ and $V\sim\chi^2_\nu$, then $T=Z/\sqrt{V/\nu}$ has Student's $t_\nu$ distribution. As $\nu$ grows, the denominator stabilizes and $t$ becomes Gaussian.

PDF: $\frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\Gamma(\nu/2)}(1+x^2/\nu)^{-(\nu+1)/2}$
CDF: No elementary closed form; expressed through the regularized incomplete beta function.
CF: Special-function form; no compact elementary expression.
Mean / variance: $0$ for $\nu>1$; $\nu/(\nu-2)$ for $\nu>2$
Fact: Heavy-tailed because the scale is estimated.
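
The construction $T=Z/\sqrt{V/\nu}$ can be simulated directly from standard normals. The sketch below builds $V$ as a sum of $\nu$ squared normals and checks the sample variance against $\nu/(\nu-2)$; $\nu=10$, the seed, and the sample size are assumptions for the example.

```python
import math
import random

def student_t_draw(nu, rng):
    """One t_nu draw: a standard normal over the root of an independent chi-square/nu."""
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(v / nu)

rng = random.Random(7)  # fixed seed so the run is repeatable
nu, n = 10, 50_000
draws = [student_t_draw(nu, rng) for _ in range(n)]

sample_mean = sum(draws) / n
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (n - 1)
print(sample_mean, sample_var, nu / (nu - 2))
```

The sample variance should exceed $1$, the variance of the numerator alone, which is the extra tail weight contributed by the random denominator.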

continuous · F distribution

Ratio of scaled chi-squares.

If $U\sim\chi^2_{d_1}$ and $V\sim\chi^2_{d_2}$ independently, then $(U/d_1)/(V/d_2)$ has an $F_{d_1,d_2}$ distribution. It appears in variance comparisons and ANOVA-style ratios.

PDF: $\frac{(d_1/d_2)^{d_1/2}x^{d_1/2-1}}{B(d_1/2,d_2/2)(1+d_1x/d_2)^{(d_1+d_2)/2}}$ for $x>0$
CDF: $I_{d_1x/(d_1x+d_2)}(d_1/2,d_2/2)$
CF: Special-function form; no compact elementary expression.
Mean / variance: $d_2/(d_2-2)$ for $d_2>2$; finite variance for $d_2>4$
Fact: Ratio of independent variance estimates.

6. Conjugate pairs for the Bayesian thread

Conjugacy means the posterior stays in the same family as the prior. These pairs are exact-update shortcuts, and reference points for variational inference when exact updating is not available.

Why conjugacy exists. All four pairs below share a single structural property: the likelihood is an exponential family $p(x\mid\theta) = h(x)\exp\bigl(\eta(\theta)\cdot T(x) - A(\theta)\bigr)$, and the prior has the same exponential-family shape in $\theta$. Multiplying prior by likelihood and absorbing the result back into the same form gives a posterior whose natural parameter is just a shift in the sufficient-statistic direction. Concretely, observing data $x_1,\dots,x_n$ shifts the prior's hyperparameter (written $\alpha$ in the cards below) by $\sum_i T(x_i)$. Under regularity conditions, exponential families are exactly the families with finite-dimensional sufficient statistics. The data influences the posterior only through the fixed-size summary $\sum_i T(x_i)$, which is why the update is a simple parameter shift. Conjugacy covers the cases where the posterior stays inside a finite-dimensional family. When it does not, you reach for variational inference.
| Family | Sufficient statistic $T(x)$ | Natural parameter shift on update |
| --- | --- | --- |
| Bernoulli / Binomial | $x$ (count of successes) | $\alpha\to\alpha+s,\;\beta\to\beta+f$ |
| Poisson | $x$ (count) | $\alpha\to\alpha+\sum y_i,\;\beta\to\beta+t$ |
| Normal (known $\sigma^2$) | $x$ | precision adds; mean is precision-weighted |
| Categorical / Multinomial | $(\mathbb 1_{x=k})_k$ | $\alpha_k\to\alpha_k+n_k$ |
Named distributions as maximum-entropy answers. Most of the distributions on this page are not arbitrary mathematical objects: each is the unique distribution that maximizes entropy for a given support and set of moment constraints. The table reaches the same named laws from the entropy side:
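| Support | Constraints (beyond normalization) | Max-entropy law |
| --- | --- | --- |
| $[a, b]$ | none | Uniform$(a,b)$ |
| $[0, \infty)$ | fixed mean $1/\lambda$ | Exponential$(\lambda)$ |
| $\mathbb{R}$ | fixed mean $\mu$, variance $\sigma^2$ | Normal$(\mu, \sigma^2)$ |
| $\{0,1,\dots,N\}$ | fixed mean | Discrete exponential family $q_k\propto e^{\lambda k}$ (truncated geometric; Binomial only relative to the base measure $\binom{N}{k}$) |
| $\{0,1,2,\dots\}$ | fixed mean $\mu$ | Geometric $P(X=k)=(1-q)q^k$ with $q = \mu/(\mu+1)$ |
| $\{0,1,2,\dots\}$ | fixed mean & variance | Discrete Gaussian-type law $q_k\propto e^{\lambda_1k+\lambda_2k^2}$ |
| $\mathbb{R}^d$ | fixed mean & covariance | Multivariate Normal |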

The pattern is the same in every row: write down the Lagrangian $-\int q\log q + \sum_i \lambda_i\bigl(\int T_i q - c_i\bigr)$, together with a multiplier for normalization, set its variation in $q$ to zero, and the stationary $q^*\propto\exp(\sum_i\lambda_i T_i)$ is exactly an exponential family with the $T_i$ as sufficient statistics. See the max-entropy interactive on the Fisher-information page to step through the constraints and watch the family member emerge, and the Legendre-duality section for why this construction is forced by the geometry of $\log\sum e^{\eta T}$.
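
As a worked instance of the stationary form, take the row with support $[0,\infty)$ and a fixed mean, so there is a single statistic $T_1(x)=x$ with constraint value $c_1$. The stationary solution and its normalizability condition are

\[
q^*(x)\propto e^{\lambda_1x},\qquad \int_0^\infty e^{\lambda_1x}\,dx<\infty\iff\lambda_1<0.
\]

Writing $\lambda_1=-\lambda$ with $\lambda>0$ (a relabeling introduced here to match the card's rate parameter) and normalizing gives

\[
q^*(x)=\lambda e^{-\lambda x},\qquad \int_0^\infty x\,\lambda e^{-\lambda x}\,dx=\frac{1}{\lambda},
\]

so the mean constraint $c_1=1/\lambda$ pins down the multiplier and the result is Exponential$(\lambda)$.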

bayes · Beta-Binomial

Prior over a Bernoulli/binomial probability. Successes add to $\alpha$; failures add to $\beta$.

Beta PDF: $x^{\alpha-1}(1-x)^{\beta-1}/B(\alpha,\beta)$ on $[0,1]$
CDF: $I_x(\alpha,\beta)$
CF: Special-function form.
Mean / variance: $\alpha/(\alpha+\beta)$; $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
Update: $\alpha'=\alpha+s$, $\beta'=\beta+f$ for $s$ successes and $f$ failures
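
The update rule can be checked against Bayes' rule on a grid: prior density times likelihood, divided by the claimed posterior density, should be the same constant at every point. The sketch below assumes a Beta$(2,2)$ prior and 7 successes with 3 failures, both invented for the example.

```python
from math import gamma

def beta_update(alpha, beta, successes, failures):
    """Conjugate update: successes add to alpha, failures add to beta."""
    return alpha + successes, beta + failures

def beta_pdf(x, alpha, beta):
    """Beta density, with B(alpha, beta) written through gamma functions."""
    norm = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)

prior = (2.0, 2.0)
s, f = 7, 3
posterior = beta_update(*prior, successes=s, failures=f)
post_mean = posterior[0] / (posterior[0] + posterior[1])

# prior(x) * likelihood(x) / posterior(x) should not depend on x.
grid = (0.1, 0.3, 0.5, 0.7, 0.9)
ratios = [beta_pdf(x, *prior) * x**s * (1 - x) ** f / beta_pdf(x, *posterior) for x in grid]
spread = max(ratios) - min(ratios)
print(posterior, post_mean, spread)
```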

bayes · Gamma-Poisson

Prior over a Poisson rate. Counts add to shape; exposure adds to rate.

Gamma PDF: $\beta^\alpha x^{\alpha-1}e^{-\beta x}/\Gamma(\alpha)$ for $x>0$
CDF: $P(\alpha,\beta x)$
CF: $(1-it/\beta)^{-\alpha}$
Mean / variance: $\alpha/\beta$, $\alpha/\beta^2$
Update: $\alpha'=\alpha+\sum y_i$, $\beta'=\beta+t$ for counts $y_i$ observed over total exposure $t$
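
A minimal sketch of this update follows, assuming each count was observed over one unit of exposure so that $t$ equals the number of counts; the prior and the data are made up for the example.

```python
def gamma_poisson_update(alpha, beta, counts, exposure):
    """Conjugate update: total count adds to shape, exposure adds to rate."""
    return alpha + sum(counts), beta + exposure

def gamma_mean_var(alpha, beta):
    """Mean and variance of a Gamma with shape alpha and rate beta."""
    return alpha / beta, alpha / beta**2

prior = (2.0, 1.0)
counts = [3, 5, 4, 2]
exposure = len(counts)  # assumption: one unit of exposure per observed count

posterior = gamma_poisson_update(*prior, counts=counts, exposure=exposure)
post_mean, post_var = gamma_mean_var(*posterior)
print(posterior, post_mean, post_var)
```

The posterior mean sits between the prior mean $\alpha/\beta$ and the observed rate $\sum y_i/t$, moving toward the data as exposure accumulates.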

bayes · Normal-Normal

Known variance Gaussian observations update a Gaussian prior by precision-weighted averaging.

Normal PDF: $\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}$
CDF / CF: $\Phi((x-\mu)/\sigma)$; $e^{i\mu t-\sigma^2t^2/2}$
Mean / variance: $\mu$, $\sigma^2$
Update: Posterior precision is prior precision plus data precision.
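
Spelled out, "precision adds, mean is precision-weighted" looks like the sketch below. It assumes a $N(\mu_0,\tau_0^2)$ prior on the mean and $n$ observations with known noise variance $\sigma^2$, so the data precision is $n/\sigma^2$; the parameter names and numbers are illustrative.

```python
def normal_update(prior_mean, prior_var, data, noise_var):
    """Posterior mean and variance for a Gaussian mean with known noise variance."""
    prior_precision = 1.0 / prior_var
    data_precision = len(data) / noise_var
    post_precision = prior_precision + data_precision  # precisions add
    sample_mean = sum(data) / len(data)
    # Posterior mean is the precision-weighted average of prior mean and sample mean.
    post_mean = (prior_precision * prior_mean + data_precision * sample_mean) / post_precision
    return post_mean, 1.0 / post_precision

data = [1.2, 0.8, 1.0, 1.4]
post_mean, post_var = normal_update(prior_mean=0.0, prior_var=4.0, data=data, noise_var=1.0)
print(post_mean, post_var)
```

With a diffuse prior and four observations, the posterior mean is pulled almost all the way to the sample mean, and the posterior variance is smaller than both the prior variance and $\sigma^2/n$.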

bayes · Dirichlet-Multinomial

Prior over category probabilities. Each observed category increments its corresponding concentration.

Dirichlet PDF: $\frac{1}{B(\alpha)}\prod_i x_i^{\alpha_i-1}$ on the probability simplex
CDF / CF: No compact elementary form.
Mean: $\mathbb E[X_i]=\alpha_i/\alpha_0$, where $\alpha_0=\sum_i\alpha_i$
Variance: $\alpha_i(\alpha_0-\alpha_i)/(\alpha_0^2(\alpha_0+1))$
Update: $\alpha_i'=\alpha_i+n_i$ for $n_i$ observations in category $i$

7. Heavy tails

tail · Power law / Pareto

Slow tail decay changes which moments exist.

The Pareto tail has $P(X>x)\propto x^{-\alpha}$. On log-log axes it becomes a straight line, while exponential and Gaussian tails curve downward much faster. Smaller $\alpha$ means heavier tails and fewer finite moments.

Cauchy has no mean or variance. Pareto has finite moments only for orders below $\alpha$: mean needs $\alpha>1$, variance needs $\alpha>2$.

PDF: $\alpha x_m^\alpha/x^{\alpha+1}$ for $x\ge x_m$
CDF: $1-(x_m/x)^\alpha$
CF: Special-function form; no compact elementary expression.
Mean: $\alpha x_m/(\alpha-1)$ for $\alpha>1$
Variance: $\alpha x_m^2/((\alpha-1)^2(\alpha-2))$ for $\alpha>2$
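
Both the straight log-log tail and the sampling recipe follow from the CDF. Solving $1-(x_m/x)^\alpha=u$ gives $x=x_m(1-u)^{-1/\alpha}$, which the sketch below uses for inverse-CDF sampling; the parameter values, seed, and function names are assumptions for the example.

```python
import math
import random

def pareto_survival(x, x_m, alpha):
    """P(X > x) = (x_m / x)**alpha for x >= x_m."""
    return (x_m / x) ** alpha

def pareto_inverse_cdf(u, x_m, alpha):
    """Solve 1 - (x_m / x)**alpha = u for x, with u in [0, 1)."""
    return x_m * (1.0 - u) ** (-1.0 / alpha)

x_m, alpha = 1.0, 2.5

# On log-log axes the survival function is a line with slope -alpha.
x1, x2 = 2.0, 20.0
slope = (
    math.log(pareto_survival(x2, x_m, alpha)) - math.log(pareto_survival(x1, x_m, alpha))
) / (math.log(x2) - math.log(x1))

rng = random.Random(6)  # fixed seed so the run is repeatable
n = 100_000
draws = [pareto_inverse_cdf(rng.random(), x_m, alpha) for _ in range(n)]
tail_fraction = sum(d > 3.0 for d in draws) / n
print(slope, tail_fraction, pareto_survival(3.0, x_m, alpha))
```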

8. Decision table

Start from what you are modeling; pick the distribution whose construction matches that story.

| What you are modeling | Reach for | Why |
| --- | --- | --- |
| Single yes/no event | Bernoulli | One trial with success probability $p$. |
| Number of successes in fixed trials | Binomial | Sum of independent Bernoulli trials. |
| Rare event counts in fixed exposure | Poisson | Limit of many tiny independent chances; one rate parameter. |
| Counts with variance larger than the mean | Negative Binomial | Poisson-like count with extra dispersion. |
| Waiting time until an event | Geometric / Exponential | Discrete or continuous memoryless waiting. |
| Measurement noise from many small effects | Gaussian | Stable under sums and central-limit behavior. |
| Ratio of two noisy measurements | Cauchy / Student's t | Near-zero denominators create heavy tails. |
| Squared Gaussian length or variance component | Chi-square | Sum of squared standard normals. |
| Ratio of variance estimates | F | Ratio of scaled chi-square variables. |
| Bounded proportion or probability | Beta (see Beta-Binomial) | Flexible distribution on $[0,1]$; conjugate to binomial data. |
| Positive rate or scale | Gamma | Positive support; conjugate to Poisson rates. |
| Extreme values or tail risk | Pareto / extreme-value family | Tail behavior is the main object, not the center. |

What next