A family tree of the usual distributions: limits, sums, transformations, ratios, tails, and conjugacy.
Named distributions are easier to remember when they are connected by operations.
Bernoulli trials add into binomials; rare binomials limit to Poisson; Gaussians stay
Gaussian under sums; squared Gaussians make $\chi^2$; ratios make Cauchy, $t$, and
$F$ laws. This page shows those connections.
1. Opening figure: the family map
Click a node to jump to its card. The edge labels are the operations that turn one
distribution into another: sums, limits, transformations, ratios, and conjugate
updates.
Figure 1 · A small family map of named distributions
2. Discrete starters
discrete · Bernoulli → Binomial
One trial becomes a count.
A Bernoulli variable is a single yes/no event. A binomial is the sum of $n$
independent Bernoulli trials with the same success probability $p$.
Drag $n$ and $p$: the bars are $P(X=k)$ for $X\sim\mathrm{Binomial}(n,p)$.
The center tracks the mean $np$ and the spread tracks the standard deviation
$\sqrt{np(1-p)}$ (variance $np(1-p)$).
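| Law | PMF | CDF | CF | Mean | Var |
|---|---|---|---|---|---|
| Bernoulli | $p^x(1-p)^{1-x}$ | $0$ for $x<0$; $1-p$ for $0\le x<1$; $1$ for $x\ge1$ | $1-p+pe^{it}$ | $p$ | $p(1-p)$ |
| Binomial | $\binom{n}{k}p^k(1-p)^{n-k}$ | $\sum_{j\le k}\binom{n}{j}p^j(1-p)^{n-j}$ | $(1-p+pe^{it})^n$ | $np$ | $np(1-p)$ |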
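The claim that a binomial is a sum of Bernoulli trials can be checked exactly by convolving the Bernoulli PMF with itself $n$ times. The following is a minimal sketch in plain Python; the function names and the values $n=10$, $p=0.3$ are illustrative choices, not part of this page.

```python
from math import comb

def convolve(a, b):
    """PMF of X + Y for independent X, Y on {0, 1, ...}, given as lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out

def binomial_by_summing(n, p):
    """Add n independent Bernoulli(p) trials one at a time."""
    pmf = [1.0]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        pmf = convolve(pmf, [1 - p, p])
    return pmf

def binomial_closed_form(n, p):
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

n, p = 10, 0.3
summed = binomial_by_summing(n, p)
exact = binomial_closed_form(n, p)
mean = sum(k * q for k, q in enumerate(summed))
var = sum((k - mean) ** 2 * q for k, q in enumerate(summed))
print(max(abs(s - e) for s, e in zip(summed, exact)))  # floating-point noise only
print(mean, var)  # close to n*p = 3.0 and n*p*(1-p) = 2.1
```

The two PMFs agree up to rounding, and the moments computed from the summed PMF match the Mean and Var columns.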
discrete · Geometric
Waiting for the first success.
The geometric distribution counts trials until the first success. After any run
of failures, the remaining wait still has the same distribution: it is the
discrete memoryless waiting time.
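As a short worked check, write $p$ for the per-trial success probability and $X$ for the number of trials until the first success (symbols assumed here, since this card does not fix them). Then $P(X>k)=(1-p)^k$, because the first $k$ trials must all fail, and for whole numbers $s,t\ge0$:

\[P(X>s+t\mid X>s)=\frac{(1-p)^{s+t}}{(1-p)^{s}}=(1-p)^{t}=P(X>t).\]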
If $X_n\sim\mathrm{Binomial}(n,\lambda/n)$, then as $n\to\infty$ the mass
approaches $\mathrm{Poisson}(\lambda)$. This is the count model for many
independent rare opportunities.
Poisson processes are the
process-level version: counts over time windows plus exponential waiting times.
The plot overlays the binomial bars with the limiting Poisson curve.
PMF: $e^{-\lambda}\lambda^k/k!$
CDF: $\sum_{j\le k}e^{-\lambda}\lambda^j/j!$
CF: $\exp(\lambda(e^{it}-1))$
Mean / variance: $\lambda$, $\lambda$
Fact: Independent Poisson counts add by adding rates.
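The binomial-to-Poisson limit can be seen numerically by comparing the two PMFs as $n$ grows with $\lambda$ held fixed. This is a rough sketch; `max_gap`, the cutoff `k_max`, and the choice $\lambda=3$ are illustrative assumptions.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def max_gap(n, lam, k_max=20):
    """Largest pointwise PMF difference over k = 0..k_max, with p = lam / n."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))

lam = 3.0
gaps = {n: max_gap(n, lam) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(n, gap)  # the gap shrinks as n grows
```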
discrete · Negative Binomial
Waiting for several successes, or overdispersed counts.
A negative binomial can be seen as a sum of geometric waits. As a count model,
it is what you reach for when Poisson variance is too small for the data.
The gray curve is a Poisson with the same mean. The wider bars show the
extra dispersion.
PMF: $\binom{k+r-1}{k}p^r(1-p)^k$
CDF: $\sum_{j\le k}\binom{j+r-1}{j}p^r(1-p)^j$
CF: $\left(\frac{p}{1-(1-p)e^{it}}\right)^r$
Mean / variance: $r(1-p)/p$, $r(1-p)/p^2$
Fact: Variance-to-mean ratio is $1/p$, so it exceeds Poisson when $p<1$.
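The overdispersion fact can be confirmed directly from the PMF on this card, where $k$ counts failures before the $r$-th success. The sketch below computes the moments from a truncated sum; the cutoff `k_max` and the values $r=3$, $p=0.4$ are illustrative assumptions.

```python
from math import comb

def negbin_pmf(k, r, p):
    """P(k failures before the r-th success), for integer r."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

def mean_and_variance(r, p, k_max=500):
    """Moments from a truncated sum; the tail beyond k_max is negligible here."""
    pmf = [negbin_pmf(k, r, p) for k in range(k_max + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, var

r, p = 3, 0.4
mean, var = mean_and_variance(r, p)
print(mean, var, var / mean)  # ratio should be close to 1 / p = 2.5
```

A Poisson with the same mean would have ratio 1, which is why the negative binomial bars are wider than the gray curve.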
3. Continuous starters
continuous · Uniform and inverse-CDF sampling
The random-number source behind the others.
If $U\sim\mathrm{Uniform}(0,1)$, then $F^{-1}(U)$ has CDF $F$. This is the
inverse-CDF method behind exact one-dimensional sampling.
The left axis is uniform probability. The curve is $F^{-1}(u)$ for the chosen
family, turning evenly spaced $u$ values into nonuniform samples.
The Sampling methods page builds on this with
rejection and importance sampling.
PDF: $1/(b-a)$ on $[a,b]$
CDF: $(x-a)/(b-a)$ on $[a,b]$
CF: $\frac{e^{itb}-e^{ita}}{it(b-a)}$
Mean / variance: $(a+b)/2$, $(b-a)^2/12$
Fact: The source law behind inverse-CDF sampling.
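As one concrete instance of the inverse-CDF method, take the exponential CDF $F(x)=1-e^{-\lambda x}$, whose inverse is $F^{-1}(u)=-\log(1-u)/\lambda$. The sketch below is only an illustration: the function names, the seed, and the rate $2.0$ are assumptions.

```python
import math
import random

def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x)

def exponential_inverse_cdf(u, rate):
    """Solve u = 1 - exp(-rate * x) for x; valid for 0 <= u < 1."""
    return -math.log1p(-u) / rate

rng = random.Random(0)  # fixed seed so the run is repeatable
rate = 2.0
samples = [exponential_inverse_cdf(rng.random(), rate) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should land near 1 / rate = 0.5
```

Any family with a computable $F^{-1}$ can be swapped in for `exponential_inverse_cdf` without changing the rest of the loop.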
continuous · Exponential
Memoryless waiting in continuous time.
The exponential distribution is the continuous waiting time with no aging:
\[P(T>s+t\mid T>s)=P(T>t).\]
It is also the interarrival-time distribution in a
Poisson process.
The minimum of independent exponentials is exponential again:
$\min(T_1,T_2)\sim\mathrm{Exp}(\lambda_1+\lambda_2)$. Competing clocks add rates.
PDF: $\lambda e^{-\lambda x}$ for $x\ge0$
CDF: $1-e^{-\lambda x}$
CF: $\lambda/(\lambda-it)$
Mean / variance: $1/\lambda$, $1/\lambda^2$
Fact: The only continuous memoryless distribution.
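The competing-clocks rule follows in one line from the survival function $P(T>x)=e^{-\lambda x}$. For independent $T_1\sim\mathrm{Exp}(\lambda_1)$ and $T_2\sim\mathrm{Exp}(\lambda_2)$, the minimum exceeds $x$ only if both clocks do:

\[P(\min(T_1,T_2)>x)=P(T_1>x)\,P(T_2>x)=e^{-\lambda_1 x}e^{-\lambda_2 x}=e^{-(\lambda_1+\lambda_2)x}.\]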
continuous · Gaussian
The shape that sums of finite-variance variables converge to.
Gaussian distributions are closed under sums: independent normals add to a
normal. More broadly, normalized sums of many finite-variance variables drift
toward a bell curve.
The bars show the sum of $m$ centered uniform variables rescaled to variance
$\sigma^2$. The curve is the matching normal approximation.
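The construction behind the bars can be imitated in a few lines: sum $m$ centered uniforms, rescale to variance $\sigma^2$, and compare one summary against the normal value. This is a sketch under assumed settings ($m=12$, $\sigma=2$, a fixed seed); the one-sigma mass of a normal law is about $0.683$.

```python
import math
import random

def rescaled_uniform_sum(rng, m, sigma):
    """Sum m centered Uniform(-1/2, 1/2) draws, rescaled to variance sigma**2."""
    total = sum(rng.random() - 0.5 for _ in range(m))
    return sigma * total / math.sqrt(m / 12.0)  # Var(total) = m / 12

rng = random.Random(1)
m, sigma = 12, 2.0
draws = [rescaled_uniform_sum(rng, m, sigma) for _ in range(50_000)]
within_one_sigma = sum(abs(x) <= sigma for x in draws) / len(draws)
print(within_one_sigma)  # a normal law puts about 0.683 of its mass here
```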
Adding independent variables convolves their densities.
The density of $X+Y$ is $f_X*f_Y$. Some families are closed under addition:
Gaussian plus Gaussian is Gaussian; Poisson plus Poisson is Poisson; Cauchy plus
Cauchy is Cauchy.
The plot shows two draggable family choices and their sum. A future MGF/CF
page can explain the algebra behind this closure.
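For count variables the convolution is a finite sum, which makes closure easy to check. The sketch below convolves two Poisson PMFs and compares the result with a single Poisson whose rate is the sum; the rates $1.5$ and $2.5$ and the helper names are illustrative assumptions.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def pmf_of_sum(k, lam1, lam2):
    """Discrete convolution: P(X + Y = k) = sum_j P(X = j) P(Y = k - j)."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2)
               for j in range(k + 1))

lam1, lam2 = 1.5, 2.5
worst = max(abs(pmf_of_sum(k, lam1, lam2) - poisson_pmf(k, lam1 + lam2))
            for k in range(30))
print(worst)  # floating-point noise only
```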
operation · Max and min of i.i.d. samples
Extremes act on the CDF, not the density.
If $M_n=\max(X_1,\dots,X_n)$, then:
\[P(M_n\le x)=F(x)^n.\]
If $m_n=\min(X_1,\dots,X_n)$, then:
\[P(m_n\le x)=1-(1-F(x))^n.\]
The plot uses a $\mathrm{Uniform}(0,1)$ base distribution and shows how
increasing $n$ pushes mass toward the right edge for maxima and the left edge
for minima.
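Both formulas can be checked by simulation with the same $\mathrm{Uniform}(0,1)$ base law, where $F(x)=x$. The following is a rough sketch; the helper name, the seed, $n=5$, and the evaluation points are assumptions made for illustration.

```python
import random

def empirical_cdf_of_extreme(rng, n, x, extreme, trials=50_000):
    """Fraction of trials where extreme(U_1..U_n) <= x, with U_i ~ Uniform(0, 1)."""
    hits = 0
    for _ in range(trials):
        value = extreme(rng.random() for _ in range(n))
        hits += value <= x
    return hits / trials

rng = random.Random(2)
n = 5
max_cdf = empirical_cdf_of_extreme(rng, n, 0.8, max)
min_cdf = empirical_cdf_of_extreme(rng, n, 0.2, min)
print(max_cdf, 0.8 ** n)            # compare with F(x)^n
print(min_cdf, 1 - (1 - 0.2) ** n)  # compare with 1 - (1 - F(x))^n
```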
operation · Transformations and Jacobians
Changing variables reshapes density by the derivative.
Two canonical cases: $Y=X^2$ turns $N(0,1)$ into $\chi^2_1$; $Y=e^X$ turns a
normal into a log-normal.
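Both cases can be worked out explicitly. Write $\varphi$ for the standard normal density (notation assumed here). For $Y=X^2$ with $X\sim N(0,1)$, both roots $x=\pm\sqrt y$ map to the same $y$, and each contributes a Jacobian factor $|dx/dy|=1/(2\sqrt y)$:

\[f_Y(y)=\bigl[\varphi(\sqrt y)+\varphi(-\sqrt y)\bigr]\frac{1}{2\sqrt y}=\frac{1}{\sqrt{2\pi y}}\,e^{-y/2},\qquad y>0.\]

For $Y=e^X$ with $X\sim N(\mu,\sigma^2)$, the map is one-to-one with $x=\log y$ and $|dx/dy|=1/y$:

\[f_Y(y)=\frac{1}{y\,\sigma\sqrt{2\pi}}\exp\!\Bigl(-\frac{(\log y-\mu)^2}{2\sigma^2}\Bigr),\qquad y>0.\]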
operation · Ratios
Division creates heavy tails.
The Cauchy distribution is what you get when you divide one standard normal
by another. Near-zero denominators create enormous ratios.
This is the ratio story behind the $t$ and $F$ sampling distributions too:
normalize by an estimated scale, and tail weight appears.
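A quick simulation makes the heavy tail visible. The sketch below divides one standard normal draw by another and compares two tail fractions with the standard Cauchy CDF $F(x)=\tfrac12+\arctan(x)/\pi$, a formula this page does not state and which is brought in here as an assumption; the seed and thresholds are also illustrative.

```python
import math
import random

def cauchy_cdf(x):
    """CDF of the standard Cauchy distribution."""
    return 0.5 + math.atan(x) / math.pi

rng = random.Random(3)
ratios = [rng.gauss(0, 1) / rng.gauss(0, 1) for _ in range(100_000)]

frac_below_one = sum(r <= 1.0 for r in ratios) / len(ratios)
frac_extreme = sum(abs(r) > 10.0 for r in ratios) / len(ratios)
print(frac_below_one, cauchy_cdf(1.0))           # empirical vs. Cauchy CDF at 1
print(frac_extreme, 2 * (1 - cauchy_cdf(10.0)))  # mass beyond |x| > 10
```

Several percent of the ratios land beyond $\pm10$, a region that a standard normal essentially never reaches.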
5. Sampling distributions from Gaussians
continuous · $\chi^2$
Sum squared standard normals.
If $Z_i\sim N(0,1)$ independently, then $\sum_{i=1}^k Z_i^2\sim\chi^2_k$.
It is the distribution of squared Gaussian length in $k$ dimensions.
PDF: $x^{k/2-1}e^{-x/2}/(2^{k/2}\Gamma(k/2))$ for $x>0$
CDF: $P(k/2,x/2)$, the regularized lower incomplete gamma function
CF: $(1-2it)^{-k/2}$
Mean / variance: $k$, $2k$
Fact: Gamma with shape $k/2$ and rate $1/2$.
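The defining construction is easy to simulate: square $k$ standard normal draws and add them. This sketch checks the mean and variance listed on the card; the helper name, the seed, and $k=4$ are assumptions for illustration.

```python
import random

def chi_square_draw(rng, k):
    """One draw of Z_1^2 + ... + Z_k^2 with independent Z_i ~ N(0, 1)."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(k))

rng = random.Random(4)
k = 4
draws = [chi_square_draw(rng, k) for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, var)  # should sit near k = 4 and 2k = 8
```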
continuous · Student's t
A normal divided by estimated scale.
If $Z\sim N(0,1)$ and $V\sim\chi^2_\nu$, then
$T=Z/\sqrt{V/\nu}$ has Student's $t_\nu$ distribution. As $\nu$ grows, the
denominator stabilizes and $t$ becomes Gaussian.
Special-function form; no compact elementary expression.
Mean / variance: $0$ for $\nu>1$; $\nu/(\nu-2)$ for $\nu>2$
Fact: Heavy-tailed because the scale is estimated.
continuous · F distribution
Ratio of scaled chi-squares.
If $U\sim\chi^2_{d_1}$ and $V\sim\chi^2_{d_2}$ independently, then
$(U/d_1)/(V/d_2)$ has an $F_{d_1,d_2}$ distribution. It appears in variance
comparisons and ANOVA-style ratios.
Special-function form; no compact elementary expression.
Mean / variance: $d_2/(d_2-2)$ for $d_2>2$; finite variance for $d_2>4$
Fact: Ratio of independent variance estimates.
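Both ratio constructions, $T=Z/\sqrt{V/\nu}$ and $(U/d_1)/(V/d_2)$, can be built from the same chi-square helper. The sketch below simulates them and compares one moment each with the values on the cards; the degrees of freedom, the seed, and the function names are illustrative assumptions.

```python
import math
import random

def chi_square_draw(rng, k):
    """Sum of k squared independent standard normal draws."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(k))

def t_draw(rng, nu):
    """Z / sqrt(V / nu) with independent Z ~ N(0, 1) and V ~ chi^2_nu."""
    return rng.gauss(0, 1) / math.sqrt(chi_square_draw(rng, nu) / nu)

def f_draw(rng, d1, d2):
    """(U / d1) / (V / d2) with independent chi-square U and V."""
    return (chi_square_draw(rng, d1) / d1) / (chi_square_draw(rng, d2) / d2)

rng = random.Random(5)
trials = 50_000
t_draws = [t_draw(rng, 12) for _ in range(trials)]
f_draws = [f_draw(rng, 5, 10) for _ in range(trials)]
t_var = sum(x * x for x in t_draws) / trials  # the mean is 0, so E[T^2] is the variance
f_mean = sum(f_draws) / trials
print(t_var, 12 / (12 - 2))   # compare with nu / (nu - 2)
print(f_mean, 10 / (10 - 2))  # compare with d2 / (d2 - 2)
```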
6. Conjugate pairs for the Bayesian thread
Conjugacy means the posterior stays in the same family as the prior. These pairs
are exact-update shortcuts, and reference points for
variational inference when exact updating is
not available.
Why conjugacy exists. All four pairs below share a single structural
property: the likelihood is an
exponential family
$p(x\mid\theta) = h(x)\exp\bigl(\eta(\theta)\cdot T(x) - A(\theta)\bigr)$,
and the prior has the same exponential-family shape in $\theta$. Multiplying prior by
likelihood keeps that shape, so the posterior's hyperparameters are the prior's,
shifted in the sufficient-statistic direction. Concretely: seeing data
$x_1,\dots,x_n$ shifts the prior hyperparameter $\alpha$ by $\sum_i T(x_i)$ and the
prior's sample-size hyperparameter by $n$. Under the usual regularity conditions
(in particular, support that does not depend on $\theta$), exponential families are
exactly the families with finite-dimensional sufficient statistics.
The data influence the posterior only through the fixed-size summary $\sum_i T(x_i)$,
which is why the update is a simple parameter shift. Conjugacy covers the cases
where this exponential tilt keeps the posterior inside a finite-dimensional family.
When it does not, you reach for
variational inference.
| Family | Sufficient statistic $T(x)$ | Natural parameter shift on update |
|---|---|---|
| Bernoulli / Binomial | $x$ (count of successes) | $\alpha\to\alpha+s,\;\beta\to\beta+f$ |
| Poisson | $x$ (count) | $\alpha\to\alpha+\sum y_i,\;\beta\to\beta+t$ |
| Normal (known $\sigma^2$) | $x$ | precision adds; mean is precision-weighted |
| Categorical / Multinomial | $(\mathbb 1_{x=k})_k$ | $\alpha_k\to\alpha_k+n_k$ |
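The four updates in the table above can be written as one-line functions. The sketch assumes the usual conjugate priors, which the table implies but does not name: Beta for Bernoulli/binomial, Gamma with shape $\alpha$ and rate $\beta$ for Poisson, Normal on the mean for the known-variance case, and Dirichlet for categorical data. Function and argument names are hypothetical.

```python
def update_beta(alpha, beta, successes, failures):
    """Beta(alpha, beta) prior with s successes and f failures."""
    return alpha + successes, beta + failures

def update_gamma(alpha, beta, counts, exposure):
    """Gamma(shape alpha, rate beta) prior with Poisson counts over `exposure`."""
    return alpha + sum(counts), beta + exposure

def update_normal(mean0, precision0, xs, sigma2):
    """Normal prior on the mean, with known observation variance sigma2."""
    precision = precision0 + len(xs) / sigma2  # precisions add
    mean = (precision0 * mean0 + sum(xs) / sigma2) / precision  # precision-weighted
    return mean, precision

def update_dirichlet(alphas, counts):
    """Dirichlet(alphas) prior with per-category counts n_k."""
    return [a + n for a, n in zip(alphas, counts)]

print(update_beta(2, 2, successes=7, failures=3))
print(update_gamma(1.0, 1.0, counts=[2, 0, 3], exposure=3))
print(update_normal(0.0, 1.0, xs=[2.0, 2.0], sigma2=1.0))
print(update_dirichlet([1, 1, 1], [4, 0, 2]))
```

In every case the data enter only through a fixed-size summary (counts, sums, or exposure), which is the point of the sufficient-statistic column.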
Named distributions as maximum-entropy answers. Most of the
distributions on this page are not arbitrary mathematical objects: each is the
unique distribution that maximizes entropy given a particular support and
moment constraint. The facts are the same as in the cards above; only the route
to them is different:
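| Support | Constraints (beyond normalization) | Max-entropy law |
|---|---|---|
| $[a, b]$ | none | Uniform$(a,b)$ |
| $[0, \infty)$ | fixed mean $1/\lambda$ | Exponential$(\lambda)$ |
| $\mathbb{R}$ | fixed mean $\mu$, variance $\sigma^2$ | Normal$(\mu, \sigma^2)$ |
| $\{0,1,\dots,N\}$ | fixed mean | Discrete exp-family $q(k)\propto e^{\lambda k}$ (Binomial when entropy is taken relative to the base measure $\binom{N}{k}$) |
| $\{0,1,2,\dots\}$ | fixed mean $\mu$ | Geometric, $P(k)=(1-q)q^k$ with $q = \mu/(\mu+1)$ |
| $\{0,1,2,\dots\}$ | fixed mean & variance | Discrete exp-family $q(k)\propto e^{\lambda_1 k+\lambda_2 k^2}$ (Negative Binomial needs the base measure $\binom{k+r-1}{k}$ and a mean constraint only) |
| $\mathbb{R}^d$ | fixed mean & covariance | Multivariate Normal |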
The pattern is the same in every row: write down the Lagrangian
$-\int q\log q + \sum_i \lambda_i(\int T_i q - c_i)$ (plus a multiplier for
normalization), take the variation, and the
stationary $q^*\propto\exp(\sum_i\lambda_i T_i)$ is exactly an exponential family
with $T_i$ as sufficient statistics. See the max-entropy
interactive on the Fisher-information page to step through the constraints and
watch the family member emerge, and the Legendre-duality
section for why this construction is forced by the geometry of $\log\sum e^{\eta T}$.
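One row of the table can be spot-checked numerically: on $\{0,1,2,\dots\}$ with a fixed mean, the geometric law should have more entropy than any competitor with the same mean. The sketch below compares it with a Poisson of equal mean; the mean $\mu=3$, the cutoff `k_max`, and the function names are illustrative assumptions.

```python
from math import exp, factorial, log

def entropy(pmf):
    """Shannon entropy in nats, skipping zero-probability cells."""
    return -sum(p * log(p) for p in pmf if p > 0)

def geometric_pmf(mu, k_max):
    q = mu / (mu + 1)  # matches the table row
    return [(1 - q) * q**k for k in range(k_max + 1)]

def poisson_pmf(mu, k_max):
    return [exp(-mu) * mu**k / factorial(k) for k in range(k_max + 1)]

mu, k_max = 3.0, 150  # k_max is a cutoff; the tail beyond it is negligible here
h_geometric = entropy(geometric_pmf(mu, k_max))
h_poisson = entropy(poisson_pmf(mu, k_max))
print(h_geometric, h_poisson)  # the geometric entropy is the larger one
```

A single comparison is not a proof, but it shows what the variational argument predicts: the exponential-family member wins among laws that share the constrained moment.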
bayes · Beta-Binomial
Prior over a Bernoulli/binomial probability. Successes add to $\alpha$;
failures add to $\beta$.
The Pareto tail has $P(X>x)\propto x^{-\alpha}$. On log-log axes it becomes
a straight line, while exponential and Gaussian tails curve downward much faster.
Smaller $\alpha$ means heavier tails and fewer finite moments.
Cauchy has no mean or variance. Pareto has finite moments only for orders
below $\alpha$: mean needs $\alpha>1$, variance needs $\alpha>2$.
PDF: $\alpha x_m^\alpha/x^{\alpha+1}$ for $x\ge x_m$
CDF: $1-(x_m/x)^\alpha$
CF: Special-function form; no compact elementary expression.
Mean: $\alpha x_m/(\alpha-1)$ for $\alpha>1$
Variance: $\alpha x_m^2/((\alpha-1)^2(\alpha-2))$ for $\alpha>2$
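The straight-line claim for log-log axes can be verified from the CDF above, since $P(X>x)=(x_m/x)^\alpha$. The sketch below measures the log-log slope of a Pareto tail and of an exponential tail for comparison; the helper names, $\alpha=1.5$, and the evaluation points are assumptions for illustration.

```python
import math

def pareto_survival(x, alpha, x_m=1.0):
    """P(X > x) = (x_m / x)^alpha for x >= x_m, and 1 below x_m."""
    return (x_m / x) ** alpha if x >= x_m else 1.0

def exponential_survival(x, rate=1.0):
    return math.exp(-rate * x)

def loglog_slope(survival, x1, x2):
    """Slope of log P(X > x) against log x between two points."""
    return ((math.log(survival(x2)) - math.log(survival(x1)))
            / (math.log(x2) - math.log(x1)))

alpha = 1.5
pareto_slope = loglog_slope(lambda x: pareto_survival(x, alpha), 10, 1000)
exp_slope = loglog_slope(exponential_survival, 10, 100)
print(pareto_slope)  # about -alpha, and the same between any two points
print(exp_slope)     # far steeper, and it keeps steepening as x grows
```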
8. Decision table
Start from what you are modeling; pick the distribution whose construction matches
that story.
| What you are modeling | Reach for | Why |
|---|---|---|
| Single yes/no event | Bernoulli | One trial with success probability $p$. |
| Number of successes in fixed trials | Binomial | Sum of independent Bernoulli trials. |
| Rare event counts in fixed exposure | Poisson | Limit of many tiny independent chances; one rate parameter. |
| Counts with variance larger than the mean | Negative Binomial | Poisson-like count with extra dispersion. |
| Waiting time until an event | Geometric / Exponential | Discrete or continuous memoryless waiting. |
| Measurement noise from many small effects | Gaussian | Stable under sums and central-limit behavior. |
| Ratio of two noisy measurements | Cauchy / Student's t | Near-zero denominators create heavy tails. |
| Squared Gaussian length or variance component | Chi-square | Sum of squared standard normals. |
| Ratio of variance estimates | F | Ratio of scaled chi-square variables. |
| Bounded proportion or probability | Beta (see Beta-Binomial) | Flexible distribution on $[0,1]$; conjugate to binomial data. |