Distance Correlation
The 2007 paper introduced distance correlation, a scalar coefficient that is zero exactly when two random vectors are independent. Unlike Pearson’s $r$, it is not limited to linear dependence. The construction starts with the pairwise distance matrices of each sample. Each matrix is double-centered: row and column means are subtracted and the grand mean is added back. The coefficient then averages the entry-wise product of the two centered matrices.
The 2014 paper replaces the 2007 double-centering with U-centering. With that centering, an entry-wise product of the same kind becomes an inner product on a Hilbert space of U-centered matrices. That gives an unbiased estimator, lengths and angles, orthogonal projection for partial distance correlation, and a version that works for non-Euclidean dissimilarities.
1. What Pearson’s r misses
Pearson’s $r$ is a coefficient of linear dependence: it measures whether the cloud of points stretches along a diagonal. Symmetric nonlinear relationships, such as a parabola, a sine wave, or a circle, can have $r \approx 0$ while $X$ and $Y$ are tightly coupled, even deterministic. Rank-based generalizations (Spearman, Kendall) fix monotone misses but not the symmetric ones. The 2007 paper’s motivating quote: “distance correlation is zero only if the random vectors are independent.”
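A quick numerical check of the parabola and circle cases follows, written as a minimal NumPy sketch. `dcor` is a hypothetical helper that computes the 2007 sample distance correlation for 1-D samples, following the double-centering recipe of § 2. The grid sizes are illustrative choices.

```python
import numpy as np

def dcor(x, y):
    """Sample distance correlation R_n for 1-D samples (2007 double-centered version)."""
    a = np.abs(x[:, None] - x[None, :])  # pairwise distances |x_i - x_j|
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    v_xy, v_xx, v_yy = (A * B).mean(), (A * A).mean(), (B * B).mean()
    return np.sqrt(v_xy / np.sqrt(v_xx * v_yy))

# Parabola: y = x^2 on a grid symmetric about 0.
x = np.linspace(-1, 1, 201)
y = x ** 2
r_parabola = np.corrcoef(x, y)[0, 1]

# Circle: 400 points evenly spaced on the unit circle.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
cx, cy = np.cos(t), np.sin(t)
r_circle = np.corrcoef(cx, cy)[0, 1]

print(f"parabola: r = {r_parabola:+.1e}, dCor = {dcor(x, y):.3f}")
print(f"circle:   r = {r_circle:+.1e}, dCor = {dcor(cx, cy):.3f}")
```

Both Pearson coefficients come out at floating-point noise level, while the distance correlation of the deterministic parabola is clearly positive.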
2. The 2007 definition
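The population coefficient. Take a weighted $L^2$ distance between the joint characteristic function $\phi_{X,Y}$ and the product of marginals $\phi_X\phi_Y$, for $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$:
$$V^2(X,Y) = \int_{\mathbb{R}^{p+q}} \left|\phi_{X,Y}(t,s) - \phi_X(t)\,\phi_Y(s)\right|^2 w(t,s)\,dt\,ds.$$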
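The weight $w(t,s) \propto |t|^{-(1+p)} |s|^{-(1+q)}$ is non-integrable on purpose. With an integrable weight, the corresponding normalized coefficient tends to $\rho^2$ in the small-scale limit, so it cannot separate uncorrelated-but-dependent pairs from independent ones. Under finite first moments, the integral reduces to a compact expected-distance identity, where $(X',Y')$ and $(X'',Y'')$ are independent copies of $(X,Y)$:
$$V^2(X,Y) = E|X-X'|\,|Y-Y'| + E|X-X'|\,E|Y-Y'| - 2E|X-X'|\,|Y-Y''|,$$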
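which is what the sample statistic estimates. Form the matrix of pairwise distances $a_{ij} = |X_i - X_j|$ and double-center it (subtract row means and column means, then add back the grand mean). Do the same for $Y$ with $b_{ij} = |Y_i - Y_j|$, and take the entry-wise mean of the product:
$$A_{ij} = a_{ij} - \bar a_{i\cdot} - \bar a_{\cdot j} + \bar a_{\cdot\cdot}, \qquad V_n^2(X,Y) = \frac{1}{n^2}\sum_{i,j=1}^{n} A_{ij}B_{ij},$$
with $B_{ij}$ built from $b_{ij}$ in the same way. The sample distance correlation is then $R_n^2 = V_n^2(X,Y)/\sqrt{V_n^2(X,X)\,V_n^2(Y,Y)}$.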
Under finite first moments, $V^2(X,Y) = 0$ iff $X$ and $Y$ are independent, in arbitrary, possibly different, Euclidean dimensions. This is the 2007 paper’s headline theorem; everything else follows from it.
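The double-centering recipe translates almost line for line into NumPy. The sketch below is an illustration, not the reference implementation. The helper names `pairwise_dist`, `double_center`, `dcov2_v` and `dcor_v` are hypothetical, and returning 0 for a degenerate denominator is an assumed convention. The sketch also checks numerically that the double-centered statistic equals the plug-in version of the expected-distance identity, $S_1 + S_2 - 2S_3$, where each expectation is replaced by an average over sample indices.

```python
import numpy as np

def pairwise_dist(z):
    """Euclidean distance matrix for an (n, d) sample; 1-D input is treated as d = 1."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    return np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)

def double_center(a):
    """Subtract row and column means, then add back the grand mean."""
    return a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()

def dcov2_v(x, y):
    """V_n^2(X, Y): mean entry-wise product of double-centered distance matrices."""
    return (double_center(pairwise_dist(x)) * double_center(pairwise_dist(y))).mean()

def dcor_v(x, y):
    """Sample distance correlation R_n (0 if either marginal is degenerate)."""
    denom = np.sqrt(dcov2_v(x, x) * dcov2_v(y, y))
    return 0.0 if denom == 0 else np.sqrt(dcov2_v(x, y) / denom)

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))                  # X in R^2
y_indep = rng.normal(size=(n, 3))            # Y in R^3, independent of X
y_dep = np.column_stack([x[:, 0] ** 2, np.sin(x[:, 1]), rng.normal(size=n)])

# Plug-in expected-distance identity: V_n^2 = S1 + S2 - 2*S3.
a, b = pairwise_dist(x), pairwise_dist(y_dep)
s1 = (a * b).mean()                               # (1/n^2) sum a_kl b_kl
s2 = a.mean() * b.mean()                          # product of mean distances
s3 = (a.mean(axis=1) * b.mean(axis=1)).mean()     # (1/n^3) sum a_kl b_km
assert np.isclose(dcov2_v(x, y_dep), s1 + s2 - 2 * s3)

print(f"independent: R_n = {dcor_v(x, y_indep):.3f}")
print(f"dependent:   R_n = {dcor_v(x, y_dep):.3f}")
```

The independent pair should still give a small positive $R_n$, because the V-statistic is biased upward. The dependent pair should give a clearly larger value.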
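For one classical reference point: when $(X,Y)$ are bivariate standard normal with correlation $\rho$, distance correlation has a closed form (Theorem 7). It never exceeds $|\rho|$. Expanding the numerator for small $\rho$ gives $R^2 \approx \rho^2/\bigl(4(1+\pi/3-\sqrt3)\bigr)$. The ratio $R(X,Y)/|\rho|$ therefore bottoms out near $1/\bigl(2\sqrt{1+\pi/3-\sqrt3}\bigr) \approx 0.891$ as $\rho \to 0$:
$$R^2(X,Y) = \frac{\rho\arcsin\rho + \sqrt{1-\rho^2} - \rho\arcsin(\rho/2) - \sqrt{4-\rho^2} + 1}{1 + \pi/3 - \sqrt{3}}.$$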
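The closed form is easy to tabulate. The minimal sketch below transcribes the Theorem 7 expression for $R^2(X,Y)$ in the bivariate normal case. `gaussian_dcor` is a hypothetical name, and the clip only guards against floating-point round-off near $\rho = 0$.

```python
import numpy as np

def gaussian_dcor(rho):
    """Population distance correlation R(X, Y) for a bivariate standard normal
    with Pearson correlation rho (closed form from the 2007 paper, Theorem 7)."""
    rho = np.asarray(rho, dtype=float)
    num = (rho * np.arcsin(rho) + np.sqrt(1 - rho ** 2)
           - rho * np.arcsin(rho / 2) - np.sqrt(4 - rho ** 2) + 1)
    den = 1 + np.pi / 3 - np.sqrt(3)
    # Clip tiny negative round-off near rho = 0 before taking the square root.
    return np.sqrt(np.clip(num, 0.0, None) / den)

rhos = np.linspace(-1, 1, 201)
R = gaussian_dcor(rhos)
limit = 1 / (2 * np.sqrt(1 + np.pi / 3 - np.sqrt(3)))  # lim R/|rho| as rho -> 0

print(f"R(0.5) = {gaussian_dcor(0.5):.4f}")
print(f"R(1e-3)/1e-3 = {gaussian_dcor(1e-3) / 1e-3:.4f}, limit = {limit:.4f}")
```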
3. Testing independence
The sample statistic $V_n^2$ is non-negative but has no clean closed-form null distribution. The 2007 paper proved that under independence and finite first moments, $nV_n^2$ converges in distribution to a quadratic form $\sum_j \lambda_j Z_j^2$ in i.i.d. standard normals $Z_j$. The eigenvalues $\lambda_j$ depend on the distribution of $(X,Y)$, so the result is useful asymptotically but distribution-dependent. The practical recommendation is a permutation test:
- hold $X$ fixed and shuffle the $Y$ values to break any real dependence;
- recompute the statistic, and repeat many times;
- report the fraction of shuffled statistics at least as large as the observed one as the p-value.
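A minimal permutation test along these lines follows, assuming 1-D samples and NumPy. The name `dcov_perm_test`, the 199 permutations and the seeds are illustrative choices, not a specific package's API.

```python
import numpy as np

def dcov2_v(x, y):
    """V_n^2 for 1-D samples via double-centered distance matrices."""
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    return (A * B).mean()

def dcov_perm_test(x, y, n_perm=199, seed=None):
    """Permutation p-value for H0: X independent of Y.

    X stays fixed; each replicate shuffles Y, which breaks any dependence
    while leaving both marginal samples unchanged."""
    rng = np.random.default_rng(seed)
    observed = dcov2_v(x, y)
    null = np.array([dcov2_v(x, rng.permutation(y)) for _ in range(n_perm)])
    # Count the observed statistic as one of the replicates, so p is never 0.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=60)
p_dep = dcov_perm_test(x, x ** 2 + 0.1 * rng.normal(size=60), seed=2)
p_ind = dcov_perm_test(x, rng.normal(size=60), seed=3)
print(f"dependent: p = {p_dep:.3f}, independent: p = {p_ind:.3f}")
```

Some factors do not change when $Y$ is shuffled, such as $n$ or the product of mean distances $\bar a_{\cdot\cdot}\bar b_{\cdot\cdot}$. Rescaling $V_n^2$ by any of them therefore yields the same permutation p-value.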
What it replaced
Before 2007, each standard way to test for dependence between two random vectors had its own narrow regime:
- Pearson’s $r$ and Wilks’ likelihood-ratio test. Optimal under Gaussianity. Heavy tails or nonlinear coupling break them; the 2007 paper shows Wilks’ LRT with inflated Type-I error on $t_1$-distributed data, and near-zero power on multiplicative-noise alternatives.
- Spearman’s $\rho$, Kendall’s $\tau$, Puri-Sen rank correlation. Distribution-free for monotone alternatives. Power flatlines on symmetric non-monotone dependence (parabolas, sinusoids, multiplicative noise), visible directly in § 1 above.
- The Mantel test (1967). Permutation correlation between two raw distance matrices. Widely used in ecology, but its statistic does not double-center the matrices, so it is not a consistent test of independence; it can return zero when $X$ and $Y$ are dependent. The 2014 paper compares its partial-dCor test directly against partial Mantel and dominates it.
- Hoeffding’s $D$, Blum-Kiefer-Rosenblatt. Genuine if-and-only-if independence coefficients via the joint vs. product CDFs, but defined only for bivariate continuous distributions. They do not extend to vectors in $\mathbb{R}^p \times \mathbb{R}^q$.
- Mutual information estimators (Kraskov et al. 2004, kernel-based). Also characterize independence in any dimension, but require bandwidth or k-neighbour tuning and don’t produce a single canonical scalar.
- HSIC (Gretton et al. 2005). A near-contemporaneous kernel-based independence criterion, conceptually very close: also an inner product of centered objects in a Hilbert space (an RKHS). The two were later shown to coincide for a particular distance-induced kernel (Sejdinovic et al. 2013).
What distance correlation offers is a single scalar in $[0,1]$ that is parameter-free, defined in arbitrary dimensions, characterizes independence exactly, has a tractable permutation test, and is competitive with the LRT in the Gaussian regime while dominating it elsewhere.
Swapping double-centering for U-centering, as the 2014 paper does, adds three properties:
- The inner product is an unbiased estimator of $V^2(X,Y)$ (§ 4 below).
- Length, angle, orthogonality, and orthogonal projection become legitimate operations, and partial distance correlation is the cosine of the angle between residuals after projecting out a third matrix (§ 5).
- The inner product is invariant to additive shifts of the underlying dissimilarities, so any symmetric zero-diagonal dissimilarity matrix, not just Euclidean distances, plugs straight in (§ 6).
4. The unbiased estimator
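The double-centered estimator $V_n^2$ divides row and column sums by $n$ and the grand sum by $n^2$. U-centering instead uses $n-2$ and $(n-1)(n-2)$, the “leave-one-out” counts that make the expectation algebra come out cleanly. For $i \ne j$,
$$\widetilde a_{ij} = a_{ij} - \frac{1}{n-2}\sum_{l=1}^{n} a_{il} - \frac{1}{n-2}\sum_{k=1}^{n} a_{kj} + \frac{1}{(n-1)(n-2)}\sum_{k,l=1}^{n} a_{kl}.$$
Diagonal entries are set to zero. For $n > 3$, the inner product of two U-centered matrices is
$$(\widetilde A \cdot \widetilde B) = \frac{1}{n(n-3)}\sum_{i \ne j} \widetilde a_{ij}\,\widetilde b_{ij}.$$
Proposition 1: $(\widetilde A \cdot \widetilde B)$ is an unbiased estimator of the population $V^2(X,Y)$. A Monte Carlo comparison of the 2007 estimator with its 2014 replacement, as $n$ increases, shows the difference.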
With $X \perp Y$ the truth is exactly zero, and the bias of $V_n^2$ is most visible. It is positive and decays like $1/n$: under independence, $E\,V_n^2 = \frac{n-1}{n^2}\,E|X-X'|\,E|Y-Y'|$. The unbiased estimator fluctuates around zero and can be negative; remember, it is an inner product, not a squared length.
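A small Monte Carlo of this kind is sketched below in NumPy. The helpers `u_center`, `u_inner`, `double_center` and `dist1d` are hypothetical names, and $n = 20$ with 500 replicates is an arbitrary illustrative choice. The sketch implements U-centering (dividing by $n-2$ and $(n-1)(n-2)$, with a zero diagonal) and the $1/(n(n-3))$ inner product.

```python
import numpy as np

def u_center(a):
    """U-center a symmetric, zero-diagonal n x n dissimilarity matrix (needs n > 3)."""
    n = a.shape[0]
    row = a.sum(axis=1)  # row sums; equal to column sums by symmetry
    out = (a - row[:, None] / (n - 2) - row[None, :] / (n - 2)
           + a.sum() / ((n - 1) * (n - 2)))
    np.fill_diagonal(out, 0.0)  # diagonal entries are set to zero
    return out

def u_inner(at, bt):
    """(A~ . B~): sum of a~_ij * b~_ij over i != j, divided by n(n - 3)."""
    n = at.shape[0]
    return (at * bt).sum() / (n * (n - 3))  # diagonals are already zero

def double_center(m):
    """2007 double-centering: subtract row and column means, add back the grand mean."""
    return m - m.mean(axis=0) - m.mean(axis=1)[:, None] + m.mean()

def dist1d(x):
    """Pairwise |x_i - x_j| for a 1-D sample."""
    return np.abs(x[:, None] - x[None, :])

# Monte Carlo with X independent of Y, so the population V^2(X, Y) is exactly 0.
rng = np.random.default_rng(0)
n, reps = 20, 500
v_vals, u_vals = [], []
for _ in range(reps):
    a, b = dist1d(rng.normal(size=n)), dist1d(rng.normal(size=n))
    v_vals.append((double_center(a) * double_center(b)).mean())  # V_n^2 (biased)
    u_vals.append(u_inner(u_center(a), u_center(b)))               # unbiased
v_vals, u_vals = np.array(v_vals), np.array(u_vals)

print(f"mean V_n^2     = {v_vals.mean():.4f}")
print(f"mean unbiased  = {u_vals.mean():+.4f}")
print(f"share negative = {(u_vals < 0).mean():.2f}")
```

The average $V_n^2$ should sit clearly above zero, close to $\frac{n-1}{n^2}E|X-X'|\,E|Y-Y'|$. The unbiased estimator should average near zero and be negative in a sizeable share of replicates.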
5. The Hilbert-space picture
The U-centered matrices live in a Hilbert space $H_n$ with the inner product above. Once we have an inner product, every concept from Euclidean geometry (lengths, angles, orthogonal projections) transfers automatically. Partial distance correlation is just the standard geometric construction:
- Form U-centered matrices $\widetilde A, \widetilde B, \widetilde C$ from $X, Y, Z$.
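- Project $\widetilde A$ and $\widetilde B$ onto the orthogonal complement of $\widetilde C$. Call the residuals $P_{Z^\perp}(\widetilde A) = \widetilde A - \alpha\widetilde C$ and $P_{Z^\perp}(\widetilde B) = \widetilde B - \beta\widetilde C$. The coefficients are the usual projection coefficients, $\alpha = (\widetilde A\cdot\widetilde C)/(\widetilde C\cdot\widetilde C)$ and $\beta = (\widetilde B\cdot\widetilde C)/(\widetilde C\cdot\widetilde C)$.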
- $R^*(X,Y;Z)$ is the cosine of the angle between the residuals.
The accompanying figure draws $\widetilde A$, $\widetilde B$, and $\widetilde C$ as vectors whose lengths and pairwise angles come from actual U-centered inner products. The cosine of the angle between the residuals is the partial distance correlation.

Suppose there is no direct link, so $X$ and $Y$ share only their $Z$ pathway. Then $\widetilde A$ and $\widetilde B$ both lean toward $\widetilde C$. Once that shared component is subtracted, the residuals are nearly perpendicular, with $R^*(X,Y;Z) \approx 0$. Adding a direct $X$-$Y$ link swings the residuals back into alignment.

The paper notes that $R^*(X,Y;Z) = 0$ does not in general imply conditional independence. It only says the residuals are orthogonal in $H_n$, which is strictly weaker.
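The familiar partial-correlation formula falls out as Proposition 2 of the paper, a direct consequence of the projection geometry. Write $R^*_{XY} = (\widetilde A\cdot\widetilde B)/(|\widetilde A|\,|\widetilde B|)$ for the bias-corrected pairwise coefficient. Then
$$R^*(X,Y;Z) = \frac{R^*_{XY} - R^*_{XZ}\,R^*_{YZ}}{\sqrt{1-(R^*_{XZ})^2}\,\sqrt{1-(R^*_{YZ})^2}}.$$
Here $R^*_{XY}$ takes the place of Pearson’s $r$ in the classical formula, but it estimates the squared distance correlation. Its values are therefore on the scale of $dCor^2$ and can be negative.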
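Both routes to $R^*(X,Y;Z)$ are sketched below, assuming NumPy and Euclidean distances. The function names and the toy data are hypothetical: a chain $X \leftarrow Z \rightarrow Y$, and a variant with an added direct link. The explicit projection and the Proposition 2 formula should agree to rounding error, since one is an algebraic rewrite of the other.

```python
import numpy as np

def dist(v):
    """Euclidean distance matrix; a 1-D sample is treated as n points in R^1."""
    v = np.asarray(v, dtype=float).reshape(len(v), -1)
    return np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)

def u_center(a):
    """U-centered version of a symmetric, zero-diagonal matrix (n > 3)."""
    n = a.shape[0]
    row = a.sum(axis=1)
    out = a - row[:, None] / (n - 2) - row[None, :] / (n - 2) + a.sum() / ((n - 1) * (n - 2))
    np.fill_diagonal(out, 0.0)
    return out

def ip(s, t):
    """Inner product on H_n: sum over i != j of s_ij * t_ij, divided by n(n - 3)."""
    n = s.shape[0]
    return (s * t).sum() / (n * (n - 3))

def rstar(s, t):
    """Cosine of the angle between s and t in H_n: (s . t) / (|s| |t|)."""
    return ip(s, t) / np.sqrt(ip(s, s) * ip(t, t))

def pdcor_projection(x, y, z):
    """R*(X, Y; Z) as the cosine between residuals after projecting out C~."""
    at, bt, ct = (u_center(dist(v)) for v in (x, y, z))
    pa = at - ip(at, ct) / ip(ct, ct) * ct  # A~ - alpha C~
    pb = bt - ip(bt, ct) / ip(ct, ct) * ct  # B~ - beta C~
    return rstar(pa, pb)

def pdcor_formula(x, y, z):
    """Same quantity via the partial-correlation formula (Proposition 2)."""
    at, bt, ct = (u_center(dist(v)) for v in (x, y, z))
    rxy, rxz, ryz = rstar(at, bt), rstar(at, ct), rstar(bt, ct)
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y_chain = z + 0.5 * rng.normal(size=n)  # X and Y linked only through Z
y_direct = y_chain + x                  # adds a direct X-Y link

for label, yy in (("chain", y_chain), ("direct", y_direct)):
    print(f"{label}: projection {pdcor_projection(x, yy, z):+.3f}, "
          f"formula {pdcor_formula(x, yy, z):+.3f}")
```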
6. Beyond Euclidean distances
In ecology, genetics, and psychometrics, “dissimilarities” often violate the triangle inequality; Bray-Curtis on species counts is a standard example. The paper’s second contribution is that distance-correlation methods still work in this setting. The statistics depend on the dissimilarities only through their U-centered versions, that is, through inner products in $H_n$.
Two facts make this go. Theorem 2: every element of $H_n$ is the U-centered distance matrix of some Euclidean point configuration in $\mathbb{R}^p$ ($p \le n - 2$), recoverable via classical multidimensional scaling. Lemma 1(iv): U-centering is invariant to adding a constant $c$ to every off-diagonal dissimilarity (a “Cailliez constant” commonly used to force Euclidean embedding). The recovered MDS configuration moves around as $c$ changes, but the inner product, and hence every dCor statistic, does not.
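Lemma 1(iv) can be checked quickly in NumPy. In the sketch below, `bray_curtis`, the Poisson counts, and $c = 0.7$ are hypothetical stand-ins. Adding $c$ to every off-diagonal entry should leave the U-centered matrix unchanged. The 2007 double-centered matrix, by contrast, shifts by $c/n$ off the diagonal and by $-c(n-1)/n$ on it.

```python
import numpy as np

def u_center(a):
    """U-center a symmetric, zero-diagonal n x n matrix (n > 3)."""
    n = a.shape[0]
    row = a.sum(axis=1)
    out = a - row[:, None] / (n - 2) - row[None, :] / (n - 2) + a.sum() / ((n - 1) * (n - 2))
    np.fill_diagonal(out, 0.0)
    return out

def double_center(a):
    """2007 double-centering, for comparison."""
    return a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()

def bray_curtis(counts):
    """Bray-Curtis dissimilarity between rows: sum|u - v| / sum(u + v)."""
    num = np.abs(counts[:, None, :] - counts[None, :, :]).sum(axis=-1)
    den = (counts[:, None, :] + counts[None, :, :]).sum(axis=-1)
    return num / den

rng = np.random.default_rng(0)
counts = rng.poisson(lam=3.0, size=(12, 6)).astype(float)  # hypothetical: 12 sites x 6 species
d = bray_curtis(counts)  # symmetric, zero diagonal, not a metric in general

c = 0.7                                 # hypothetical additive constant
d_shift = d + c * (1 - np.eye(len(d)))  # add c to every off-diagonal entry only

same_u = np.allclose(u_center(d), u_center(d_shift))
same_dc = np.allclose(double_center(d), double_center(d_shift))
print(f"U-centered unchanged: {same_u}; double-centered unchanged: {same_dc}")
```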
7. Network recovery: Pearson, partial, and distance correlation
Bayesian-network skeleton recovery from observational data is a classic use of correlation measures. Here data come from a known 4-node DAG: $A\to B$, $A\to C$, $B\to D$, $C\to D$. Marginally, $A$ and $D$ are correlated through their two paths even though there is no direct edge between them.

Pairwise Pearson correlation puts an edge wherever two variables co-vary, so it cannot tell the indirect $A\!-\!D$ path from the direct ones. Partial correlation conditions on the other variables, via the precision matrix (the inverse covariance), and removes the indirect $A\!-\!D$ edge. However, conditioning on the common child $D$ couples its parents, so partial correlation also adds a $B\!-\!C$ edge. What it recovers is the moral graph rather than the exact skeleton. Distance correlation has more power against nonlinear dependence but, like raw Pearson, does not condition out indirect paths.
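A linear-Gaussian version of this DAG makes the contrast concrete. The NumPy sketch below uses hypothetical edge coefficients and noise scales. It computes pairwise Pearson correlations, and partial correlations given all other variables from the precision matrix $P$ via $\rho_{ij\mid\text{rest}} = -P_{ij}/\sqrt{P_{ii}P_{jj}}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# Linear-Gaussian version of the DAG A->B, A->C, B->D, C->D (illustrative coefficients).
A = rng.normal(size=n)
B = 0.8 * A + 0.6 * rng.normal(size=n)
C = 0.8 * A + 0.6 * rng.normal(size=n)
D = 0.7 * B + 0.7 * C + 0.5 * rng.normal(size=n)
data = np.column_stack([A, B, C, D])
names = "ABCD"

pearson = np.corrcoef(data, rowvar=False)

# Partial correlation of each pair given the remaining variables,
# read off the precision matrix P: -P_ij / sqrt(P_ii * P_jj).
P = np.linalg.inv(np.cov(data, rowvar=False))
scale = np.sqrt(np.diag(P))
partial = -P / np.outer(scale, scale)
np.fill_diagonal(partial, 1.0)

for i in range(4):
    for j in range(i + 1, 4):
        print(f"{names[i]}-{names[j]}: Pearson {pearson[i, j]:+.2f}, "
              f"partial {partial[i, j]:+.2f}")
```

With these settings, the $A\!-\!D$ Pearson correlation should be large while its partial correlation is near zero. The $B\!-\!C$ partial correlation should be clearly nonzero, because conditioning on the common child $D$ couples its parents.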
Further reading
- Szekely, Rizzo & Bakirov (2007), Measuring and testing dependence by correlation of distances: the original.
- Szekely & Rizzo (2014), Partial Distance Correlation with Methods for Dissimilarities: the subject of this page.
- R package energy by Rizzo & Szekely: reference implementation of all of the above.