Bayesian Graphical Models
A Bayesian network combines a directed acyclic graph $G$ with local conditional distributions $\Theta$. The graph specifies the direct parents of each node, and the joint distribution factorizes as $p(x_1,\dots,x_n)=\prod_i p(x_i\mid \operatorname{pa}(x_i))$.
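To make the factorization concrete, here is a minimal sketch for a three-node collider, Rain → WetGrass ← Sprinkler. The CPT numbers and the helper names `joint` and `prob_rain` are hypothetical, chosen only for illustration. Every query is answered by summing products of local conditionals, exactly as the factorization prescribes.

```python
from itertools import product

# Hypothetical CPTs for binary variables in the DAG Rain -> WetGrass <- Sprinkler.
p_rain = {1: 0.2, 0: 0.8}
p_sprinkler = {1: 0.3, 0: 0.7}
p_wet_given = {  # P(WetGrass = 1 | Rain, Sprinkler), keyed by (rain, sprinkler)
    (0, 0): 0.05, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.95,
}

def joint(r, s, w):
    """p(r, s, w) = p(r) * p(s) * p(w | r, s): one factor per node given its parents."""
    pw1 = p_wet_given[(r, s)]
    return p_rain[r] * p_sprinkler[s] * (pw1 if w == 1 else 1.0 - pw1)

def prob_rain(evidence):
    """P(Rain = 1 | evidence) by brute-force summation of the factorized joint.

    `evidence` maps "S" (Sprinkler) and/or "W" (WetGrass) to an observed value.
    """
    num = den = 0.0
    for r, s, w in product((0, 1), repeat=3):
        if evidence.get("S", s) != s or evidence.get("W", w) != w:
            continue  # inconsistent with the evidence
        p = joint(r, s, w)
        den += p
        if r == 1:
            num += p
    return num / den

print(prob_rain({}))                  # prior P(Rain = 1)
print(prob_rain({"W": 1}))            # wet grass raises belief in rain
print(prob_rain({"W": 1, "S": 1}))    # sprinkler on: rain is partly explained away
```

Brute-force enumeration is exponential in the number of variables; it is used here only because it mirrors the product formula term by term.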
1. D-separation and explaining away
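Click a node to mark it observed. Chains (A→B→C) and forks (A←B→C) are blocked by observing the middle variable. Colliders (A→B←C) behave the opposite way: they are blocked when nothing is observed and opened by observing the common effect or any of its descendants. This is explaining away: once the common effect is known, the causes become dependent, so learning that one cause occurred can lower the probability of the other. The matrix on the right shows every pairwise conditional independence given the current evidence: green for independent, red for dependent.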
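The blocking rules can be checked mechanically. The sketch below uses the moralized-ancestral-graph criterion, which is equivalent to the path-blocking rules but is not necessarily how the demo computes its matrix. The dict-of-parents graph format and the function names are assumptions for illustration: keep only the queried and observed nodes plus their ancestors, connect co-parents, drop edge directions, delete the observed nodes, and test whether any undirected path remains.

```python
from itertools import combinations

def ancestral_set(parents, nodes):
    """Return `nodes` plus all of their ancestors. `parents` maps node -> list of parents."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, x, y, observed):
    """True if x and y are d-separated given `observed` (x and y assumed unobserved)."""
    observed = set(observed)
    keep = ancestral_set(parents, {x, y} | observed)
    # Moralize the ancestral subgraph: drop edge directions and "marry" co-parents.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))  # parents of an ancestral node are ancestral too
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)
    # Delete observed nodes; x and y are d-separated iff no undirected path remains.
    stack, seen = [x], {x}
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v]:
            if w not in seen and w not in observed:
                seen.add(w)
                stack.append(w)
    return True

def independence_matrix(parents, nodes, observed):
    """Pairwise d-separation among unobserved nodes, analogous to the demo's matrix."""
    free = [v for v in nodes if v not in observed]
    return {(a, b): d_separated(parents, a, b, observed)
            for a, b in combinations(free, 2)}

chain = {"B": ["A"], "C": ["B"]}
collider = {"B": ["A", "C"], "D": ["B"]}  # D is a descendant of the collider B

print(d_separated(chain, "A", "C", []), d_separated(chain, "A", "C", ["B"]))
print(d_separated(collider, "A", "C", []), d_separated(collider, "A", "C", ["D"]))
```

Marrying co-parents is what encodes the collider rule: the edge between A and C appears only when the collider (or a descendant) is in the ancestral set, that is, only when it or a descendant is observed.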
2. Dirichlet-multinomial CPT learning
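For a discrete node, each row of its conditional probability table (one row per configuration of the parents) is the parameter vector of a categorical distribution over the node's states. A Dirichlet prior $\operatorname{Dir}(\alpha)$ on a row acts like $\alpha_k$ pseudo-counts for state $k$: after observing counts $n_k$, the posterior is $\operatorname{Dir}(\alpha+n)$, with posterior mean $(\alpha_k+n_k)/(\sum_j\alpha_j+\sum_j n_j)$. A prior with a larger total $\sum_j\alpha_j$ therefore moves more slowly toward the empirical frequencies.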
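A minimal sketch of the update, assuming a symmetric prior (every $\alpha_k$ equal) and data stored as a list of dicts. Both the data format and the names `posterior_mean` and `learn_cpt` are hypothetical. Each parent configuration gets its own row and its own independent Dirichlet update.

```python
from collections import Counter
from itertools import product

def posterior_mean(alpha, counts):
    """Posterior mean of a Dirichlet(alpha) prior after observing multinomial counts."""
    total = sum(alpha) + sum(counts)
    return [(a + n) / total for a, n in zip(alpha, counts)]

def learn_cpt(rows, child, parents, states, alpha=1.0):
    """Estimate P(child | parents) row by row with a symmetric Dirichlet(alpha) prior.

    rows   : list of dicts mapping variable name -> observed value
    states : dict mapping variable name -> list of its possible values
    """
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    cpt = {}
    for config in product(*(states[p] for p in parents)):
        n = [counts[(config, v)] for v in states[child]]
        probs = posterior_mean([alpha] * len(n), n)
        cpt[config] = dict(zip(states[child], probs))  # unseen configs stay uniform
    return cpt

print(posterior_mean([1, 1], [8, 2]))    # weak prior   -> [0.75, 0.25]
print(posterior_mean([10, 10], [8, 2]))  # strong prior -> [0.6, 0.4]
```

With the same 8-to-2 counts, the weak prior lands close to the empirical 0.8, while the prior worth 20 pseudo-counts stays nearer to 0.5.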
3. Linear Gaussian Bayesian network
When every node is a linear-Gaussian function of its parents, the joint distribution is multivariate Gaussian. For the chain $A\to B\to C$ with $B = \beta_1 A + \varepsilon_B$ and $C = \beta_2 B + \varepsilon_C$ (with $A$, $\varepsilon_B$, $\varepsilon_C$ independent Gaussians), the covariance $\Sigma$ is dense: $\Sigma_{AC}=\beta_1\beta_2\operatorname{Var}(A)$, which grows with the edge coefficients. The precision $K=\Sigma^{-1}$, by contrast, keeps $K_{AC}=0$ for every choice of coefficients. Zeros in $K$ correspond exactly to conditional independencies given all remaining variables; here, $A\perp C\mid B$.
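The contrast between $\Sigma$ and $K$ can be checked numerically. The sketch below assumes unit noise variances by default, and `chain_covariance` is a hypothetical helper. It writes the structural equations as $x = Wx + \varepsilon$, so $x = (I-W)^{-1}\varepsilon$ and $\Sigma = (I-W)^{-1}D(I-W)^{-\top}$ with $D$ the diagonal noise covariance.

```python
import numpy as np

def chain_covariance(beta1, beta2, noise_vars=(1.0, 1.0, 1.0)):
    """Covariance of (A, B, C) for A -> B -> C with independent Gaussian noise."""
    # x = W x + eps, where W[i, j] is the coefficient on parent j in node i's equation.
    W = np.array([[0.0,   0.0,   0.0],
                  [beta1, 0.0,   0.0],
                  [0.0,   beta2, 0.0]])
    M = np.linalg.inv(np.eye(3) - W)  # so x = M @ eps
    return M @ np.diag(noise_vars) @ M.T

for beta in (0.5, 2.0, 5.0):
    Sigma = chain_covariance(beta, beta)
    K = np.linalg.inv(Sigma)
    print(f"beta={beta}: Sigma_AC={Sigma[0, 2]:.3f}, K_AC={K[0, 2]:.1e}")
```

$\Sigma_{AC}$ grows as $\beta^2$, while $K_{AC}$ stays at zero up to rounding. The reason is that $K=(I-W)^\top D^{-1}(I-W)$, and no single row of $I-W$ involves both $A$ and $C$.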
Normalizing both matrices turns this contrast into two kinds of correlation. Pearson correlation is the normalized covariance, $r_{ij} = \Sigma_{ij}/\sqrt{\Sigma_{ii}\Sigma_{jj}}$, and partial correlation is the normalized negative precision, $\rho_{ij\cdot\text{rest}} = -K_{ij}/\sqrt{K_{ii}K_{jj}}$. The two answer different questions: $r_{AC}\neq 0$ because $A$ and $C$ are linked through the indirect path $A\!-\!B\!-\!C$, while $\rho_{AC\cdot B} = 0$ because conditioning on $B$ closes that path. For a Gaussian, $\rho_{ij\cdot\text{rest}} = 0 \iff K_{ij} = 0 \iff X_i \perp X_j \mid \text{rest}$, so the partial-correlation zeros are exactly the missing edges of the Gaussian graphical model. Stripping indirect paths out of a correlation network is the standard use of partial correlation; Distance Correlation §7 compares it with Pearson and distance correlation on that task.
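Both normalizations take only a few lines. This sketch hard-codes the exact covariance of the chain with $\beta_1=2$, $\beta_2=3$ and unit noise variances; with real data you would substitute a sample covariance. The function names are hypothetical, and the unit diagonal of the partial-correlation matrix is a convention.

```python
import numpy as np

def pearson_from_covariance(S):
    """r_ij = S_ij / sqrt(S_ii * S_jj)."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)

def partial_from_precision(K):
    """rho_ij.rest = -K_ij / sqrt(K_ii * K_jj), with ones on the diagonal by convention."""
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P

# Covariance of the chain A -> B -> C with beta1 = 2, beta2 = 3 and unit noise variances.
Sigma = np.array([[1.0,  2.0,  6.0],
                  [2.0,  5.0, 15.0],
                  [6.0, 15.0, 46.0]])
R = pearson_from_covariance(Sigma)
P = partial_from_precision(np.linalg.inv(Sigma))
print("Pearson r_AC:", R[0, 2])        # large: A and C are marginally dependent
print("partial rho_AC.B:", P[0, 2])    # ~0: the only A-C path runs through B
```

Here $r_{AC}=6/\sqrt{46}\approx 0.885$, yet the partial correlation vanishes, matching the missing $A$–$C$ edge.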
4. Structure search sandbox
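Structure learning trades fit against complexity. The score below is BIC, with each node modeled as a Gaussian linear regression on its parents: $\operatorname{BIC}(G)=\sum_i\big[\log p(\mathcal D_i \mid \operatorname{pa}(x_i),\hat w_i) - \tfrac12 k_i \log N\big]$, where $\mathcal D_i$ are the observed values of $x_i$, $\hat w_i$ are the maximum-likelihood regression parameters at node $i$, $k_i$ is the number of free parameters at that node, and $N$ is the number of samples. Click an edge slot to toggle it; the two directions are separate slots, so you can reverse an edge. Edges that would create a cycle are rejected. The true generative DAG is shown faintly for comparison with the greedy-search result.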
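A minimal sketch of the score and a greedy search over it follows. Several details are assumptions rather than what the demo necessarily does: each regression includes an intercept, $k_i$ counts the intercept, the weights and the noise variance, and the search is plain hill climbing over single-edge add, delete and reverse moves, starting from the empty graph. All function names are hypothetical, and the data are simulated from the chain $A\to B\to C$.

```python
import numpy as np

def node_bic(X, i, parents):
    """BIC term for node i: Gaussian linear regression (with intercept) on its parents."""
    N = X.shape[0]
    A = np.column_stack([np.ones(N)] + [X[:, p] for p in parents])
    w, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
    resid = X[:, i] - A @ w
    var = resid @ resid / N                       # ML estimate of the noise variance
    loglik = -0.5 * N * (np.log(2 * np.pi * var) + 1.0)
    k = A.shape[1] + 1                            # intercept and weights, plus noise variance
    return loglik - 0.5 * k * np.log(N)

def bic(X, edges):
    """Total BIC of a DAG given as a set of (parent, child) column-index pairs."""
    n = X.shape[1]
    return sum(node_bic(X, i, sorted(p for p, c in edges if c == i)) for i in range(n))

def is_acyclic(edges, n):
    """Kahn's algorithm: a DAG iff every node can be removed in topological order."""
    indeg = [0] * n
    children = [[] for _ in range(n)]
    for p, c in edges:
        indeg[c] += 1
        children[p].append(c)
    queue = [v for v in range(n) if indeg[v] == 0]
    removed = 0
    while queue:
        v = queue.pop()
        removed += 1
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return removed == n

def greedy_search(X, tol=1e-6):
    """Hill climbing over single-edge add / delete / reverse moves, rejecting cycles."""
    n = X.shape[1]
    edges, best = set(), bic(X, set())
    while True:
        candidates = []
        for p in range(n):
            for c in range(n):
                if p == c:
                    continue
                if (p, c) in edges:
                    candidates.append(edges - {(p, c)})               # delete
                    candidates.append((edges - {(p, c)}) | {(c, p)})  # reverse
                elif (c, p) not in edges:
                    candidates.append(edges | {(p, c)})               # add
        scored = [(bic(X, e), e) for e in candidates if is_acyclic(e, n)]
        if not scored:
            break
        score, cand = max(scored, key=lambda t: t[0])
        if score <= best + tol:                   # no strict improvement: local optimum
            break
        edges, best = cand, score
    return edges, best

# Simulated data from the chain A -> B -> C (columns 0, 1, 2).
rng = np.random.default_rng(0)
N = 2000
a = rng.normal(size=N)
b = a + rng.normal(size=N)
c = b + rng.normal(size=N)
X = np.column_stack([a, b, c])
edges, score = greedy_search(X)
print(sorted(edges), round(score, 1))
```

Because Gaussian BIC gives the same score to Markov-equivalent DAGs, the search may return $A\leftarrow B\to C$ or $A\leftarrow B\leftarrow C$ instead of the true chain. Only the skeleton and the v-structures are identifiable from the score. The small `tol` stops the search from cycling between equal-score reversals.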
What next
Static Bayesian networks connect to dynamic models, dependence measures, and Bayesian computation.