Distribution of a Jointly Gaussian random vector conditioned on observing the sum of entries

This is a follow-up to the previous post on conditioning a jointly Gaussian probability distribution on partial observations.

[Note: If the math notation is not rendering correctly, try following the steps described here to set your MathJax renderer to “Common HTML”.]

Setting

Let $X$ be a jointly Gaussian (j-g) random vector with mean $\mu\in\mathbf{R}^n$ and covariance matrix $\Sigma\in\mathbf{R}^{n\times n}$, i.e., $X\sim\mathcal{N}(\mu, \Sigma)$, and let $S$ be the sum of the entries of $X$. We are interested in the setting where we wish to update our belief about the distribution of $X$ given the observation that the sum equals a specific value, $S=s$.

Because $S$ is a linear transform of j-g random variables, $S$ is itself Gaussian, and $(X, S)$ are jointly Gaussian. As in the prior post, we will exploit the special property of jointly Gaussian random variables that being uncorrelated implies being independent.

There is a Marimo notebook (app, editable code) accompanying this post that allows you to play with some relevant numerical experiments. It runs in your browser, so feel free to edit and play around!
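To make the setting concrete, here is a minimal NumPy sketch (with made-up $\mu$ and $\Sigma$, not taken from the accompanying notebook) that checks by simulation that $S=\mathbf{1}^TX$ has mean $\mathbf{1}^T\mu$ and variance $\mathbf{1}^T\Sigma\mathbf{1}$:

```python
import numpy as np

# Made-up example data: a 3-dimensional jointly Gaussian X ~ N(mu, Sigma).
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
B = rng.standard_normal((3, 3))
Sigma = B @ B.T + 0.1 * np.eye(3)  # symmetric positive definite
ones = np.ones(3)

# S = 1^T X is Gaussian with mean 1^T mu and variance 1^T Sigma 1.
samples = rng.multivariate_normal(mu, Sigma, size=100_000)
S = samples.sum(axis=1)
print(S.mean(), ones @ mu)           # sample mean of S vs. 1^T mu
print(S.var(), ones @ Sigma @ ones)  # sample variance of S vs. 1^T Sigma 1
```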

The Formula

The distribution of $X$ given the observation that the sum $S=s$ is given by

\[X\mid S=s \sim \mathcal{N}\left(sv + A\mu, A\Sigma A^T\right),\]

where,

\[v = \frac{1}{\mathbf{1}^T \Sigma \mathbf{1}}\Sigma\mathbf{1},\] \[A=I-v\mathbf{1}^T,\]

and $\mathbf{1}\in\mathbf{R}^n$ is the ones vector.
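For reference, a minimal NumPy implementation of this formula might look as follows (the function name and example numbers are mine, not from the accompanying notebook):

```python
import numpy as np

def condition_on_sum(mu, Sigma, s):
    """Posterior mean and covariance of X ~ N(mu, Sigma) given 1^T X = s."""
    n = len(mu)
    ones = np.ones(n)
    v = Sigma @ ones / (ones @ Sigma @ ones)  # v = Sigma 1 / (1^T Sigma 1)
    A = np.eye(n) - np.outer(v, ones)         # A = I - v 1^T
    return s * v + A @ mu, A @ Sigma @ A.T    # (s v + A mu, A Sigma A^T)

# Hypothetical example:
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
post_mean, post_cov = condition_on_sum(mu, Sigma, s=4.0)
print(post_mean.sum())  # 4.0: the posterior mean is consistent with the observation
```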

Proof

As asserted above, $(X, S)$ are j-g, and so $(AX,S)$ are similarly j-g by construction for any matrix $A\in\mathbf{R}^{n\times n}$. We will find matrix $A$ and vector $v\in\mathbf{R}^n$ such that

  1. $AX$ is independent from $S$, and
  2. $X=AX + Sv$.

If we are able to do this, then conditioning on $S=s$ leaves the distribution of $AX$ unchanged, so $X\mid S=s$ has the distribution of $AX + sv \sim \mathcal{N}(A\mu + sv, A\Sigma A^T)$, which is exactly the formula above.

$AX$ is independent from $S$ if and only if they are uncorrelated, i.e., their cross-covariance is zero:

\[E[A(X-\mu)(S-E[S])] = 0.\]

We know that $S=\mathbf{1}^TX$ and $E[S]=\mathbf{1}^T\mu$, so it follows that

\[E[A(X-\mu)(X-\mu)^T\mathbf{1}] = A\Sigma\mathbf{1}=0.\]

Next, we observe that

\[\begin{align*} X &=AX + Sv \\ AX &=X - Sv \\ &=X-v\,\mathbf{1}^TX \\ &= (I-v\mathbf{1}^T)X, \end{align*}\]

implying that $A=I-v\mathbf{1}^T$, thus proving the second result above. Multiplying $A=I-v\mathbf{1}^T$ on the right by $\Sigma\mathbf{1}$ and noting that $A\Sigma\mathbf{1}=0$, we find

\[\begin{align*} 0 &= \Sigma\mathbf{1} - v\mathbf{1}^T\Sigma\mathbf{1} \\ v\mathbf{1}^T\Sigma\mathbf{1} &= \Sigma\mathbf{1} \\ v &= \frac{1}{\mathbf{1}^T \Sigma \mathbf{1}}\Sigma\mathbf{1}, \end{align*}\]

thus proving the first result.
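As a sanity check (an addition, not part of the original argument), this result agrees with the standard Gaussian conditioning formula: since $\operatorname{Cov}(X,S)=\Sigma\mathbf{1}$ and $\operatorname{Var}(S)=\mathbf{1}^T\Sigma\mathbf{1}$, expanding $A\Sigma A^T$ recovers the usual Schur-complement covariance,

\[\begin{align*} A\Sigma A^T &= (I - v\mathbf{1}^T)\Sigma(I - \mathbf{1}v^T) \\ &= \Sigma - v\mathbf{1}^T\Sigma - \Sigma\mathbf{1}v^T + v\,(\mathbf{1}^T\Sigma\mathbf{1})\,v^T \\ &= \Sigma - \frac{\Sigma\mathbf{1}\mathbf{1}^T\Sigma}{\mathbf{1}^T\Sigma\mathbf{1}}, \end{align*}\]

and similarly $sv + A\mu = \mu + \Sigma\mathbf{1}\,(\mathbf{1}^T\Sigma\mathbf{1})^{-1}(s - \mathbf{1}^T\mu)$ is the usual conditional mean.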

Discussion

Note that both $v$ and $A$ do not depend on $s$ and can be pre-computed. The entries of $v$ always sum to $1$; when the row sums of $\Sigma$ are all nonnegative, they also lie in the interval $[0,1]$, so that $v$ is a point on the probability simplex. The posterior mean is always exactly consistent with the observed sum: $\mathbf{1}^T(sv + A\mu) = s$, since $\mathbf{1}^Tv = 1$ and $\mathbf{1}^TA = 0$.

The updated covariance matrix is always “shrunk,” i.e., $\Sigma - A\Sigma A^T \succeq 0$, so the uncertainty is reduced in the posterior distribution. Assuming $\Sigma$ has rank $n$, $A$ has rank $n-1$, and the updated covariance becomes degenerate (singular), also with rank $n-1$. This degeneracy is important! It establishes a subspace of $\mathbf{R}^n$ (the nullspace of the updated covariance matrix) along which the posterior distribution has no variance. This subspace is one dimensional and spanned by the ones vector $\mathbf{1}$, which means the posterior distribution has no variability in the sum of entries: any sample of the posterior distribution has exactly the same sum!
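These claims are easy to check numerically; the following sketch (using arbitrary made-up data, independent of the accompanying notebook) verifies each one:

```python
import numpy as np

# Made-up problem data.
rng = np.random.default_rng(1)
n, s = 4, 10.0
mu = rng.standard_normal(n)
B = rng.standard_normal((n, n))
Sigma = B @ B.T + 0.1 * np.eye(n)  # full-rank covariance
ones = np.ones(n)

v = Sigma @ ones / (ones @ Sigma @ ones)
A = np.eye(n) - np.outer(v, ones)
post_mean = s * v + A @ mu
post_cov = A @ Sigma @ A.T

print(np.isclose(v.sum(), 1.0))                # entries of v sum to 1
print(np.isclose(post_mean.sum(), s))          # posterior mean is consistent with the sum
print(np.linalg.eigvalsh(Sigma - post_cov).min() > -1e-9)  # shrinkage: Sigma - A Sigma A^T is PSD
print(np.linalg.matrix_rank(post_cov), n - 1)  # degenerate posterior covariance, rank n - 1
print(np.allclose(post_cov @ ones, 0.0))       # the ones vector spans its nullspace

# Every posterior sample has (numerically) the same sum s.
# check_valid="ignore" suppresses warnings about the intentionally singular covariance.
samples = rng.multivariate_normal(post_mean, post_cov, size=5, check_valid="ignore")
print(samples.sum(axis=1))                     # all approximately equal to s
```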

See also

https://math.stackexchange.com/a/2942689
https://stanford.edu/class/ee363/lectures/estim.pdf

Written on November 18, 2024