Sample mean and covariance
From Wikipedia, the free encyclopedia
Sample mean and covariance are statistics computed from a collection of data, thought of as being random.
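Given a random sample $\mathbf{x}_1, \ldots, \mathbf{x}_N$ from an $n$-dimensional random variable $\mathbf{X}$ (i.e., realizations of $N$ independent random variables with the same distribution as $\mathbf{X}$), the sample mean is

$$\bar{\mathbf{x}} = \frac{1}{N} \sum_{k=1}^{N} \mathbf{x}_k.$$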
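In coordinates, writing the vectors as columns,

$$\mathbf{x}_k = \begin{bmatrix} x_{1k} \\ \vdots \\ x_{nk} \end{bmatrix}, \quad \bar{\mathbf{x}} = \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_n \end{bmatrix},$$

the entries of the sample mean are

$$\bar{x}_i = \frac{1}{N} \sum_{k=1}^{N} x_{ik}, \quad i = 1, \ldots, n.$$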
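As a minimal sketch, assuming the observations are stored as a plain Python list of $N$ lists with $n$ coordinates each (so `samples[k][i]` plays the role of $x_{ik}$), the entrywise formula can be computed directly. The helper name `sample_mean` and the example data are hypothetical, chosen only for illustration.

```python
def sample_mean(samples):
    """Entrywise sample mean: xbar_i = (1/N) * sum_k x_ik.

    `samples` is a list of N observations; each observation is a list
    of n coordinates, i.e. x_k = [x_1k, ..., x_nk].
    """
    N = len(samples)
    n = len(samples[0])
    return [sum(x[i] for x in samples) / N for i in range(n)]


samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]  # N = 3 observations, n = 2
print(sample_mean(samples))  # [3.0, 4.0]
```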
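The sample covariance of $\mathbf{x}_1, \ldots, \mathbf{x}_N$ is the $n$ by $n$ matrix $\mathbf{Q} = [q_{ij}]$ with the entries given by

$$q_{ij} = \frac{1}{N-1} \sum_{k=1}^{N} \left(x_{ik} - \bar{x}_i\right)\left(x_{jk} - \bar{x}_j\right), \quad i, j = 1, \ldots, n.$$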
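The covariance formula can be sketched the same way, again assuming the hypothetical list-of-lists layout in which `samples[k][i]` is $x_{ik}$. The function names are illustrative only; the mean helper is repeated so the block runs on its own.

```python
def sample_mean(samples):
    """Entrywise sample mean: xbar_i = (1/N) * sum_k x_ik."""
    N = len(samples)
    n = len(samples[0])
    return [sum(x[i] for x in samples) / N for i in range(n)]


def sample_covariance(samples):
    """n-by-n matrix Q = [q_ij] with
    q_ij = 1/(N-1) * sum_k (x_ik - xbar_i) * (x_jk - xbar_j).
    """
    N = len(samples)
    if N < 2:
        raise ValueError("need at least two observations for N - 1 > 0")
    n = len(samples[0])
    xbar = sample_mean(samples)
    return [
        [sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in samples) / (N - 1)
         for j in range(n)]
        for i in range(n)
    ]


samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
print(sample_covariance(samples))  # [[4.0, 2.0], [2.0, 4.0]]
```

The result is symmetric ($q_{ij} = q_{ji}$), as the formula implies, since swapping $i$ and $j$ only swaps the two factors in each product.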
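The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random variable $\mathbf{X}$. The sample covariance matrix has $N-1$ in the denominator rather than $N$ because the mean is not known and is replaced by the sample mean $\bar{\mathbf{x}}$. Because $\bar{\mathbf{x}}$ is fitted to the same data, the deviations from it are systematically smaller than the deviations from the true mean; in expectation this costs one of the $N$ degrees of freedom, and dividing by $N-1$ compensates exactly. If the mean $\operatorname{E}(\mathbf{X})$ is known, the analogous unbiased estimate

$$q_{ij} = \frac{1}{N} \sum_{k=1}^{N} \left(x_{ik} - \operatorname{E}(X_i)\right)\left(x_{jk} - \operatorname{E}(X_j)\right),$$

which uses the exact mean, does indeed have $N$ in the denominator. This is an example of why in probability and statistics it is essential to distinguish between upper case letters (random variables) and lower case letters (realizations of the random variables).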
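The effect of the denominator can be checked exactly, without random simulation, in the one-dimensional case ($n = 1$, so $\mathbf{Q}$ reduces to the single variance entry). The sketch below assumes a hypothetical toy distribution: $X$ takes the values 0 and 1 with probability 1/2 each, so $\operatorname{E}(X) = 1/2$ and the variance is $1/4$. With $N = 2$ all four samples are equally likely, so averaging an estimator over them gives its exact expectation. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy distribution (assumption): X is 0 or 1 with probability 1/2,
# so E(X) = 1/2 and Var(X) = 1/4.
values = [Fraction(0), Fraction(1)]
mu = Fraction(1, 2)
N = 2


def expected(estimator):
    """Exact expectation of `estimator` over all equally likely samples of size N."""
    samples = list(product(values, repeat=N))
    return sum(estimator(s) for s in samples) / len(samples)


def var_sample_mean_nm1(s):
    # Mean replaced by the sample mean, N - 1 in the denominator.
    xbar = sum(s) / len(s)
    return sum((x - xbar) ** 2 for x in s) / (len(s) - 1)


def var_sample_mean_n(s):
    # Mean replaced by the sample mean, but N in the denominator.
    xbar = sum(s) / len(s)
    return sum((x - xbar) ** 2 for x in s) / len(s)


def var_known_mean_n(s):
    # Exact mean E(X) known, N in the denominator.
    return sum((x - mu) ** 2 for x in s) / len(s)


print(expected(var_sample_mean_nm1))  # 1/4  (unbiased)
print(expected(var_sample_mean_n))    # 1/8  (biased low)
print(expected(var_known_mean_n))     # 1/4  (unbiased)
```

Exact rational arithmetic via `fractions.Fraction` is used so the comparison with the true variance $1/4$ is an equality rather than an approximation.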
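The maximum likelihood estimate of the covariance for the Gaussian distribution case,

$$\hat{q}_{ij} = \frac{1}{N} \sum_{k=1}^{N} \left(x_{ik} - \bar{x}_i\right)\left(x_{jk} - \bar{x}_j\right),$$

has $N$ in the denominator as well, and is therefore slightly biased. It differs from the sample covariance by the factor $(N-1)/N$, which tends to 1, so the difference diminishes for large $N$.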
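A minimal sketch comparing the two denominators, assuming the same hypothetical list-of-lists layout as above. The helper `covariance` and its `ddof` parameter are illustrative assumptions; the parameter name mirrors NumPy's "delta degrees of freedom" convention, but this is plain Python, not NumPy code.

```python
def covariance(samples, ddof):
    """Covariance matrix with denominator N - ddof (hypothetical helper).

    ddof=1: unbiased sample covariance (denominator N - 1).
    ddof=0: Gaussian maximum likelihood estimate (denominator N).
    """
    N = len(samples)
    n = len(samples[0])
    xbar = [sum(x[i] for x in samples) / N for i in range(n)]
    return [
        [sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in samples) / (N - ddof)
         for j in range(n)]
        for i in range(n)
    ]


samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]
unbiased = covariance(samples, ddof=1)
ml = covariance(samples, ddof=0)  # every entry is (N - 1) / N = 2/3 of the unbiased one

# The factor (N - 1) / N approaches 1 as N grows, so the two estimates converge.
for N in (3, 10, 100, 1000):
    print(N, (N - 1) / N)
```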
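In a weighted sample, each vector $\mathbf{x}_k$ is assigned a weight $w_k \geq 0$. Without loss of generality, assume that the weights are normalized:

$$\sum_{k=1}^{N} w_k = 1.$$

(If they are not, divide the weights by their sum.) Then the weighted mean $\bar{\mathbf{x}}$ and the weighted covariance matrix $\mathbf{Q} = [q_{ij}]$ are given by

$$\bar{\mathbf{x}} = \sum_{k=1}^{N} w_k \mathbf{x}_k$$

and [1]

$$q_{ij} = \frac{1}{1 - \sum_{k=1}^{N} w_k^2} \sum_{k=1}^{N} w_k \left(x_{ik} - \bar{x}_i\right)\left(x_{jk} - \bar{x}_j\right).$$

If all weights are the same, $w_k = 1/N$, then $1 - \sum_{k} w_k^2 = (N-1)/N$, and the weighted mean and covariance reduce to the sample mean and covariance above.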
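The weighted formulas can be sketched as follows, assuming nonnegative weights that are not all concentrated on a single observation (in that case $1 - \sum_k w_k^2 = 0$ and the covariance is undefined). The function name `weighted_mean_cov` and the example weights are hypothetical; the weights are normalized inside the function, as the text allows.

```python
def weighted_mean_cov(samples, weights):
    """Weighted mean and covariance Q = [q_ij] (hypothetical helper).

    Weights are first normalized to sum to 1, then
    q_ij = 1 / (1 - sum_k w_k^2) * sum_k w_k (x_ik - xbar_i)(x_jk - xbar_j).
    """
    total = sum(weights)
    w = [wk / total for wk in weights]  # normalize so that sum_k w_k = 1
    denom = 1.0 - sum(wk * wk for wk in w)
    if denom <= 0.0:
        raise ValueError("all weight is on a single observation")
    n = len(samples[0])
    xbar = [sum(wk * x[i] for wk, x in zip(w, samples)) for i in range(n)]
    Q = [
        [sum(wk * (x[i] - xbar[i]) * (x[j] - xbar[j])
             for wk, x in zip(w, samples)) / denom
         for j in range(n)]
        for i in range(n)
    ]
    return xbar, Q


samples = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]

# Equal weights reproduce (up to rounding) the unweighted mean [3, 4]
# and the sample covariance [[4, 2], [2, 4]].
xbar_eq, Q_eq = weighted_mean_cov(samples, [1.0, 1.0, 1.0])

# Unnormalized weights are accepted; they are divided by their sum first.
xbar_w, Q_w = weighted_mean_cov(samples, [2.0, 1.0, 1.0])
print(xbar_w, Q_w)
```

Normalizing inside the function keeps the covariance factor in the simple form $1/(1 - \sum_k w_k^2)$ used in the text.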
[1] Mark Galassi, Jim Davies, James Theiler, Brian Gough, Gerard Jungman, Michael Booth, and Fabrice Rossi. GNU Scientific Library - Reference Manual, Version 1.9, 2007. Sec. 20.6, Weighted Samples.
