Sample mean and covariance

From Wikipedia, the free encyclopedia

The sample mean and sample covariance are statistics computed from a collection of data that is regarded as random, that is, as realizations of random variables.


Given a random sample \textstyle \mathbf{x}_{1},\ldots,\mathbf{x}_{N} from an \textstyle n-dimensional random variable \textstyle \mathbf{X} (i.e., realizations of \textstyle N independent random variables with the same distribution as \textstyle \mathbf{X}), the sample mean is

\mathbf{\bar{x}}=\frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_{k}.

In coordinates, writing the vectors as columns,

\mathbf{x}_{k}=\left[ \begin{array} [c]{c}x_{1k}\\ \vdots\\ x_{nk}\end{array} \right]  ,\quad\mathbf{\bar{x}}=\left[ \begin{array} [c]{c}\bar{x}_{1}\\ \vdots\\ \bar{x}_{n}\end{array} \right]  ,

the entries of the sample mean are

\bar{x}_{i}=\frac{1}{N}\sum_{k=1}^{N}x_{ik},\quad i=1,\ldots,n.
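The componentwise formula translates directly into code. The following is a minimal Python sketch. It assumes the sample is stored as a list of \textstyle N observations, each a list of \textstyle n coordinates, so that `samples[k][i]` plays the role of \textstyle x_{ik}. The function name `sample_mean` is illustrative only.

```python
from typing import List

Vector = List[float]


def sample_mean(samples: List[Vector]) -> Vector:
    """Sample mean of N observations, each an n-dimensional vector.

    Assumed layout (illustrative): samples[k][i] is coordinate i of
    observation k, i.e. x_{ik} in the text.
    """
    N = len(samples)
    if N == 0:
        raise ValueError("need at least one observation")
    n = len(samples[0])
    # Entry i is (1/N) * sum over k of x_{ik}.
    return [sum(x[i] for x in samples) / N for i in range(n)]


# Three 2-dimensional observations.
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
print(sample_mean(data))  # [3.0, 5.0]
```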

The sample covariance of \textstyle \mathbf{x}_{1},\ldots,\mathbf{x}_{N} is the \textstyle n \times n matrix \textstyle \mathbf{Q}=\left[  q_{ij}\right] with entries given by

q_{ij}=\frac{1}{N-1}\sum_{k=1}^{N}\left(  x_{ik}-\bar{x}_{i}\right)  \left( x_{jk}-\bar{x}_{j}\right)
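The entry formula for \textstyle q_{ij} can be sketched in Python as follows. The sketch makes these assumptions:

- The sample uses the same list-of-coordinate-lists layout as above.
- The helper names `sample_mean` and `sample_covariance` are hypothetical.
- At least two observations are present, since \textstyle N-1 must be positive.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sample_mean(samples: List[Vector]) -> Vector:
    """Componentwise mean (1/N) * sum_k x_{ik}."""
    N = len(samples)
    n = len(samples[0])
    return [sum(x[i] for x in samples) / N for i in range(n)]


def sample_covariance(samples: List[Vector]) -> Matrix:
    """Unbiased sample covariance Q = [q_ij] with N - 1 in the denominator."""
    N = len(samples)
    if N < 2:
        raise ValueError("need at least two observations")
    n = len(samples[0])
    xbar = sample_mean(samples)
    # q_ij = (1/(N-1)) * sum_k (x_ik - xbar_i) * (x_jk - xbar_j)
    return [
        [
            sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in samples) / (N - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
print(sample_covariance(data))  # [[4.0, 7.0], [7.0, 13.0]]
```

The matrix is symmetric by construction, since \textstyle q_{ij}=q_{ji}. A more economical version would compute only the entries with \textstyle j\geq i.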

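The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random variable \textstyle \mathbf{X}. The sample covariance matrix has \textstyle N-1 in the denominator rather than \textstyle N because the mean is not known and is replaced by the sample mean \textstyle \mathbf{\bar{x}}. Since \textstyle \mathbf{\bar{x}} is computed from the same data, the data are systematically less spread out about it than about the exact mean. In expectation, the sums defining \textstyle q_{ij} equal \textstyle N-1 rather than \textstyle N times the true covariance, and dividing by \textstyle N-1 (Bessel's correction) removes this bias. If the exact mean \textstyle \bar{\mathbf{X}}=\operatorname{E}(\mathbf{X}) is known, the analogous unbiased estimate

q_{ij}=\frac{1}{N}\sum_{k=1}^{N}\left(  x_{ik}-\bar{X}_{i}\right)  \left( x_{jk}-\bar{X}_{j}\right)

uses the exact mean and has \textstyle N in the denominator. This is an example of why, in probability and statistics, it is essential to distinguish between upper-case letters (random variables) and lower-case letters (realizations of the random variables).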
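The bias of dividing by \textstyle N can be checked with a short derivation. The following symbols are introduced here only for this derivation:

- \textstyle \mathbf{X}_{1},\ldots,\mathbf{X}_{N} are the independent random variables whose realizations are \textstyle \mathbf{x}_{1},\ldots,\mathbf{x}_{N}.
- \textstyle \boldsymbol{\mu} is their exact mean.
- \textstyle \boldsymbol{\Sigma} is their covariance matrix.
- \textstyle \mathbf{M}=\frac{1}{N}\sum_{k=1}^{N}\mathbf{X}_{k} is the sample mean viewed as a random variable.

Expanding each deviation about \textstyle \boldsymbol{\mu}, and using \textstyle \sum_{k}(\mathbf{X}_{k}-\boldsymbol{\mu})=N(\mathbf{M}-\boldsymbol{\mu}), gives

\sum_{k=1}^{N}(\mathbf{X}_{k}-\mathbf{M})(\mathbf{X}_{k}-\mathbf{M})^{T}=\sum_{k=1}^{N}(\mathbf{X}_{k}-\boldsymbol{\mu})(\mathbf{X}_{k}-\boldsymbol{\mu})^{T}-N(\mathbf{M}-\boldsymbol{\mu})(\mathbf{M}-\boldsymbol{\mu})^{T}.

By independence,

\operatorname{E}\left[(\mathbf{X}_{k}-\boldsymbol{\mu})(\mathbf{X}_{k}-\boldsymbol{\mu})^{T}\right]=\boldsymbol{\Sigma},\qquad\operatorname{E}\left[(\mathbf{M}-\boldsymbol{\mu})(\mathbf{M}-\boldsymbol{\mu})^{T}\right]=\frac{1}{N}\boldsymbol{\Sigma},

so

\operatorname{E}\left[\sum_{k=1}^{N}(\mathbf{X}_{k}-\mathbf{M})(\mathbf{X}_{k}-\mathbf{M})^{T}\right]=N\boldsymbol{\Sigma}-\boldsymbol{\Sigma}=(N-1)\boldsymbol{\Sigma}.

Entry \textstyle (i,j) of the left-hand sum is \textstyle \sum_{k}(X_{ik}-M_{i})(X_{jk}-M_{j}). Dividing it by \textstyle N-1 therefore gives an unbiased estimate, whereas dividing by \textstyle N underestimates \textstyle \boldsymbol{\Sigma} by the factor \textstyle (N-1)/N.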

For a Gaussian distribution, the maximum likelihood estimate of the covariance,

q_{ij}=\frac{1}{N}\sum_{k=1}^{N}\left(  x_{ik}-\bar{x}_{i}\right)  \left( x_{jk}-\bar{x}_{j}\right)

also has \textstyle N in the denominator. It differs from the unbiased estimate only by the factor \textstyle (N-1)/N, so the difference becomes negligible for large \textstyle N.
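The relationship between the two estimates can be illustrated with a small Python sketch. Here one hypothetical function `covariance` takes a denominator offset `ddof`:

- `ddof=1` gives the unbiased estimate.
- `ddof=0` gives the Gaussian maximum likelihood estimate.

The parameter name is borrowed from the "delta degrees of freedom" convention used in NumPy, but the code below is plain Python and does not call NumPy.

```python
from typing import List

Matrix = List[List[float]]


def covariance(samples: List[List[float]], ddof: int) -> Matrix:
    """Covariance matrix with denominator N - ddof.

    ddof=1: unbiased sample covariance; ddof=0: Gaussian maximum likelihood estimate.
    """
    N = len(samples)
    n = len(samples[0])
    xbar = [sum(x[i] for x in samples) / N for i in range(n)]
    return [
        [
            sum((x[i] - xbar[i]) * (x[j] - xbar[j]) for x in samples) / (N - ddof)
            for j in range(n)
        ]
        for i in range(n)
    ]


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
unbiased = covariance(data, ddof=1)
mle = covariance(data, ddof=0)
N = len(data)

# Each maximum likelihood entry equals the unbiased entry times (N - 1) / N.
for i in range(2):
    for j in range(2):
        assert abs(mle[i][j] - unbiased[i][j] * (N - 1) / N) < 1e-12
print(unbiased)
print(mle)
```

For \textstyle N=3 the factor \textstyle (N-1)/N is \textstyle 2/3. For \textstyle N=1000 it is \textstyle 0.999, which is why the choice of denominator matters mainly for small samples.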

In a weighted sample, each vector \textstyle \mathbf{x}_{k} is assigned a weight \textstyle w_{k}\geq0. Without loss of generality, assume that the weights are normalized:

\sum_{k=1}^{N}w_{k}=1.

(If they are not, divide the weights by their sum.) Then the weighted mean \textstyle \mathbf{\bar{x}} and the weighted covariance matrix \textstyle \mathbf{Q}=\left[  q_{ij}\right] are given by

\mathbf{\bar{x}}=\sum_{k=1}^{N}w_{k}\mathbf{x}_{k}

and [1]

q_{ij}=\frac{\sum_{k=1}^{N}w_{k}\left(  x_{ik}-\bar{x}_{i}\right)  \left( x_{jk}-\bar{x}_{j}\right)  }{1-\sum_{k=1}^{N}w_{k}^{2}}.
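The weighted formulas can be sketched in Python as below. The sketch makes these assumptions:

- Observations are stored as a list of \textstyle N coordinate lists.
- The function name `weighted_mean_cov` is hypothetical.
- At least two observations carry positive weight; otherwise \textstyle 1-\sum_{k}w_{k}^{2} is zero and the covariance is undefined.

As the text allows, the raw weights are first divided by their sum.

```python
from typing import List, Sequence, Tuple


def weighted_mean_cov(
    samples: List[List[float]], weights: Sequence[float]
) -> Tuple[List[float], List[List[float]]]:
    """Weighted mean and covariance with denominator 1 - sum(w_k^2).

    Weights must be non-negative; they are normalized to sum to 1 first.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    w = [wk / total for wk in weights]  # normalize so that sum(w) == 1
    n = len(samples[0])

    # Weighted mean: xbar = sum_k w_k x_k
    mean = [sum(wk * x[i] for wk, x in zip(w, samples)) for i in range(n)]

    denom = 1.0 - sum(wk * wk for wk in w)
    if denom <= 0.0:
        raise ValueError("need at least two observations with positive weight")

    # q_ij = sum_k w_k (x_ik - xbar_i)(x_jk - xbar_j) / (1 - sum_k w_k^2)
    cov = [
        [
            sum(wk * (x[i] - mean[i]) * (x[j] - mean[j]) for wk, x in zip(w, samples))
            / denom
            for j in range(n)
        ]
        for i in range(n)
    ]
    return mean, cov


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
mean, cov = weighted_mean_cov(data, [2.0, 1.0, 1.0])  # normalized to 0.5, 0.25, 0.25
print(mean, cov)
```

Passing equal weights, for example `[1.0, 1.0, 1.0]`, reproduces the unweighted sample mean and covariance up to floating-point rounding.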

If all weights are the same, \textstyle w_{k}=1/N, the weighted mean and covariance reduce to the sample mean and covariance above.
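The reduction can be checked directly. With \textstyle w_{k}=1/N, the weighted mean becomes

\mathbf{\bar{x}}=\sum_{k=1}^{N}\frac{1}{N}\mathbf{x}_{k}=\frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_{k},

and the denominator of the weighted covariance becomes

1-\sum_{k=1}^{N}w_{k}^{2}=1-N\cdot\frac{1}{N^{2}}=\frac{N-1}{N},

so that

q_{ij}=\frac{\frac{1}{N}\sum_{k=1}^{N}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right)}{\frac{N-1}{N}}=\frac{1}{N-1}\sum_{k=1}^{N}\left(x_{ik}-\bar{x}_{i}\right)\left(x_{jk}-\bar{x}_{j}\right).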

  1. Mark Galassi, Jim Davies, James Theiler, Brian Gough, Gerard Jungman, Michael Booth, and Fabrice Rossi. GNU Scientific Library Reference Manual, Version 1.9, 2007. Sec. 20.6, Weighted Samples.
