Pearson product-moment correlation coefficient

From Wikipedia, the free encyclopedia

In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the MCV or PMCC) is a common measure of the correlation between two variables X and Y. When measured in a population, it is designated by the Greek letter rho (ρ). When computed in a sample, it is designated by the letter r and is sometimes called "Pearson's r." Pearson's correlation reflects the degree of linear relationship between two variables and ranges from −1 to +1. A correlation of +1 means that there is a perfect positive linear relationship between the variables, a correlation of −1 means that there is a perfect negative linear relationship, and a correlation of 0 means that there is no linear relationship. In practice, correlations computed from real data are rarely, if ever, exactly 0, 1, or −1.

The statistic is defined as the sum of the products of the standard scores (z-scores) of the two measures, divided by the degrees of freedom, n − 1:

r = \frac{\sum z_x z_y}{n - 1}.

Note that this formula assumes the z-scores are calculated using standard deviations that themselves use n − 1 in the denominator.

The result obtained is equivalent to dividing the covariance between the two variables by the product of their standard deviations.
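The two definitions above can be checked against each other with a minimal Python sketch. It uses the standard library only. The function names pearson_from_z_scores and pearson_from_covariance and the data values are made up for illustration. The last lines show why the note about n − 1 matters: if the z-scores are built from population standard deviations (n in the denominator) but the sum is still divided by n − 1, r is overstated by a factor of n/(n − 1).

```python
import statistics


def pearson_from_z_scores(xs, ys):
    """r = sum(z_x * z_y) / (n - 1), z-scores built from sample SDs (n - 1)."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)  # n - 1 denominator
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


def pearson_from_covariance(xs, ys):
    """r = cov(X, Y) / (s_x * s_y), all with the n - 1 denominator."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


# Illustrative data, invented for this example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r_z = pearson_from_z_scores(xs, ys)
r_cov = pearson_from_covariance(xs, ys)
print(round(r_z, 4), round(r_cov, 4))  # 0.7746 0.7746

# Mixing conventions: population SDs (n in the denominator) with an n - 1 divisor.
n = len(xs)
mx, my = statistics.mean(xs), statistics.mean(ys)
px, py = statistics.pstdev(xs), statistics.pstdev(ys)
mixed = sum(((x - mx) / px) * ((y - my) / py) for x, y in zip(xs, ys)) / (n - 1)
print(round(mixed, 4))  # 0.9682, i.e. r * n / (n - 1)
```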

The coefficient ranges from −1 to 1. A value of 1 shows that a linear equation describes the relationship perfectly and positively, with all data points lying on the same line and Y increasing with X. A value of −1 shows that all data points lie on a single line but that Y increases as X decreases. A value of 0 shows that there is no linear relationship between the variables, so a linear model is inappropriate; the variables may still be related in a nonlinear way.
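The three reference cases can be reproduced with a small hand-rolled helper. The name pearson_r and the data are invented for this sketch. The last case shows that r = 0 rules out only a linear relationship. Here y = x² is completely determined by x, yet its correlation with x is zero because the points are symmetric about x = 0.

```python
import statistics


def pearson_r(xs, ys):
    """Sample Pearson correlation: cov(X, Y) / (s_x * s_y), n - 1 throughout."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


xs = [-2, -1, 0, 1, 2]

# All points on a line with positive slope: r is +1 (up to rounding).
print(pearson_r(xs, [3 * x + 1 for x in xs]))
# All points on a line with negative slope: r is -1 (up to rounding).
print(pearson_r(xs, [-0.5 * x + 4 for x in xs]))
# y = x**2 depends on x exactly, but not linearly: r is 0.
print(pearson_r(xs, [x * x for x in xs]))
```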

The sample coefficient r is a statistic that estimates the population correlation ρ between the two random variables.

The linear equation that best describes the relationship between X and Y can be found by linear regression. This equation can be used to "predict" the value of one measurement from knowledge of the other: for each value of X, the equation calculates the best estimate of the corresponding values of Y. We denote this predicted variable by Y′.
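The prediction step can be sketched as follows, assuming ordinary least squares. That is the usual choice, though the text above does not name a method. Under that assumption, the slope of the line predicting Y from X is b = r · s_y / s_x and the intercept is a = mean(Y) − b · mean(X). The helper name fit_line and the sample data are illustrative only.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares line y' = a + b*x, using slope b = r * s_y / s_x."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b = r * sy / sx
    a = my - b * mx
    return a, b


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = fit_line(xs, ys)
y_pred = [a + b * x for x in xs]  # Y' for each observed value of X
print([round(v, 2) for v in y_pred])  # [2.8, 3.4, 4.0, 4.6, 5.2]
```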

Any value of Y can therefore be written as the sum of Y′ and the difference between Y and Y′:

Y = Y^\prime + (Y - Y^\prime).

Because the least-squares residuals Y − Y′ are uncorrelated with the predicted values Y′, the variance of Y is equal to the sum of the variances of its two components:

s_y^2 = s_{y^\prime}^2 + s_{y.x}^2.

Since the residual variance of the least-squares line satisfies

s_{y.x}^2 = s_y^2 (1 - r^2),

where r^2 is the coefficient of determination, we can derive the identity

r^2 = {s_{y^\prime}^2 \over s_y^2}.

The square of r is conventionally used as a measure of the association between X and Y. For example, if r = 0.90, then r^2 = 0.81, so 81% of the variance of Y can be "accounted for" by the linear relationship between X and Y.
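The decomposition and the identity for r² can be checked numerically. This sketch assumes an ordinary least-squares fit. As in the derivation above, it computes every variance with the n − 1 denominator (statistics.variance). Some texts compute s_{y.x}^2 with n − 2 as the standard error of estimate; in that case, s_{y.x}^2 = s_y^2(1 − r^2) would hold only approximately. The data are invented.

```python
import statistics

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

mx, my = statistics.mean(xs), statistics.mean(ys)
sx, sy = statistics.stdev(xs), statistics.stdev(ys)
r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((len(xs) - 1) * sx * sy)

# Least-squares predictions Y' and residuals Y - Y'.
b = r * sy / sx
a = my - b * mx
y_pred = [a + b * x for x in xs]
resid = [y - yp for y, yp in zip(ys, y_pred)]

var_y = statistics.variance(ys)          # s_y^2
var_pred = statistics.variance(y_pred)   # s_{y'}^2
var_resid = statistics.variance(resid)   # s_{y.x}^2, same n - 1 denominator

print(round(var_y, 6), round(var_pred + var_resid, 6))  # 1.5 1.5
print(round(r ** 2, 6), round(var_pred / var_y, 6))     # 0.6 0.6

# The worked example in the text: r = 0.90 gives r^2 = 0.81.
print(round(0.90 ** 2, 2))  # 0.81
```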

  • The CORREL() function in many major spreadsheet packages, such as Microsoft Excel, OpenOffice.org Calc, and Gnumeric, calculates Pearson's correlation coefficient. Note that versions of Excel prior to 2003 exhibited rounding errors in this function and others [1].
  • The PEARSON() function in Microsoft Excel also calculates Pearson's correlation coefficient.
  • In MATLAB and Minitab, corr(X) calculates Pearson's correlation coefficient along with its p-value.
  • In MATLAB, Scilab, and GNU Octave, corrcoef calculates Pearson's correlation coefficient. R = corrcoef(X) returns a matrix R of correlation coefficients calculated from an input matrix X whose rows are observations and whose columns are variables.
  • In S-Plus and R, cor.test(X,Y) calculates Pearson's correlation coefficient.
  • In IDL, the CORRELATE() function computes the PMCC.
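The matrix form described for corrcoef(X) can be sketched in plain Python for readers without these packages. correlation_matrix is a hypothetical helper that follows the same layout, with rows as observations and columns as variables. It only illustrates the computation and is not how any of the listed packages implement their functions.

```python
import statistics


def correlation_matrix(rows):
    """Pearson correlation matrix for data whose rows are observations and
    whose columns are variables (the layout described for corrcoef(X))."""
    cols = list(zip(*rows))  # one tuple of values per variable
    n = len(rows)
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    k = len(cols)
    corr = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            cov = sum((u - means[i]) * (v - means[j])
                      for u, v in zip(cols[i], cols[j])) / (n - 1)
            corr[i][j] = cov / (sds[i] * sds[j])
    return corr


# Five observations (rows) of three variables (columns), invented for the example.
X = [
    [1.0, 2.0, 10.0],
    [2.0, 4.0, 8.0],
    [3.0, 5.0, 6.0],
    [4.0, 4.0, 4.0],
    [5.0, 5.0, 2.0],
]
for row in correlation_matrix(X):
    print([round(v, 4) for v in row])
```

The third column is an exact decreasing linear function of the first, so that pair should come out at −1, and every diagonal entry should come out at 1.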


