Pearson product-moment correlation coefficient
From Wikipedia, the free encyclopedia
In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the MCV or PMCC) is a common measure of the correlation between two variables X and Y. When measured in a population, it is designated by the Greek letter rho (ρ). When computed in a sample, it is designated by the letter r and is sometimes called "Pearson's r". Pearson's correlation reflects the degree of linear relationship between two variables and ranges from −1 to +1. A correlation of +1 means that there is a perfect positive linear relationship between the variables, a correlation of −1 means that there is a perfect negative linear relationship, and a correlation of 0 means that there is no linear relationship. In practice, correlations are rarely if ever exactly 0, 1, or −1.
The statistic is defined as the sum of the products of the standard scores of the two measures divided by the degrees of freedom:
$$r = \frac{\sum z_X z_Y}{n - 1}$$
Note that this formula assumes the standard scores z_X and z_Y are calculated using standard deviations that are themselves calculated with n − 1 in the denominator.
The result obtained is equivalent to dividing the covariance between the two variables by the product of their standard deviations:
$$r = \frac{\operatorname{cov}(X, Y)}{s_X s_Y}$$
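As an illustration of the definition above, the following Python sketch computes r both from standard scores and from the covariance, and shows that the two agree. The function names and the sample data are hypothetical, chosen only for this example, and the sketch assumes sample standard deviations with n − 1 in the denominator.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _sample_sd(values):
    # Sample standard deviation with n - 1 in the denominator.
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r_from_z_scores(xs, ys):
    """Sum of products of standard scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    mx, my = _mean(xs), _mean(ys)
    sx, sy = _sample_sd(xs), _sample_sd(ys)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable is constant")
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


def pearson_r_from_covariance(xs, ys):
    """Sample covariance divided by the product of the standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    mx, my = _mean(xs), _mean(ys)
    sx, sy = _sample_sd(xs), _sample_sd(ys)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable is constant")
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (sx * sy)


# Hypothetical sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_z = pearson_r_from_z_scores(xs, ys)
r_cov = pearson_r_from_covariance(xs, ys)
print(r_z, r_cov)  # the two values agree up to floating-point rounding
```

Both functions raise an error when either variable is constant, since the standard deviation is then zero and the coefficient is undefined.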
The coefficient ranges from −1 to 1. A value of 1 shows that a linear equation describes the relationship perfectly and positively, with all data points lying on the same line and with Y increasing with X. A value of −1 shows that all data points lie on a single line but that Y increases as X decreases. A value of 0 shows that there is no linear relationship between the variables, so a linear model is inappropriate.
The Pearson coefficient is a statistic which estimates the correlation of the two given random variables.
The linear equation that best describes the relationship between X and Y can be found by linear regression. This equation can be used to "predict" the value of one measurement from knowledge of the other. That is, for each value of X the equation calculates a value which is the best estimate of the values of Y corresponding to that specific value of X. We denote this predicted variable by Y′.
Any value of Y can therefore be defined as the sum of Y′ and the difference between Y and Y′:
$$Y = Y' + (Y - Y')$$
The variance of Y is equal to the sum of the variances of the two components of Y:
$$s_y^2 = s_{y'}^2 + s_{y.x}^2$$
Since the coefficient of determination implies that $s_{y.x}^2 = s_y^2 (1 - r^2)$, we can derive the identity
$$r^2 = \frac{s_{y'}^2}{s_y^2}$$
The square of r is conventionally used as a measure of the association between X and Y. For example, if the coefficient is 0.90, then 81% of the variance of Y can be "accounted for" by changes in X and the linear relationship between X and Y.
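The decomposition of the variance of Y into a predicted part and a residual part can be checked numerically. The sketch below fits a least-squares line, splits each Y into Y′ and Y − Y′, and compares the variances. The helper names and the data are hypothetical, and the sketch assumes that all three variances use the same n − 1 denominator, which is what makes the sum hold exactly.

```python
def sample_variance(values):
    # Sample variance with n - 1 in the denominator.
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for predicting Y from X."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]      # Y'
residuals = [y - p for y, p in zip(ys, predicted)]   # Y - Y'

var_y = sample_variance(ys)
var_pred = sample_variance(predicted)
var_resid = sample_variance(residuals)

# The variance of Y splits into the two components.
print(var_y, var_pred + var_resid)

# The share of variance accounted for by the fitted line equals r squared.
r_squared = var_pred / var_y
print(r_squared)
```

For this data the ratio of predicted variance to total variance is the same value obtained by squaring the correlation coefficient of the two lists.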
- The CORREL() function in many major spreadsheet packages, such as Microsoft Excel, OpenOffice.org Calc and Gnumeric, calculates Pearson's correlation coefficient. Note that versions of Excel prior to 2003 exhibited rounding errors in this function and others [1].
- The PEARSON() function in Microsoft Excel also calculates Pearson's correlation coefficient.
- In MATLAB and Minitab, corr(X) calculates Pearson's correlation coefficient along with the p-value.
- In MATLAB, scilab, and GNU Octave, corrcoef calculates Pearson's correlation coefficient.
  - R = corrcoef(X) returns a matrix R of correlation coefficients calculated from an input matrix X whose rows are observations and whose columns are variables.
- In S-Plus and R, cor.test(X,Y) calculates Pearson's correlation coefficient.
- In IDL, the CORRELATE() function computes the PMCC.



