Degrees of freedom (statistics)

From Wikipedia, the free encyclopedia

For other senses of these terms, see degrees of freedom or degree.

In the simplest case, such as a chi-square test on a set of categories, the number of degrees of freedom is the number of categories or classes being tested minus 1.

In statistics, the term degrees of freedom has two distinct senses.


In fitting statistical models to data, the vectors of residuals are often constrained to lie in a space of smaller dimension than the number of components in the vector. That smaller dimension is the number of degrees of freedom for error.

Perhaps the simplest example is this. Suppose

X_1,\dots,X_n\,

are random variables each with expected value μ, and let

\overline{X}_n={X_1+\cdots+X_n \over n}

be the "sample mean". Then the quantities

X_i-\overline{X}_n\,

are residuals that may be considered estimates of the errors X_i − μ. The sum of the residuals (unlike the sum of the errors) is necessarily 0. That means they are constrained to lie in a space of dimension n − 1. If one knows the values of any n − 1 of the residuals, one can thus find the last one. One says that "there are n − 1 degrees of freedom for error."
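The constraint can be checked numerically. The following is a minimal sketch using only the Python standard library. The sample values are hypothetical and chosen only for illustration.

```python
# Hypothetical sample; any numbers would do.
xs = [2.0, 4.0, 4.0, 5.0, 10.0]
n = len(xs)
mean = sum(xs) / n  # the sample mean

# Residuals: deviations of each observation from the sample mean.
residuals = [x - mean for x in xs]

# The residuals sum to zero (up to floating-point rounding).
residual_sum = sum(residuals)

# Knowing any n - 1 residuals fixes the remaining one:
# it must equal minus the sum of the others.
recovered_last = -sum(residuals[:-1])

print(residual_sum)
print(recovered_last, residuals[-1])
```

Because the last residual is determined by the others, only n − 1 of the n residuals are free to vary.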

An only slightly less simple example is that of least squares estimation of a and b in the model

Y_i=a+bx_i+\varepsilon_i\ \mathrm{for}\ i=1,\dots,n

where ε_i, and hence Y_i, are random. Let \widehat{a} and \widehat{b} be the least-squares estimates of a and b. Then the residuals

e_i=y_i-(\widehat{a}+\widehat{b}x_i)\,

are constrained to lie within the space defined by the two equations

e_1+\cdots+e_n=0,\,
x_1 e_1+\cdots+x_n e_n=0.\,

One says that there are n − 2 degrees of freedom for error.

The capital Y is used in specifying the model, and lower-case y in the definition of the residuals. That is because the former are hypothesized random variables and the latter are data.
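As a rough illustration, the two constraints on the residuals can be verified with a small script. This is a sketch under stated assumptions: the data are made up, and the helper name `least_squares_line` is hypothetical. It uses the standard closed-form least-squares estimates for a straight line.

```python
def least_squares_line(xs, ys):
    """Return (a_hat, b_hat), the least-squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a_hat, b_hat = least_squares_line(xs, ys)
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

# Both linear constraints hold up to floating-point rounding.
constraint_1 = sum(residuals)                               # e_1 + ... + e_n
constraint_2 = sum(x * e for x, e in zip(xs, residuals))    # x_1 e_1 + ... + x_n e_n

# Two independent constraints on n residuals leave n - 2 degrees of freedom.
df_error = len(xs) - 2

print(constraint_1, constraint_2, df_error)
```

The residuals live in an n-dimensional space but satisfy two independent linear equations, which is why n − 2 degrees of freedom remain for error.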

Another simple and frequently seen example arises in multiple comparisons.


Interpretation of degrees of freedom (df) in regression, where n is the sample size


1. df = n − 2 for linear regression. A straight line has two parameters (intercept and slope), and two points determine it exactly, leaving no residual variation. Each additional point contributes one independent piece of information about the scatter around the line, so the standard error is based on n − 2 degrees of freedom.


2. df = n − 3 for quadratic regression. A quadratic curve has three parameters, and three points determine it exactly, so the scatter around the curve is based on the remaining n − 3 degrees of freedom. (A cubic curve has four parameters, so cubic regression leaves n − 4.)
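The counting rule can be written as a small helper. This is only a sketch: the function name `residual_df` is hypothetical, and it assumes a polynomial fit with an intercept, so a polynomial of degree d has d + 1 estimated coefficients.

```python
def residual_df(n, degree):
    """Error degrees of freedom for a degree-`degree` polynomial fit with intercept.

    Assumes one estimated coefficient per power of x, plus the intercept.
    """
    n_params = degree + 1
    if n < n_params:
        raise ValueError("need at least as many points as estimated coefficients")
    return n - n_params


n = 10  # hypothetical sample size
for degree, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    print(name, residual_df(n, degree))
```

With exactly as many points as coefficients the fit is exact and no degrees of freedom are left for error.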

The probability distributions of statistics computed from residuals are often parametrized by these numbers of degrees of freedom. Thus one speaks of a chi-square distribution, a Student's t-distribution, or a Wishart distribution with a specified number of degrees of freedom, or of an F-distribution with specified numbers of degrees of freedom in the numerator and the denominator respectively.

In the familiar uses of these distributions, the number of degrees of freedom takes only integer values. The underlying mathematics in most cases allows for fractional degrees of freedom, which can arise in more sophisticated uses.
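One well-known case in which a non-integer value appears is the Welch–Satterthwaite approximation, used when comparing two sample means with unequal variances. It is offered here only as an illustration and is not taken from the text above. The function name and the input values are hypothetical.

```python
def welch_satterthwaite_df(var1, n1, var2, n2):
    """Approximate degrees of freedom for Welch's two-sample t statistic.

    var1, var2 are sample variances; n1, n2 are sample sizes (each > 1).
    """
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Hypothetical inputs, for illustration only.
df = welch_satterthwaite_df(4.0, 10, 9.0, 15)
print(df)  # generally not an integer
```

When the two sample variances and the two sample sizes are equal, the formula reduces to the integer 2(n − 1).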
