Selection bias

From Wikipedia, the free encyclopedia

Selection bias is a distortion of evidence or data that arises from the way the data are collected. It is sometimes referred to as the selection effect. The term most often refers to the distortion of a statistical analysis caused by the method of collecting samples. If selection bias is not taken into account, any conclusions drawn may be wrong.

Sample selection may involve pre- or post-selecting samples in a way that preferentially includes or excludes certain kinds of results. Typically this causes measures of statistical significance to appear much stronger than they are, but it can also produce completely illusory artifacts. Selection bias can be the result of scientific fraud that manipulates data directly, but more often it is either unconscious or due to biases in the instruments used for observation.

For example, when film photography was used in astronomy, observations typically found more blue galaxies than red ones. This was not because blue galaxies are actually more common, but rather because photographic film was more sensitive to blue light than red light. With the conversion of astronomy to digital cameras, which are more sensitive to red light than blue, the opposite bias is now the case.

As another example, if an experiment were conducted to measure the distribution of sizes of fish in a lake, a net might be used to catch what is meant to be a representative sample of fish. If the net had a mesh size of 1 cm, then no fish narrower than 1 cm would be found in the sample. This is a result of the method of selection: an experiment using that net gives no way of knowing whether the lake contains any fish narrower than 1 cm.

To determine whether there is selection bias in a particular setting, it is not sufficient to establish that there has been selection. Instead, one must establish that the quantity of interest (fish size, for example) is systematically different in the sample than in the entire population of interest. The same selection procedure may bias one quantity, such as fish size, while leaving another, such as the sex ratio of the fish, unaffected.
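
The fish example can be sketched as a small simulation. Everything in it is an assumption made for illustration: the lake, the uniform distribution of widths, and in particular the choice to make sex independent of width, which is what leaves the sex ratio unbiased here.

```python
import random
import statistics

def simulate_lake(n_fish=10_000, seed=0):
    """Return a hypothetical lake population as (width_cm, sex) tuples."""
    rng = random.Random(seed)
    # Assumption: widths are uniform between 0.2 cm and 5 cm, independent of sex.
    return [(rng.uniform(0.2, 5.0), rng.choice("MF")) for _ in range(n_fish)]

def net_catch(lake, mesh_cm=1.0):
    """Keep only fish at least as wide as the mesh; narrower fish slip through."""
    return [fish for fish in lake if fish[0] >= mesh_cm]

def mean_width(fish):
    """Average width in cm."""
    return statistics.fmean([width for width, _ in fish])

def female_share(fish):
    """Fraction of fish that are female."""
    return sum(1 for _, sex in fish if sex == "F") / len(fish)

lake = simulate_lake()
catch = net_catch(lake)
print(f"mean width:   lake {mean_width(lake):.2f} cm, catch {mean_width(catch):.2f} cm")
print(f"female share: lake {female_share(lake):.3f}, catch {female_share(catch):.3f}")
```

In this sketch the catch overstates the mean width, while the female share in the catch stays close to that of the lake. If sex were correlated with width, the sex ratio would be biased as well.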

There are many types of possible selection bias, including:

  • Selecting end-points of a series. For example, to maximise a claimed trend, you could start the time series at an unusually low year, and end on a high one.
  • Early termination of a trial at a time when its results support a desired conclusion.
  • A trial may be terminated early at an extreme value (often for ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all variables have a similar mean. As a result of that early termination, therefore, the means of variables with larger variances are overestimated.
  • Partitioning data with knowledge of the contents of the partitions, and then analyzing them with tests designed for blindly chosen partitions (see stratified sampling, cluster sampling, Texas sharpshooter fallacy).
  • Analyzing the lengths of intervals by selecting intervals that occupy randomly chosen points in time or space, a process that favors longer intervals. This is known as length bias.
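
Length bias, the last item above, can be illustrated with a minimal sketch. The exponential distribution of interval lengths and the variable names are assumptions chosen for the illustration, not part of any particular study.

```python
import bisect
import itertools
import random
import statistics

rng = random.Random(1)
# Hypothetical interval lengths: exponential with mean 1.0.
lengths = [rng.expovariate(1.0) for _ in range(20_000)]
# Lay the intervals end to end; ends[i] is the point where interval i stops.
ends = list(itertools.accumulate(lengths))
total = ends[-1]

def interval_at(point):
    """Return the length of the interval that covers the given point."""
    index = min(bisect.bisect_right(ends, point), len(lengths) - 1)
    return lengths[index]

# Select intervals by picking random points on the line, not random intervals.
sampled = [interval_at(rng.random() * total) for _ in range(20_000)]

mean_all = statistics.fmean(lengths)
mean_sampled = statistics.fmean(sampled)
print(f"mean of all intervals:     {mean_all:.2f}")
print(f"mean of sampled intervals: {mean_sampled:.2f}")
```

A random point is more likely to fall inside a long interval than a short one, so the sampled mean comes out well above the mean of all intervals.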

  • Rejection of "bad" data on arbitrary grounds, instead of according to previously stated or generally agreed criteria.
  • Rejection of "outliers" on statistical grounds that fail to take into account important information that could be derived from wild observations as described by Kruskal.

  • Pre-screening of trial participants, or advertising for volunteers within particular groups. For example, to "prove" that smoking doesn't affect fitness, advertise for both groups at the local fitness centre, but advertise for smokers during the advanced aerobics class and for non-smokers during the weight loss sessions.
  • Discounting trial subjects/tests that did not run to completion. For example, in a test of a dieting program, the researcher may simply reject everyone who drops out of the trial. But most of those who drop out are those for whom it wasn't working.
  • Self-selection bias, which is possible whenever the group of people being studied has any form of control over whether to participate. Participants' decision to participate may be correlated with traits that affect the study, making the participants a non-representative sample. For example, people who have strong opinions or substantial knowledge may be more willing to spend time answering a survey than those who don't.
  • Migration bias, which may be introduced by excluding subjects who have recently moved into the study area (this may occur when newcomers are not available in a register used to identify the source population) or by excluding subjects who move out of the study area during follow-up.
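
The dieting example in the list above (discounting subjects who drop out) can be sketched as follows. The program is assumed to have no real effect, and the dropout probabilities are hypothetical numbers picked only to make the mechanism visible.

```python
import random
import statistics

rng = random.Random(2)
# Assumption: the hypothetical program does nothing; weight change (kg) is noise around 0.
changes = [rng.gauss(0.0, 3.0) for _ in range(5_000)]

def completes(change_kg):
    """Hypothetical dropout rule: participants who gained weight quit 80% of the
    time, the others quit 10% of the time."""
    drop_probability = 0.8 if change_kg > 0 else 0.1
    return rng.random() >= drop_probability

completers = [change for change in changes if completes(change)]

mean_everyone = statistics.fmean(changes)
mean_completers = statistics.fmean(completers)
print(f"mean change, everyone enrolled: {mean_everyone:+.2f} kg")
print(f"mean change, completers only:   {mean_completers:+.2f} kg")
```

Analyzing only the completers makes an ineffective program look as if it produces weight loss, because the people for whom it was not working have mostly left the sample.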

  • Selection of which studies to include in a meta-analysis (see also combinatorial meta-analysis).
  • Performing repeated experiments and reporting only the most favourable results. (Perhaps relabelling lab records of other experiments as "calibration tests", "instrumentation errors" or "preliminary surveys".)
  • Presenting the most significant result of a data dredge as if it were a single experiment. (Which is logically the same as the previous item, but curiously is seen as much less dishonest.)
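
The last two items can be illustrated with a rough simulation. It assumes 20 independent outcomes per study, no real effect in any of them, and a simple z-test with known unit variance; all names and sizes are hypothetical. With 20 independent tests at the 5% level, the chance that at least one looks significant is 1 - 0.95^20, roughly 0.64.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(3)
N_PER_GROUP = 20    # hypothetical group size
N_OUTCOMES = 20     # outcomes tested per study
N_STUDIES = 1_000   # simulated studies

def null_p_value():
    """Two-sided z-test p-value for two groups drawn from the same N(0, 1)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    b = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    # Known unit variance, so the difference in means has variance 2 / n.
    z = (fmean(a) - fmean(b)) / (2.0 / N_PER_GROUP) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def best_p_value():
    """Run every outcome and keep only the most favourable p-value."""
    return min(null_p_value() for _ in range(N_OUTCOMES))

hits = sum(best_p_value() < 0.05 for _ in range(N_STUDIES))
false_positive_rate = hits / N_STUDIES
expected = 1 - 0.95 ** N_OUTCOMES
print(f"studies reporting a 'significant' result: {false_positive_rate:.2f}")
print(f"expected for independent tests:           {expected:.2f}")
```

Reporting only the best of the 20 results as if it were a single planned experiment turns a nominal 5% false-positive rate into one closer to two in three.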

Selection bias is closely related to:

  • sample bias, a selection bias produced by an accidental bias in the sampling technique, as opposed to deliberate or unconscious manipulation.
  • publication bias or reporting bias, the distortion produced in community perception or meta-analyses by not publishing uninteresting (usually negative) results, or results which go against the experimenter's prejudices, a sponsor's interests, or community expectations.
  • confirmation bias, the distortion produced by experiments that are designed to seek confirmatory evidence instead of trying to disprove the hypothesis.

In the general case, selection biases cannot be overcome with statistical analysis of existing data alone, though see the work of James Heckman for some strategies in special cases. The extent of selection bias can be partially assessed by examining correlations between (exogenous) background variables and a treatment indicator.
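
A minimal sketch of such a check follows, using age as a hypothetical background variable and a 0/1 treatment indicator. The self-selection rule (older people opt in more often) is an assumption made for the illustration. A correlation near zero is what random assignment would produce; a clearly non-zero correlation suggests that treated and untreated groups differ in ways unrelated to the treatment.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(4)
ages = [rng.uniform(20, 70) for _ in range(5_000)]

# Treatment assigned by coin flip, independent of age.
randomized = [1 if rng.random() < 0.5 else 0 for _ in ages]

# Hypothetical self-selection: the probability of opting in rises
# linearly with age, from 0.2 at age 20 to 0.8 at age 70.
self_selected = [
    1 if rng.random() < 0.2 + 0.6 * (age - 20) / 50 else 0 for age in ages
]

r_randomized = pearson(ages, randomized)
r_self_selected = pearson(ages, self_selected)
print(f"age vs treatment, randomized:    {r_randomized:+.3f}")
print(f"age vs treatment, self-selected: {r_self_selected:+.3f}")
```

This check is only partial, as the text says: it can reveal imbalance in the background variables that were measured, but it says nothing about selection on variables that were not.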
