Query expansion

From Wikipedia, the free encyclopedia

Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations.[1] In the context of web search engines, query expansion involves evaluating a user's input (the words typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents. Query expansion involves techniques such as:

  • Finding synonyms of words, and searching for the synonyms as well
  • Finding all the various morphological forms of words by stemming each word in the search query
  • Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results
  • Reweighting the terms in the original query
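
Three of the techniques listed above (synonyms, stemming, and reweighting) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular search engine works: the `SYNONYMS` table, the `SUFFIXES` list, the function names, and the weights 1.0 and 0.5 are all hypothetical, and the stemmer is a deliberately naive suffix stripper rather than a real stemming algorithm. Spelling correction is left out.

```python
import re

# Hypothetical, hand-written synonym table; a real system would use a thesaurus.
SYNONYMS = {
    "car": ["automobile"],
    "cheap": ["inexpensive"],
}

# Hypothetical suffix list for a deliberately naive stemmer.
SUFFIXES = ("ing", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query, original_weight=1.0, expansion_weight=0.5):
    """Return a dict mapping each stemmed query term to a weight.

    Terms typed by the user keep the higher weight; synonyms added by
    the expansion receive the lower one (the reweighting step).
    """
    weights = {}
    for token in re.findall(r"[a-z0-9]+", query.lower()):
        stem = naive_stem(token)
        weights[stem] = max(weights.get(stem, 0.0), original_weight)
        for synonym in SYNONYMS.get(stem, []):
            # setdefault never lowers the weight of a term the user typed.
            weights.setdefault(naive_stem(synonym), expansion_weight)
    return weights


expanded = expand_query("cheap cars")
print(expanded)
```

Giving the added synonyms a lower weight than the user's own words is one simple design choice; it lets the expansion widen the match without letting the added terms dominate the ranking.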

Query expansion is a technique studied in computer science, particularly in natural language processing and information retrieval.

Search engines invoke query expansion to increase the quality of user search results. It is assumed that users do not always formulate search queries using the best terms. A user-entered term may be a poor choice simply because the corpus does not contain it.

Stemming a user-entered term matches more documents, because alternate word forms of the term are matched as well; this increases total recall, at the expense of precision. Expanding a search query to include synonyms of a user-entered term likewise increases recall at the expense of precision. The reason lies in how precision is calculated: it is the fraction of retrieved documents that are relevant, so the number of retrieved documents is its denominator, and retrieving more documents lowers precision unless the added documents are relevant in at least the same proportion. It is also inferred that a larger result set negatively impacts overall search result quality, since no user wants even more results to comb through, regardless of the precision.
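
The tradeoff described above can be checked with a small worked example. The sketch below uses the standard set-based definitions (precision is the share of retrieved documents that are relevant, recall is the share of relevant documents that are retrieved); the document ids and the `before` and `after` result sets are hypothetical and chosen only to illustrate the arithmetic.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, chosen only to illustrate the arithmetic.
relevant = {1, 2, 3, 4}
before = {1, 2, 9}             # result set for the original query
after = {1, 2, 3, 9, 10, 11}   # result set after stemming and synonym expansion

p_before, r_before = precision_recall(before, relevant)
p_after, r_after = precision_recall(after, relevant)

print(f"before: precision={p_before:.2f} recall={r_before:.2f}")
print(f"after:  precision={p_after:.2f} recall={r_after:.2f}")
```

In this made-up case the expansion finds one more relevant document, so recall rises from 0.50 to 0.75, but it also pulls in two irrelevant ones, so precision falls from about 0.67 to 0.50.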

The goal of query expansion in this regard is that increasing recall can also increase precision (rather than decrease it, as the calculation alone would suggest), by bringing into the result set pages that are more relevant (of higher quality), or at least equally relevant. Pages that have the potential to be more relevant to the user's query, and that would otherwise be excluded regardless of their relevance, are included. At the same time, many of the current commercial search engines use word frequency (tf-idf) to assist in ranking. When occurrences of the user-entered words, their synonyms, and their alternate morphological forms are all taken into account in ranking, documents with a higher density of these terms (high frequency and close proximity) tend to migrate higher up in the search results. This leads to higher-quality results near the top of the list, despite the larger recall.
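
The interaction between expansion and frequency-based ranking can be illustrated with a toy scorer. The sketch below is an assumption-laden simplification, not the ranking function of any real search engine: the corpus `DOCS`, the query weights, and the function names are hypothetical, tokens are assumed to be stemmed already, and term proximity is not modelled at all. It uses one common tf-idf variant (term frequency normalised by document length, times the logarithm of the inverse document frequency).

```python
import math
from collections import Counter

# Hypothetical toy corpus; tokens are assumed to be stemmed already.
DOCS = {
    "d1": "cheap car for sale cheap car".split(),
    "d2": "inexpensive automobile dealer".split(),
    "d3": "train timetable".split(),
}


def tf_idf_scores(query_weights, docs):
    """Score each document by the weighted sum of tf-idf over the query terms."""
    n_docs = len(docs)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        score = 0.0
        for term, weight in query_weights.items():
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term])
                score += weight * tf * idf
        scores[doc_id] = score
    return scores


def rank(query_weights, docs):
    """Return ids of matching documents, best score first."""
    scores = tf_idf_scores(query_weights, docs)
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [doc_id for doc_id in ordered if scores[doc_id] > 0]


original = {"cheap": 1.0, "car": 1.0}
expanded = {"cheap": 1.0, "car": 1.0, "inexpensive": 0.5, "automobile": 0.5}

ranking_original = rank(original, DOCS)
ranking_expanded = rank(expanded, DOCS)
print(ranking_original)
print(ranking_expanded)
```

With the original query only `d1` is returned. The expanded query also returns `d2`, which uses different wording, while `d1`, with its denser use of the user's own terms, stays at the top of the list.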

This tradeoff is one of the defining problems in query expansion: whether expansion is worthwhile given its uncertain effects on precision and recall. Critics state that one problem is that the dictionaries, thesauri, and stemming algorithms are shaped by human bias; although the query expansion algorithm applies them automatically, the human choices built into them still affect the results (similar to how statisticians can 'lie' with statistics). Other critics point out the potential for corporate influence on the dictionaries, which in the case of web search engines could be used to promote particular web pages for advertising purposes.

  • D. Abberley, D. Kirby, S. Renals, and T. Robinson. The THISL broadcast news retrieval system. In Proc. ESCA ETRW Workshop Accessing Information in Spoken Audio, Cambridge, pp. 14-19, 1999. The section on query expansion gives a concise, mathematical overview.
  • Y. Qiu and H.P. Frei. Concept Based Query Expansion. In Proceedings of SIGIR-93, 16th ACM International Conference on Research and Development in Information Retrieval, Pittsburgh, SIGIR Forum, ACM Press, June 1993. An academic paper on a specific method of query expansion.
  • Efthimis N. Efthimiadis. Query Expansion. In: Martha E. Williams (ed.), Annual Review of Information Science and Technology (ARIST), v31, pp 121-187, 1996. An introduction for less-technical readers.

  1. Vechtomova, Olga; Wang, Ying (2006). "A study of the effect of term proximity on query expansion" (Abstract). Journal of Information Science 32 (4): 324–333. DOI:10.1177/0165551506065787. Retrieved on 2006-12-09.