EXCLAIM

From Wikipedia, the free encyclopedia

The EXtensible Cross-Linguistic Automatic Information Machine (EXCLAIM) is an integrated tool for cross-language information retrieval (CLIR), created at the University of California, Santa Cruz in early 2006. It is currently in a beta stage of development, with some support for more than a dozen languages. The lead developers are Justin Nuger and Jesse Saba Kirchner.

Early work on CLIR depended on manually constructed parallel corpora for each pair of languages. Building such corpora by hand is far more labor-intensive than creating them automatically. A more efficient way of finding data to train a CLIR system is to use matching pages on the web that are written in different languages[1].

EXCLAIM capitalizes on the idea of latent parallel corpora on the web by automating the alignment of such corpora in various domains. The most significant of these is Wikipedia itself, which includes articles in 250 languages. The role of EXCLAIM is to use semantic and linguistic analysis tools to align the information in the different language editions of Wikipedia so that they can be treated as parallel corpora. EXCLAIM can also be extended to incorporate information from many other sources, such as the Chinese Community Health Resource Center (CCHRC).
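As a rough illustration of what treating Wikipedia as a latent parallel corpus can involve, the sketch below pairs article texts from two language editions by following title-to-title links between them. It is not EXCLAIM's alignment method, which the description above says relies on semantic and linguistic analysis. The function name align_articles, the dictionary layout, and the sample data are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def align_articles(
    source_articles: Dict[str, str],
    target_articles: Dict[str, str],
    title_links: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pair article texts whose titles are connected by a cross-language link.

    All three arguments are hypothetical inputs: article text keyed by title
    for each language, plus a source-title -> target-title mapping.
    """
    pairs = []
    for source_title, target_title in title_links.items():
        # Keep a pair only when both sides actually exist in the datasets.
        if source_title in source_articles and target_title in target_articles:
            pairs.append(
                (source_articles[source_title], target_articles[target_title])
            )
    return pairs


# Toy data: an English edition and a Welsh edition (texts are placeholders).
english = {
    "Water": "English article text about water",
    "Moon": "English article text about the Moon",
}
welsh = {"Dŵr": "Welsh article text about water"}
links = {"Water": "Dŵr", "Moon": "Lleuad"}

aligned = align_articles(english, welsh, links)
```

Only the "Water" pair survives here, because the toy Welsh dataset has no article for "Lleuad". Document-level pairs like these would still need finer alignment before they could serve as training data.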

One of the main goals of the EXCLAIM project is to provide minority and endangered languages with the kind of computational and CLIR tools that are often available only for powerful or prosperous majority languages.

EXCLAIM is in a beta state, with varying degrees of functionality for different languages. Support for CLIR using the Wikipedia dataset and the most current version of EXCLAIM (v.0.4), including full UTF-8 support and Porter stemming for the English component, is available for the following nineteen languages:

Amharic
Bengali
Gothic
Greek
Icelandic
Indonesian
Latvian
Malagasy
Nahuatl
Navajo
Quechua
Sardinian
Swahili
Tagalog
Tibetan
Turkish
Welsh
Wolof
Yiddish
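The Porter stemming mentioned for the English component reduces inflected words to a common stem so that a query term such as "ponies" can match "pony"-related index entries. The sketch below implements only step 1a of the published Porter algorithm (plural suffix handling), as an illustration of the idea. It is not EXCLAIM's code, and the function name porter_step_1a is an assumption for this example.

```python
def porter_step_1a(word: str) -> str:
    """Apply step 1a of the Porter stemmer (plural suffixes) to a lowercase word.

    Rules, tried in order, longest suffix first:
        SSES -> SS, IES -> I, SS -> SS, S -> (removed)
    """
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress (unchanged)
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


stems = [porter_step_1a(w) for w in ["caresses", "ponies", "caress", "cats"]]
```

The full algorithm has several further steps that strip derivational suffixes. A complete stemmer would apply them in sequence after this one.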

Support using the Wikipedia dataset and an earlier version of EXCLAIM (v.0.3) is available for the following languages:

Dutch
Spanish

Current development efforts focus on support for Chinese, which poses technical challenges in segmentation and encoding but also offers many latent datasets in addition to the Wikipedia dataset. Chinese will be the first language supported in EXCLAIM v.0.5, which incorporates the Trimming And Reformatting Modular System (TARMS) toolkit.
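The segmentation issue arises because written Chinese does not mark word boundaries with spaces, so text must be split into words before it can be indexed or matched against a query. One common baseline technique is forward maximum matching against a word list, sketched below. Nothing in this article indicates that EXCLAIM or TARMS uses this technique. The function name forward_max_match, the max_len parameter, and the tiny lexicon are assumptions for illustration only.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right segmentation: always take the longest lexicon match.

    Characters that start no known word are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical miniature lexicon; a real system would load a large word list.
lexicon = {"北京", "大学", "北京大学", "学生"}
segments = forward_max_match("北京大学学生", lexicon)
```

Greedy matching can choose the wrong split when a longer word overlaps the start of the next one, which is one reason segmentation is a genuine technical issue and not a solved preprocessing step.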

The EXCLAIM development plan calls for an integrated CLIR instrument that, by the release of EXCLAIM 1.0, can search from English for information in any of the supported languages, or from any of the supported languages for information in English. Future versions will allow searching from any supported language into any other, and searching from and into multiple languages.
