Linear classifier

From Wikipedia, the free encyclopedia

In the field of Machine Learning, the goal of classification is to group items, that have similar feature values, into groups. A linear classifier achieves this by making a classification decision based on the value of the linear combination of the features.

Contents

If the input feature vector to the classifier is a real vector \vec x, then the output score is

y = f(\vec{w}\cdot\vec{x}) = f\left(\sum_j w_j x_j\right),

where \vec w is a real vector of weights and f is a function that converts the dot product of the two vectors into the desired output. The weight vector \vec w is learned from a set of labeled training samples. Often f is a simple function that maps all values above a certain threshold to the first class and all other values to the second class. A more complex f might give the probability that an item belongs to a certain class.

For a two-class classification problem, one can visualize the operation of a linear classifier as splitting a high-dimensional input space with a hyperplane: all points on one side of the hyperplane are classified as "yes", while the others are classified as "no".

A linear classifier is often used in situations where the speed of classification is an issue, since it is often the fastest classifier, especially when \vec x is sparse. However, decision trees can be faster. Also, linear classifiers often work very well when the number of dimensions in \vec x is large, as in document classification, where each element in \vec x is typically the number of counts of a word in a document (see document-term matrix). In such cases, the classifier should be well-regularized.

There are two main approaches for determining the parameters of a linear classifier \vec w [1][2]. The first is by modeling conditional density functions P(\vec x|{\rm class}). Examples of such algorithms include:

The second approach is called discriminative training, which attempts to maximize the quality of the output on a training set. Additional terms in the training cost function can easily perform regularization of the final model. Examples of discriminative training of linear classifiers include

  • Logistic regression --- maximum likelihood estimation of \vec w assuming that the observed training set was generated by a binomial model that depends on the output of the classifier.
  • Perceptron --- an algorithm that attempts to fix all errors encountered in the training set
  • Support vector machine --- an algorithm that maximizes the margin between the decision hyperplane and the examples in the training set.

Note: In contrast to its name, LDA does not belong to the class of discriminative models in this taxonomy. However, its name makes sense when we compare LDA to the other main linear dimensionality reduction algorithm: Principal Components Analysis (PCA). LDA is a supervised learning algorithm that utilizes the labels of the data, while PCA is an unsupervised learning algorithm that ignores the labels. To summarize, the name is an historical artifact (see [3], p.117).

Discriminative training often yields higher accuracy than modeling the conditional density functions. However, handling missing data is often easier with conditional density models.

All of the linear classifier algorithms listed above can be converted into non-linear algorithms operating on a different input space \varphi(\vec x), using the kernel trick.

  1. ^ T. Mitchell, Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression. Draft Version, 2005 download
  2. ^ A. Y. Ng and M. I. Jordan. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes. in NIPS 14, 2002. download
  3. ^ R.O. Duda, P.E. Hart, D.G. Stork, "Pattern Classification", Wiley, (2001). ISBN 0-471-05669-3

See also:

  1. Y. Yang, X. Liu, "A re-examination of text categorization", Proc. ACM SIGIR Conference, pp. 42-49, (1999). paper @ citeseer
  2. R. Herbrich, "Learning Kernel Classifiers: Theory and Algorithms," MIT Press, (2001). ISBN 0-262-08306-X
Advanced Search
Included Web Search Engines


Safe Search

close

Top Matching Results

Occasionally Search.com will highlight specialized results that are based on the context of your query. Examples of specialized results include specific links to news, images, or video.

Top Matching Results may highlight information from other Search.com pages, content from the CNET Network of sites, or third party content. The listings are based purely on relevance. Search.com does not receive payment for listings in this section but our partners that provide this data may get paid for listing these products.

Sponsored Links

This section contains paid listings which have been purchased by companies that want to have their sites appear for specific search terms and related content. These listings are administered, sorted and maintained by a third party and are not endorsed by Search.com.

Search Results

Search.com sends your search query to several search engines at one time and integrates the results into one list which has been sorted by relevance using Search.com's proprietary algorithm. You can customize the list of search engines included in your metasearch from the preferences.

The search engines that are used in your metasearch may allow companies to pay to have their Web sites included within the results. To view the Paid Inclusion policy for a specific search engine, please visit their Web site. Search.com does not accept payment or share revenue with any search engine partner for listings in this section.