orngLR

 

orngLR is an implementation of logistic regression. In applicability and model complexity, logistic regression is most similar to Naïve Bayes, yet it often performs better. Its limitation is that it only operates with binary attributes and a binary class.

 

There are two fundamental classes: BasicLogisticLearner and BasicLogisticClassifier. They are called basic because they cannot deal with domains that have more than two classes. They are used in the same way as other Orange learners, with the exception that there are no shortcuts: you first initialize the learner object and call it later, passing it the training examples. The learner returns a classifier, which knows how to classify the test examples.

 

For example:

>>> from orngLR import *

>>> from orange import *

>>> t = ExampleTable('c:/apps/python/orange/doc/monk1.tab')

>>> c = BasicLogisticLearner()(t)

>>> t[60].getclass(),c(t[60])

(0, 0)

 

We loaded a dataset, created a learner, immediately trained it with the data in t, and stored the resulting classifier in c. Then we tested the classifier on one of the training examples, and it got the result right.
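If you want a quick sanity check on the whole training set, you can count the correct predictions yourself. The following is only a sketch (we omit the printed numbers, since they depend on the data):

>>> hits = [c(e) == e.getclass() for e in t]

>>> print sum(hits), 'of', len(t), 'training examples classified correctly'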

 

You can also ask the classifier about the probability distribution of classes, by giving an additional parameter orange.GetBoth, like this:

>>> t.domain.classVar.values

<1, 0>

>>> c(t[60], orange.GetBoth)

(0, [0.31475420046432656, 0.68524579953567344])

 

The classifier thinks that the first class value (1) has a 31% chance of appearing, whereas the second (0) has a 69% chance.

 

Note that on some simple domains, logistic regression degenerates into a linear discriminant. In such cases, the probability estimates become overconfident: the chosen class is assigned a 100% probability.

 

 

orngSVM

 

Support Vector Machines are a recent and popular approach; they can be seen as a mixture of non-linear discriminants and nearest neighbors. They can be used for both classification and regression. The base classes are BasicSVMLearner and BasicSVMClassifier. They are called basic because their probability estimates are overconfident, and because they only work well with binary class problems.

Consider this:

>>> t = ExampleTable('c:/apps/python/orange/doc/monk1.tab')

>>> c = BasicSVMLearner()(t)

>>> t[555].getclass(),c(t[555],orange.GetBoth)

(1, (0, [0.0, 1.0]))

 

This was very confident and very wrong. This is why we sometimes want to use orngLR.MarginMetaLearner. When constructing a MarginMetaLearner, you only have to pass the basic learner as a parameter. It then performs 10-fold cross-validation on the training data, trying to learn and calibrate the “internal” confidence of the SVM classifier (the same applies to the linear discriminant that sometimes comes out of logistic regression):

 

>>> cm = MarginMetaLearner(BasicSVMLearner())(t)

>>> t[555].getclass(), cm(t[555], orange.GetBoth)

(1, (1, [0.58498031019830099, 0.41501968980169901]))

 

This was much better: correct, and not as confident. Internally, MarginMetaLearner uses logistic regression to estimate the probability distribution from the distance to the separating hyperplane. Note that MarginMetaLearner only supports binary classification problems.
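Since the same overconfidence can appear with the linear discriminant produced by logistic regression, you can wrap BasicLogisticLearner in exactly the same way. This is just a sketch following the constructor shown above; the resulting probabilities depend on the data, so we omit the output:

>>> cml = MarginMetaLearner(BasicLogisticLearner())(t)

>>> cml(t[60], orange.GetBoth)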

 

You can use BasicSVMLearner to perform regression, too:

>>> r = ExampleTable('c:/apps/python/orange/doc/hhs.tab')

>>> rc = BasicSVMLearner()(r)

>>> r[85].getclass(),rc(r[85])

(92.950, 72.869)
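To get a rough feeling for the quality of the regression, you can compute something like the mean absolute error over the training examples. A sketch (the resulting number depends on the data and parameters, so we omit it):

>>> errors = [abs(float(rc(e)) - float(e.getclass())) for e in r]

>>> print sum(errors) / len(errors)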

 

BasicSVMLearner has many parameters, which you can adjust like this:

>>> rc = BasicSVMLearner()

>>> rc.type = 2

>>> cc = rc(r)

 

We will quickly skim through the available types:

>>> c = BasicSVMLearner()

>>> c.type = 0           # classifier       (SVC)

>>> c.type = 1           # nu-classifier      (NU_SVC)

>>> c.type = 2           # one-class       (OC)

>>> c.type = 3           # regression       (e_SVR)

>>> c.type = 4           # nu-regression       (NU_SVR)

 

NU_SVC and NU_SVR are learners with an additional parameter, c.nu. OC is a probability density estimator: we show the learner all the examples, and then ask it how likely it is that a new example resembles the learned ones.
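As a sketch of how the one-class type might be used on the monk1 data loaded above: train it on all the examples and then ask it about one of them. What exactly the classifier returns for OC depends on the implementation and the data, so we omit the output:

>>> oc = BasicSVMLearner()

>>> oc.type = 2          # one-class (OC)

>>> occ = oc(t)

>>> occ(t[0])            # how much does t[0] resemble the training data?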

 

Kernels describe how we compute the distance between two examples. The simplest kernel is the linear one, which simply computes the dot product of two examples (x.y). The choice of kernel depends on the kind of data we are working with.

 

>>> c.kernel = 0         # linear kernel: x.y

>>> c.kernel = 1         # polynomial kernel: (g*x.y+c)^d

>>> c.kernel = 2         # RBF (default): e^(-g(x-y).(x-y))

>>> c.kernel = 3         # sigmoid: tanh(g*x.y+c)

 

You may have noticed the parameters g, d and c in the formulas above. They correspond to c.gamma, c.degree and c.coef0; their default values are 1/len(exampletable), 3 and 0.0, respectively.
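For example, to train with a polynomial kernel of degree 2 on the monk1 data, you would set the corresponding fields before calling the learner (a sketch; the chosen values are arbitrary):

>>> c = BasicSVMLearner()

>>> c.kernel = 1          # polynomial kernel: (g*x.y+c)^d

>>> c.degree = 2          # d

>>> c.gamma = 0.5         # g

>>> c.coef0 = 1.0         # c

>>> poly = c(t)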

 

The most important parameters are c.C (default 1.0), used by all SVM types except NU_SVC, and c.nu (default 0.5), used by NU_SVC, NU_SVR and OC. Both affect the desired complexity of the model.

 

c.C is the cost of misclassification in SVC, e_SVR and NU_SVR. The greater it is, the more complex the model is allowed to be. We usually tune it with cross-validation; the candidate values are normally spaced by orders of magnitude (0.01, 0.1, 1.0, 10, 100).
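A sketch of such tuning, assuming that BasicSVMLearner can be passed to the standard orngTest and orngStat modules like any other Orange learner (we omit the printed accuracies):

>>> import orngTest, orngStat

>>> learners = []

>>> for C in [0.01, 0.1, 1.0, 10, 100]: l = BasicSVMLearner(); l.C = C; learners.append(l)

>>> res = orngTest.crossValidation(learners, t, folds=10)

>>> print orngStat.CA(res)     # one accuracy per candidate value of C

The same pattern can be used to tune c.nu, described below.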

 

c.nu (default 0.5) controls how many support vectors there can be. It has to be greater than 0.0 and less than or equal to 1.0. It is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. The greater it is, the more complex the model can be.

 

c.p is the tolerance used in regression. The bigger it is, the more tolerant we are of small prediction errors. In NU_SVR, c.nu replaces c.p.
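For instance, to make the regression example above less tolerant of small errors, you could lower c.p before training (a sketch; the chosen value 0.01 is arbitrary):

>>> rl = BasicSVMLearner()

>>> rl.type = 3           # regression (e_SVR)

>>> rl.p = 0.01           # tighter tolerance than the default

>>> rc = rl(r)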

 

You can investigate the complexity of the model by examining the model field of the classifier, which stores all the information about the SVM model in a human-readable form:

>>> tc = BasicSVMLearner()

>>> tc.type = 1

>>> c = tc(t)

>>> c.model['total_sv']

378

>>> tc.nu = 0.9

>>> c = tc(t)

>>> c.model['total_sv']

512

>>> tc.nu = 0.2

>>> c = tc(t)

>>> c.model['total_sv']

298

 

We notice that the number of support vectors rises with the value of c.nu. So c.nu, too, can be tuned with cross-validation, as sketched above for c.C.

 

This implementation of SVM does not support example weighting, but with the SVC type of learner you can assign different weights to classes. For this, use the c.classweights list. Note that the complexity and error measures are affected by the sum of the class weights, which should stay the same as with the default value of [1,1,1,…]. For example, if you want to assign twice the weight to the first class, do it like this:

>>> c.classweights = [1.333, 0.666]

 

With unbalanced data sets, where one class is less frequent than another, you might improve the classification results by assigning weights inversely proportional to the class frequencies. But beware: this kind of weighting will not work if you use MultiClassLearner.
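A sketch of computing such weights by hand: count the examples of each class, take the inverse frequencies, and rescale them so that their sum matches the sum of the default weights [1, 1, …]:

>>> values = t.domain.classVar.values

>>> counts = [len([e for e in t if int(e.getclass()) == i]) for i in range(len(values))]

>>> raw = [float(len(t)) / n for n in counts]

>>> cw = BasicSVMLearner()

>>> cw.classweights = [w * len(values) / sum(raw) for w in raw]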

 

Because SVM is based on numeric optimization, you can configure the numeric precision with c.eps (default 0.001). SVM also uses a data cache, whose size in megabytes is set by c.cache_size (default 40); if you want to allocate more than the default 40 megabytes, change this value.
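For example, on a larger data set you might relax the precision a little and give the optimizer more cache. A sketch; both values are arbitrary:

>>> c = BasicSVMLearner()

>>> c.eps = 0.01          # looser numeric precision, faster training

>>> c.cache_size = 200    # megabytes of cache for kernel computations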

 

 

 

orngMultiClass

 

As described earlier, MarginMetaLearner and BasicLogisticLearner only support binary class problems, whereas BasicSVMLearner’s support for multi-class classification is very limited. To properly work with multi-class problems you should use orngMultiClass.

 

The core class in orngMultiClass is MultiClassLearner(learner, matrix, probability_estimator). The first parameter is an arbitrary kind of learner that supports class probability estimation (like MarginMetaLearner or BasicLogisticLearner, but not BasicSVMLearner which should be wrapped inside MarginMetaLearner).

 

The matrix describes how a multi-class problem is cut apart into multiple binary problems. There are several pre-fabricated matrix classes, such as MCMOneAll, MCMOneOne, and MCMOrdinal. For a three-class problem they look like this:

>>> from orngMultiClass import *

>>> MCMOneAll()(3)

[[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]

>>> MCMOneOne()(3)

[[1, -1, 0], [1, 0, -1], [0, 1, -1]]

>>> MCMOrdinal()(3)

[[-1, 1, 1], [-1, -1, 1]]

 

The matrix is composed of a number of vectors. Each vector corresponds to a separate binary classifier. Inside a vector, -1 means that this class value will be treated as negative, +1 as positive, whereas 0 means that all the training examples of this class are ignored.

 

Once we train the learner and obtain a classifier, the classifier consists of an ensemble of sub-classifiers. Each sub-classifier outputs a probability distribution, and we have to merge all these estimates into a single consistent probability distribution over all the classes. For this, we use probability estimators. There are two: MCPEZadrozny merges the probability distributions by numerically solving a system of equations; MCPEFriedman simply weights each class by the number of times it has won. MCPEZadrozny is more reliable, whereas MCPEFriedman also works with classifiers that do not provide reliable class probability estimates (for example, BasicSVMLearner).

 

Consider the following fully-fledged example:

>>> from orngLR import *

>>> from orngSVM import *

>>> from orange import *

>>> t = ExampleTable('kolki.tab')

>>> tc = BasicSVMLearner()

>>> tc.type = 1

>>> tc.nu = 0.2

>>> ml = MarginMetaLearner(tc,folds=5)       # we use 5 folds

>>> finalm = MultiClassLearner(ml, matrix=MCMOneAll, pestimator=MCPEZadrozny)(t)

>>> t[4].getclass(), finalm(t[4], GetBoth)

(slab, (slab, [0.32, 0.32, 0.37]))

 

 

Bibliography

 

For SVM we used this paper and library:

Chih-Chung Chang and Chih-Jen Lin: LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz

 

For LR we used this one:

Miller, A.J. (1992): Algorithm AS 274: Least squares routines to supplement those of Gentleman. Appl. Statist., vol. 41(2), 458-478.

 

For MultiClass estimation, we used:

Zadrozny, B.: Reducing multiclass to binary by coupling probability estimates.