Classification Models

Data mining models are based on one of two kinds of learning: supervised and unsupervised. Supervised learning is typically used to predict a value. Classification problems use supervised learning.

In a classification problem, you have a number of cases (examples) and wish to predict which of several classes each case belongs to. Each case has multiple attributes; each attribute takes on one of several possible values. The attributes consist of multiple predictor attributes (independent variables) and one target attribute (dependent variable). Each of the target attribute's possible values is a class to be predicted on the basis of that case's predictor attribute values.

Different classification algorithms use different techniques to find relationships between the values of the predictor attributes and the values of the target attribute in the data used to build the model. The model summarizes these relationships. It can then be applied to new cases whose target values are unknown in order to predict those values.
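The build-then-apply cycle can be illustrated with a minimal sketch. It does not show any Oracle Data Mining algorithm; it assumes a deliberately simple "one rule" technique that maps each value of a single predictor attribute to the most frequent class seen for that value. The attribute names, the data, and the functions build_model and apply_model are hypothetical.

```python
from collections import Counter, defaultdict

def build_model(cases, predictor, target):
    """Summarize, for each predictor value, the most frequent target class."""
    counts = defaultdict(Counter)
    for case in cases:
        counts[case[predictor]][case[target]] += 1
    overall = Counter(case[target] for case in cases)
    return {
        "predictor": predictor,
        "rules": {value: c.most_common(1)[0][0] for value, c in counts.items()},
        # Fallback class for predictor values never seen in the build data.
        "default": overall.most_common(1)[0][0],
    }

def apply_model(model, case):
    """Score one new case: predict its class from the stored rules."""
    return model["rules"].get(case[model["predictor"]], model["default"])

# Hypothetical build data: two predictor attributes and one target ("buys").
build_data = [
    {"income": "high", "owns_home": "yes", "buys": "yes"},
    {"income": "high", "owns_home": "no", "buys": "yes"},
    {"income": "low", "owns_home": "yes", "buys": "no"},
    {"income": "low", "owns_home": "no", "buys": "no"},
    {"income": "low", "owns_home": "no", "buys": "yes"},
]

model = build_model(build_data, predictor="income", target="buys")

# New cases have predictor values but no target value.
new_cases = [
    {"income": "high", "owns_home": "no"},
    {"income": "low", "owns_home": "yes"},
]
predictions = [apply_model(model, case) for case in new_cases]
print(predictions)
```

Real classification algorithms use every predictor attribute and far richer summaries, but the split between a build step and an apply step is the same.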

The application of a classification model to new data is called applying the model or scoring the data. For basic information about applying models, see Apply a Model.

For more information about classification, see Where to Find More Information.

Algorithms

You can use the Classification Model Build Wizard to create models that use one of the following algorithms:

Note: Not all algorithms support text mining. See Text Mining for more information.

ABN Deprecated

The Adaptive Bayes Network (ABN) algorithm is deprecated in Oracle Data Mining 11g. Oracle Data Miner does not support building ABN models; however, you can view ABN models created using the Oracle Data Mining 11g APIs. To generate rules, use the Decision Tree algorithm instead.

Costs

In a classification problem, it may be important to specify the costs involved in making an incorrect decision. Doing so can be useful when the costs of different misclassifications vary significantly.

Costs are specified in a cost matrix. The rows of a cost matrix correspond to actual values; the columns correspond to predicted values. For any pair of actual/predicted values, the entry in the matrix indicates the cost of predicting that value when the actual value is the one given by the row. Correct predictions (the diagonal entries) normally have a cost of zero.
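One common way to use a cost matrix is to choose the prediction with the lowest expected cost instead of the most probable class. The sketch below assumes that each entry holds the cost of predicting the column class when the row class is the actual one. The class names, cost values, and probabilities are made up for illustration, and whether a given algorithm applies costs in exactly this way is an assumption.

```python
# Hypothetical two-class problem.
classes = ["no_fraud", "fraud"]

# cost[actual][predicted]: rows are actual values, columns are predicted values.
# Missing a real fraud case is assumed to cost 20 times more than a false alarm.
cost = {
    "no_fraud": {"no_fraud": 0.0, "fraud": 1.0},
    "fraud": {"no_fraud": 20.0, "fraud": 0.0},
}

def expected_cost(probabilities, predicted):
    """Average cost of predicting `predicted`, weighted by class probabilities."""
    return sum(probabilities[actual] * cost[actual][predicted] for actual in classes)

def min_cost_prediction(probabilities):
    """Return the class whose prediction has the lowest expected cost."""
    return min(classes, key=lambda predicted: expected_cost(probabilities, predicted))

# The model considers fraud unlikely for this case (hypothetical probabilities).
probabilities = {"no_fraud": 0.9, "fraud": 0.1}

choice = min_cost_prediction(probabilities)
print(choice)
```

Here the expected cost of predicting no_fraud is 0.1 x 20 = 2.0, while the expected cost of predicting fraud is 0.9 x 1 = 0.9, so the costly class is predicted even though it is the less probable one.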

Priors

When you build a classification model, you may need to balance the number of positive and negative cases for the target. This need can arise either because a given target value is rare in the population (for example, fraud cases) or because the data you have does not accurately reflect the real population (that is, the data sample is skewed). You use priors to specify the true distribution of target values in the population, so that the model can account for any difference between the build data and that population.

A classification model works best when it has a reasonable number of examples of each target value in its build data table. When the target has only a few possible values, the model works best with more or less equal numbers of each value. If the available data does not meet these conditions, you can create a build data table in which positive and negative target values are more or less evenly balanced, and then supply priors information to tell the model what the true balance of target values is.
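The effect of supplying priors can be sketched with a standard probability correction: each class probability produced from balanced build data is reweighted by the ratio of the true prior to the build-data prior and then renormalized. This is a general illustration and not necessarily how any particular algorithm uses priors internally; the function adjust_for_priors and all numbers are hypothetical.

```python
def adjust_for_priors(model_probs, build_priors, true_priors):
    """Rescale class probabilities from the build-data balance to the true balance."""
    weighted = {
        cls: model_probs[cls] * true_priors[cls] / build_priors[cls]
        for cls in model_probs
    }
    total = sum(weighted.values())
    return {cls: weight / total for cls, weight in weighted.items()}

# The build data table was balanced to contain equal numbers of each class.
build_priors = {"fraud": 0.5, "no_fraud": 0.5}

# In the real population, fraud is assumed to occur in 1% of cases.
true_priors = {"fraud": 0.01, "no_fraud": 0.99}

# Probabilities for one case, as estimated from the balanced build data.
model_probs = {"fraud": 0.8, "no_fraud": 0.2}

adjusted = adjust_for_priors(model_probs, build_priors, true_priors)
print(adjusted)
```

With these numbers, a case that looks 80% likely to be fraud under the balanced data has an adjusted fraud probability of roughly 4%, because fraud is rare in the real population.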