After a model is built, you test it to measure how well it predicts the target. A model is tested by applying it to data for which the target values are known (the actual values) and then comparing the predicted values with the actual values. Note that the data used to test the model must be prepared in the same way as the data used to build it.
Oracle Data Miner supports testing two types of models: classification models and regression models.
You can calculate the following test metrics for a classification model: Predictive Confidence, Confusion Matrix, Lift, and Receiver Operating Characteristics (ROC).
The default is to calculate all four metrics; Predictive Confidence and a Confusion Matrix are always calculated.
These metrics are briefly described in this topic. For more information about testing classification models, see Oracle Data Mining Concepts in Where to Find More Information.
Predictive Confidence is a number between 0 and 1 that indicates how much better the predictions made by the tested model are than predictions made by a naive model. The naive model always predicts the mean for numerical targets and the mode for categorical targets. For more information, see Classification Model Test Metrics: Predictive Confidence.
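As an illustration only (the authoritative definition is in Oracle Data Mining Concepts), the following Python sketch assumes the common formulation Predictive Confidence = max(1 - model error / naive error, 0) for a categorical target, where the naive model always predicts the mode:

```python
# Hypothetical sketch of the Predictive Confidence idea, assuming the formula
# max(1 - model_error / naive_error, 0); see Oracle Data Mining Concepts for
# the authoritative definition.
from collections import Counter

def predictive_confidence(actual, predicted):
    # Naive model: always predict the mode of the actual (categorical) target.
    mode = Counter(actual).most_common(1)[0][0]
    naive_error = sum(a != mode for a in actual) / len(actual)
    model_error = sum(a != p for a, p in zip(actual, predicted)) / len(actual)
    if naive_error == 0:
        return 0.0
    return max(1 - model_error / naive_error, 0.0)

actual    = [1, 0, 0, 1, 0, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 0, 0]
print(predictive_confidence(actual, predicted))  # about 0.33 for this sample
```

A value near 1 means the model's predictions are much better than the naive model's; a value of 0 means they are no better.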
Lift measures how much better the model's predictions are than predictions made at random; that is, how "fast" the model finds the actual positive target values. (The concept originated in marketing: "How much of my customer database must I contact to find 50% of the customers likely to buy Product X?")
To calculate lift, Oracle Data Mining applies the model to the test data (the same data that is used to calculate the Confusion Matrix) to gather predicted and actual target values. It then sorts the predicted results by probability (that is, by confidence in a positive prediction), divides the ranked list into equal parts called quantiles (the default number is 10), and counts the actual positive values in each quantile.
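The following minimal Python sketch illustrates that computation; it is not Oracle Data Mining's implementation, and the probability and actual-value arrays are hypothetical:

```python
# Illustrative sketch of the lift computation described above; `probs` and
# `actual` are hypothetical test results, not Oracle Data Mining output.
def lift_table(probs, actual, n_quantiles=10):
    # Sort cases by predicted probability of the positive class, descending.
    ranked = sorted(zip(probs, actual), key=lambda pair: pair[0], reverse=True)
    size = len(ranked) // n_quantiles
    overall_rate = sum(actual) / len(actual)
    table = []
    for q in range(n_quantiles):
        chunk = ranked[q * size:(q + 1) * size]
        positives = sum(a for _, a in chunk)
        # Lift for this quantile: positive rate in the quantile vs. overall.
        table.append(positives / len(chunk) / overall_rate)
    return table

import random
random.seed(1)
probs = [random.random() for _ in range(1000)]
actual = [1 if random.random() < p else 0 for p in probs]  # correlated target
print(lift_table(probs, actual))  # top quantiles show lift greater than 1
```

A lift above 1 in the top quantiles means the model concentrates the actual positives among its most confident predictions.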
The confusion matrix is calculated by applying the model to a hold-out sample (the test set, created during the split step in a classification build activity) taken from the build data. The values of the target are known; the known values are compared with the values predicted by the model.
The confusion matrix indicates the types of errors that the model is likely to make. The columns are predictions and the rows are actual values. For example, if you are predicting a target with values 0 and 1, the number in the upper right cell of the confusion matrix indicates the false-positive predictions, that is, predictions of 1 when the actual value is 0.
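A minimal Python sketch of that comparison, assuming a binary 0/1 target and hypothetical prediction arrays, shows the layout just described (actual values as rows, predictions as columns):

```python
# Illustrative confusion matrix for a 0/1 target: rows are actual values,
# columns are predictions, so matrix[0][1] counts false positives.
def confusion_matrix(actual, predicted):
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

actual    = [0, 0, 1, 1, 0, 1, 0, 1]
predicted = [0, 1, 1, 0, 0, 1, 0, 1]
m = confusion_matrix(actual, predicted)
print(m[0][1], "false positives;", m[1][0], "false negatives")
```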
Calculating Receiver Operating Characteristics results in a ROC curve. ROC curves provide a means of comparing individual models and of determining thresholds that yield a high proportion of positive hits. The area under the ROC curve measures the discriminating ability of a binary classification model. For more information about ROC and how to use it, see Receiver Operating Characteristics.
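As an illustration of how a ROC curve and its area can be derived (a sketch with hypothetical scores, not the product's computation), one can sweep a probability threshold and record the true-positive and false-positive rates at each step:

```python
# Illustrative ROC sketch: sweep thresholds over predicted probabilities,
# compute (false positive rate, true positive rate) points, and approximate
# the area under the resulting curve with the trapezoidal rule.
def roc_points(probs, actual):
    points = []
    pos = sum(actual)
    neg = len(actual) - pos
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, a in zip(probs, actual) if p >= t and a == 1)
        fp = sum(1 for p, a in zip(probs, actual) if p >= t and a == 0)
        points.append((fp / neg, tp / pos))
    return [(0.0, 0.0)] + points

def auc(points):
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between adjacent points
    return area

probs  = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
actual = [1,   1,   0,   1,   0,    1,   0,   0]
print(round(auc(roc_points(probs, actual)), 3))  # 1.0 = perfect, 0.5 = random
```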
The following test metrics are calculated for a regression model: Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
These statistics are the metrics most commonly used to test regression models.
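For illustration, the two statistics can be computed from actual and predicted values as follows (a sketch with hypothetical data, not Oracle Data Mining's implementation):

```python
# Illustrative Root Mean Square Error and Mean Absolute Error; `actual` and
# `predicted` are hypothetical regression test values.
import math

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual    = [10.0, 12.5, 14.0, 9.0]
predicted = [11.0, 12.0, 13.0, 10.5]
print(rmse(actual, predicted), mae(actual, predicted))
```

RMSE penalizes large errors more heavily than MAE because the differences are squared before averaging.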
You can also create a residual plot to evaluate a regression model. The default is to create a residual plot. See Residual Plot for more information.
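A residual plot graphs the residuals (actual value minus predicted value) against the predicted values; in a well-fitting model the residuals scatter evenly around zero. The following matplotlib sketch with hypothetical data illustrates the idea (it is not the Oracle Data Miner display):

```python
# Illustrative residual plot: residual (actual - predicted) on the y-axis
# against the predicted value on the x-axis; data values are hypothetical.
import matplotlib.pyplot as plt

actual    = [10.0, 12.5, 14.0, 9.0, 11.0, 13.5]
predicted = [11.0, 12.0, 13.0, 10.5, 11.5, 12.5]
residuals = [a - p for a, p in zip(actual, predicted)]

plt.scatter(predicted, residuals)
plt.axhline(0, linestyle="--")  # a good model scatters evenly around zero
plt.xlabel("Predicted value")
plt.ylabel("Residual (actual - predicted)")
plt.title("Residual plot (illustrative)")
plt.show()
```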
Copyright © 2006, 2008, Oracle. All rights reserved.