Receiver Operating Characteristics

Receiver Operating Characteristics (ROC) analysis is a useful method for evaluating classification models. ROC provides a means to compare individual models and determine thresholds which yield a high proportion of positive hits.

This help topic explains why to use ROC, how to use ROC, and how ROC analysis works in detail.

Why Use ROC?

ROC supports "what-if" analysis. You can use ROC to experiment with modified model settings and observe the effect on the confusion matrix. For example, suppose that a business problem requires that the false-negative value be reduced as much as possible, subject to the requirement that the number of positive predictions be less than or equal to some fixed number: you might offer an incentive to each customer predicted to be high-value, but a budget constraint limits you to a maximum of 170 incentives. On the other hand, false negatives represent missed opportunities, so you want to avoid such mistakes.
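
The following Python sketch is for illustration only; it is not Oracle Data Mining code, and the customer scores and budget are assumed example values. It shows the kind of trade-off described above: given per-customer probabilities of being high-value, it finds the lowest threshold whose positive predictions fit within the incentive budget, since a lower threshold flags more customers and therefore leaves fewer false negatives.

def positives_at(threshold, scores):
    """Count cases that would receive a positive prediction at this threshold."""
    return sum(1 for s in scores if s >= threshold)

def smallest_threshold_within_budget(scores, budget):
    """Lowest threshold whose positive-prediction count fits the budget.
    A lower threshold flags more customers, so fewer false negatives."""
    for t in sorted(set(scores)):              # candidate thresholds: observed scores
        if positives_at(t, scores) <= budget:
            return t
    return 1.0                                 # nothing fits; predict no positives

# Example: hypothetical model scores for 10 customers, budget of 3 incentives.
scores = [0.92, 0.85, 0.77, 0.64, 0.58, 0.51, 0.43, 0.30, 0.22, 0.10]
t = smallest_threshold_within_budget(scores, budget=3)
print(t, positives_at(t, scores))              # chosen threshold and resulting positives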

Using ROC

To use ROC, move the red line and observe the changes in the confusion matrix. As you move the line, you change the probability threshold at which a positive prediction is made. Normally, the probability assigned to each case is examined, and if the probability is 0.5 or above, a positive prediction is made. Moving the red line changes the positive prediction threshold to some value other than 0.5, and the changed value is displayed in the first column of the table beneath the graph.
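
As a rough illustration of this thresholding rule (assumed example data, not Oracle Data Mining code), the following Python sketch computes a confusion matrix at the default threshold of 0.5 and again at a stricter threshold, showing how the counts shift:

def confusion_matrix(y_true, scores, threshold=0.5):
    """Tally true/false positives and negatives for a given threshold."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0     # positive-prediction rule
        if predicted == 1 and actual == 1: tp += 1
        elif predicted == 1 and actual == 0: fp += 1
        elif predicted == 0 and actual == 0: tn += 1
        else: fn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

y_true = [1, 1, 0, 1, 0, 0, 1, 0]                      # assumed actual classes
scores = [0.9, 0.7, 0.65, 0.55, 0.45, 0.4, 0.35, 0.1]  # assumed model probabilities
print(confusion_matrix(y_true, scores, 0.5))           # default threshold
print(confusion_matrix(y_true, scores, 0.6))           # stricter threshold: fewer positives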

The experiments that you make by moving the red line do not change any values. If you wish to change the positive prediction threshold, return to the activity display and click Select ROC Threshold. Move the red line to the threshold determined by your experimentation, and then click OK. The model is now modified, and the modified threshold is displayed in the Test Metrics step of the activity.

Detailed Explanation of ROC

The horizontal axis of an ROC graph measures the false positive rate as a percentage. The vertical axis shows the true positive rate. The top left-hand corner is the optimal location on an ROC graph, indicating a high true-positive rate combined with a low false-positive rate. The area under the ROC curve measures the discriminating ability of a binary classification model. The larger the area under the curve, the higher the likelihood that an actual positive case will be assigned a higher probability of being positive than an actual negative case. The area under the curve measure is especially useful for data sets with an unbalanced target distribution (one target class dominates the other).
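
The following Python sketch illustrates these ideas outside of Oracle Data Mining. It uses scikit-learn's roc_curve and roc_auc_score on assumed example data to list the (false positive rate, true positive rate) points and compute the area under the curve:

from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [1, 1, 0, 1, 0, 0, 1, 0, 0, 1]                            # assumed actual classes
y_score = [0.95, 0.85, 0.7, 0.65, 0.6, 0.5, 0.45, 0.3, 0.2, 0.1]    # assumed model probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)          # closer to 1.0 means better discrimination

for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
print(f"AUC = {auc:.3f}")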

ROC also helps to determine a threshold value that achieves an acceptable trade-off between the hit (true positive) rate and the false alarm (false positive) rate. The correct trade-off depends on the particular problem. For example, for some problems it may be important to minimize the number of false-positive predictions; for other problems, it may be important to minimize the number of false-negative predictions. Selecting a point on the curve for a given model fixes a particular trade-off, and the corresponding threshold can then be used as a post-processing parameter to achieve the desired performance with respect to the error rates. Oracle Data Mining models by default use a threshold of 0.5.
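
As a hedged illustration of that post-processing step (assumed example data and a hypothetical false-alarm cap, not Oracle Data Mining code), the following Python sketch selects the threshold with the highest hit rate whose false alarm rate stays under the cap:

def rates_at(threshold, y_true, scores):
    """Return (true positive rate, false positive rate) at a threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return tp / pos, fp / neg

def choose_threshold(y_true, scores, max_fpr=0.25):
    """Highest hit rate whose false alarm rate does not exceed max_fpr."""
    best = None
    for t in sorted(set(scores), reverse=True):        # candidate thresholds
        tpr, fpr = rates_at(t, y_true, scores)
        if fpr <= max_fpr and (best is None or tpr > best[1]):
            best = (t, tpr, fpr)
    return best                                        # (threshold, TPR, FPR) or None

y_true = [1, 1, 0, 1, 0, 0, 1, 0]                      # assumed actual classes
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]      # assumed model probabilities
print(choose_threshold(y_true, scores))                # threshold to use instead of the default 0.5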