Outlier Treatment Options

For general information about outliers, see Outliers in Oracle Data Mining.

You can change the definition of "outlier" by changing the number of standard deviations or by entering explicit cutoff points, either as a percentage of records in each tail or as actual upper and lower values. You can also choose to discard extreme values rather than recode them.

To define an outlier treatment, you must supply two pieces of information:

  1. Cutoff Points: Specify which values are outliers. By default, values that are more than 3 Sigma (standard deviations) from the mean are outliers; that is, values < AVG-3*Sigma or values > AVG+3*Sigma. Alternatively, you can specify the percentage of values in each tail, or explicit upper and lower values.
  2. Replace with: Specify how to treat the outliers. You can either replace them with NULL values (that is, discard the outliers) or replace them with edge values. For example, suppose that the mean of an attribute's distribution is 10 and the standard deviation is 5. Outliers are then values that are less than -5 (the mean minus 3 times the standard deviation) or greater than 25 (the mean plus 3 times the standard deviation). In this case, you can replace the value -10 either with NULL or with the edge value -5.
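
The two choices above can be illustrated with a minimal Python sketch. It is not Oracle Data Miner code: the function names sigma_cutoffs and treat_outliers and the replace_with values "null" and "edge" are hypothetical, chosen only to mirror the dialog options. The mean and standard deviation are passed in directly so that the numbers match the example in item 2.

```python
def sigma_cutoffs(avg, sigma, n_sigma=3.0):
    """Return (lower, upper) cutoff points: AVG - n*Sigma and AVG + n*Sigma."""
    return avg - n_sigma * sigma, avg + n_sigma * sigma


def treat_outliers(values, lower, upper, replace_with="edge"):
    """Treat values outside [lower, upper].

    replace_with="null": replace each outlier with None (discard it).
    replace_with="edge": replace each outlier with the nearest cutoff value.
    Both option names are assumptions made for this illustration.
    """
    if replace_with not in ("null", "edge"):
        raise ValueError("replace_with must be 'null' or 'edge'")

    treated = []
    for v in values:
        if v is None or lower <= v <= upper:
            # Missing values and in-range values pass through unchanged.
            treated.append(v)
        elif replace_with == "null":
            treated.append(None)
        else:
            # Below the lower cutoff maps to lower; above the upper maps to upper.
            treated.append(lower if v < lower else upper)
    return treated


# Mean 10 and standard deviation 5 give cutoffs of -5 and 25.
lower, upper = sigma_cutoffs(avg=10, sigma=5)
sample = [-10, 0, 12, 40]

as_edges = treat_outliers(sample, lower, upper, replace_with="edge")
as_nulls = treat_outliers(sample, lower, upper, replace_with="null")
```

With these inputs, the value -10 becomes -5 under the edge-value treatment and None under the NULL treatment, as described in item 2. Explicit upper and lower cutoff values can be passed to treat_outliers directly, without calling sigma_cutoffs.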

Click OK to continue.