k-Means Settings
The default settings are designed to work well for most cases.
You can change the following settings:
- Distance Function specifies how the algorithm calculates distance. The default distance function is Euclidean; the other distance function is Cosine.
- Split Criterion is either Variance or Size. The default is Variance.
- Minimum Error Tolerance must be between 0.001 (slow build) and 0.1 (fast build); the default is 0.005. Increasing Minimum Error Tolerance builds models faster, but possibly with lower accuracy.
- Maximum Iterations must be between 2 (fast build) and 30 (slow build); the default is 6. This value is the maximum number of iterations for the k-Means algorithm.
- Minimum Support is a number greater than 0 and less than or equal to 1.0. This value is used to filter out rule predicates that do not meet the support threshold; setting it too high can result in very short or even empty rules. The default value for Minimum Support is 0.1. The default value highlights the more important predicates instead of producing a long list of predicates with very low support.
In extreme cases, for very sparse data, all attribute predicates may be filtered out so that no rule is produced. If no rule is produced, lower the support threshold and rebuild the model; the algorithm then produces rules even when predicate support is very low.
- Number of Bins is a positive integer; the default value is 10. This value specifies the number of bins in the attribute histogram produced by k-Means. The bin boundaries for each attribute are computed globally on the entire training data set. The binning method is equi-width. All attributes have the same number of bins, except attributes with a single value, which have only one bin.
- Block Growth is a number greater than 1 and less than or equal to 5. This value specifies the growth factor for memory allocated to hold cluster data; the default is 2.
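To illustrate how the Distance Function setting can change cluster assignment, here is a minimal sketch of the two distance functions using the textbook definitions. It is not Oracle's implementation; `euclidean_distance`, `cosine_distance`, and `nearest_centroid` are hypothetical helper names. Euclidean distance measures straight-line separation, while cosine distance compares only the direction of two vectors and ignores their length.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b; vector length is ignored."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

def nearest_centroid(row, centroids, distance):
    """Index of the centroid closest to row under the given distance function."""
    return min(range(len(centroids)), key=lambda i: distance(row, centroids[i]))

# Hypothetical example: the same row lands in different clusters.
row = [1, 0]
centroids = [[10, 0], [1, 1]]
print(nearest_centroid(row, centroids, euclidean_distance))  # 1: [1, 1] is closer in space
print(nearest_centroid(row, centroids, cosine_distance))     # 0: [10, 0] points the same way
```

Under Euclidean distance the row joins the nearby centroid `[1, 1]`; under cosine distance it joins `[10, 0]`, which lies in exactly the same direction even though it is far away.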
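The Minimum Support filter can be pictured as a simple threshold on each predicate in a cluster rule. The sketch below assumes that a predicate's support is the fraction of the cluster's rows that satisfy it and that a predicate "meets" the threshold when its support is greater than or equal to it; the function name `filter_predicates` and the example predicates are hypothetical.

```python
def filter_predicates(predicates, min_support=0.1):
    """Keep only the predicates whose support meets the threshold.

    predicates maps a predicate description to its support, assumed here
    to be the fraction of the cluster's rows that satisfy the predicate.
    """
    if not 0 < min_support <= 1.0:
        raise ValueError("min_support must be > 0 and <= 1.0")
    return {p: s for p, s in predicates.items() if s >= min_support}

# Hypothetical predicates for one cluster rule, with their supports.
predicates = {
    "AGE in [30, 40)": 0.45,
    "INCOME in [50000, 60000)": 0.08,
    "REGION = 'West'": 0.12,
}

print(filter_predicates(predicates, 0.1))   # drops the low-support INCOME predicate
print(filter_predicates(predicates, 0.5))   # {}: threshold too high, empty rule
print(filter_predicates(predicates, 0.05))  # lower threshold keeps all three
```

The second call shows the empty-rule case described above; lowering the threshold, as in the third call, brings the low-support predicates back.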
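The Number of Bins setting describes equi-width binning with boundaries computed once over the whole training set. The following sketch shows that idea for one numeric attribute. The edge-handling convention (a value equal to the top boundary falls in the last bin) and the helper names `equi_width_edges`, `bin_index`, and `histogram` are assumptions for illustration. Because the edges come from the full data set, the same edges would be reused when building the histogram for each individual cluster.

```python
def equi_width_edges(values, num_bins=10):
    """Bin boundaries for one attribute, computed over the entire training set."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo, hi]  # a single-valued attribute gets only one bin
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(num_bins)] + [hi]

def bin_index(value, edges):
    """0-based bin for value; a value equal to the top edge goes in the last bin."""
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2

def histogram(values, edges):
    """Count how many values fall into each bin defined by edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[bin_index(v, edges)] += 1
    return counts

# Hypothetical attribute: ages in the training data, split into 4 equal-width bins.
ages = [20, 25, 30, 35, 40, 60]
edges = equi_width_edges(ages, num_bins=4)  # [20.0, 30.0, 40.0, 50.0, 60]
print(histogram(ages, edges))               # [2, 2, 1, 1]
```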
Model training stops when either the change in error between two consecutive iterations is less than Minimum Error Tolerance or the number of iterations reaches Maximum Iterations.
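The two stopping conditions can be seen in a plain Lloyd-style k-Means loop. This is a minimal sketch, not Oracle's enhanced k-Means: it uses Euclidean distance only, and it assumes that "error" is the sum of squared distances from each row to its assigned centroid and that the change is measured as an absolute difference (the product may normalize it differently). The function name `kmeans` and its parameter names are hypothetical.

```python
def kmeans(rows, centroids, max_iterations=6, min_error_tolerance=0.005):
    """Plain k-Means that stops on a small error change or an iteration cap.

    Returns (centroids, iterations_run). Error is assumed to be the sum of
    squared Euclidean distances from each row to its assigned centroid.
    """
    centroids = [list(c) for c in centroids]
    previous_error = None
    iterations = 0
    while iterations < max_iterations:
        iterations += 1
        # Assignment step: put each row in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        error = 0.0
        for row in rows:
            dists = [sum((x - c) ** 2 for x, c in zip(row, cen)) for cen in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            clusters[best].append(row)
            error += dists[best]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
        # Stop once the error changes by less than the tolerance between iterations.
        if previous_error is not None and abs(previous_error - error) < min_error_tolerance:
            break
        previous_error = error
    return centroids, iterations

rows = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(kmeans(rows, [[0, 0], [10, 10]]))                    # converges after 3 iterations
print(kmeans(rows, [[0, 0], [10, 10]], max_iterations=2))  # capped at 2 iterations
```

In the first call the error goes from 2.0 to 1.0 and then stays at 1.0, so the tolerance test ends training at the third iteration; in the second call the iteration cap is reached first.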
For more information, see Oracle Data Mining Concepts in Where to Find More Information.
Click OK to continue.
Copyright © 2006, 2008, Oracle. All rights reserved.