k-Means Options
The default settings are designed to work well in most cases.
You can change the following settings:
- Number of Clusters is the number of clusters to create. It must be a positive integer; the default is 10. The k-Means algorithm creates the requested number of clusters, except in the unusual case in which the number of records is less than the number of requested clusters.
- Distance Function specifies how the algorithm calculates distance. The default distance function is Euclidean; the other distance function is Cosine.
- Split Criterion is either Variance (split the cluster that is least homogeneous) or Size (split the largest cluster). The default is Variance.
- Minimum Error Tolerance must be between 0.001 (slow build) and 0.1 (fast build); the default is 0.01. Increasing Minimum Error Tolerance builds models faster, but with possibly lower accuracy.
- Maximum Iterations must be between 2 (faster build) and 30 (slower build); the default is 3. This value is the maximum number of iterations for the k-Means algorithm.
- Minimum Support is a number greater than 0 and less than or equal to 1.0. It is the fraction of attribute values that must be non-NULL for the attribute to be included in the rule description for the cluster. Setting the parameter value too high in data with missing values can result in very short or even empty rules.
- Number of Bins is a positive integer; the default value is 10. This value specifies the number of bins in the attribute histogram produced by k-Means. The bin boundaries for each attribute are computed globally on the entire training data set. The binning method is equi-width. All attributes have the same number of bins, except attributes with a single value, which have only one bin.
- Block Growth is a number greater than 1 and less than or equal to 5. This value specifies the growth factor for memory allocated to hold cluster data; the default is 2.
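To make the Distance Function and Number of Bins settings more concrete, the following Python sketch shows the textbook definitions of Euclidean distance, cosine distance, and equi-width binning. It is an illustration only and not the product's internal implementation. The function names are hypothetical, and defining cosine distance as 1 minus cosine similarity is an assumption.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity (assumed definition).

    0 means the vectors point in the same direction.
    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def equi_width_edges(values, n_bins=10):
    """Bin boundaries of equal width over the full range of one attribute.

    The range is taken over all supplied values, mirroring boundaries
    computed globally on the training data. An attribute with a single
    distinct value gets exactly one bin.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo, hi]  # one bin
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]
```

Euclidean distance is sensitive to the magnitude of the attribute values, whereas cosine distance depends only on the direction of the vectors, so the two can group the same records differently.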
Termination of Training
Model training stops when either the change in error between two consecutive iterations is less than Minimum Error Tolerance or the number of iterations reaches Maximum Iterations.
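The two stopping conditions can be sketched with a plain (Lloyd-style) k-means loop in Python. This is a minimal illustration under stated assumptions, not the product's algorithm: it ignores the Split Criterion, uses the first k records as initial centroids, measures error as the sum of squared Euclidean distances, and compares the absolute change in that error against the tolerance. The function and parameter names are hypothetical.

```python
def kmeans(points, k, min_error_tolerance=0.01, max_iterations=3):
    """Plain k-means with the two termination rules described above.

    Returns (centroids, iterations_run, last_error).
    """
    # Naive deterministic initialization (assumption for this sketch).
    centroids = [list(p) for p in points[:k]]
    prev_error = None
    for iteration in range(1, max_iterations + 1):
        # Assignment step: attach each point to its nearest centroid
        # and accumulate the sum of squared distances as the error.
        clusters = [[] for _ in range(k)]
        error = 0.0
        for p in points:
            dists = [sum((x - c) ** 2 for x, c in zip(p, cen))
                     for cen in centroids]
            nearest = dists.index(min(dists))
            clusters[nearest].append(p)
            error += dists[nearest]
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
        # Stop early once the error changes by less than the tolerance.
        if prev_error is not None and abs(prev_error - error) < min_error_tolerance:
            break
        prev_error = error
    # Otherwise the loop ends when max_iterations has been reached.
    return centroids, iteration, error
```

For example, calling `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2, max_iterations=30)` stops on the tolerance check well before the iteration limit, while a limit of 2 cuts training off after the second pass. A larger tolerance triggers the early exit sooner, which is why it builds faster at some cost in accuracy.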
Click OK to continue.
Copyright © 2006, 2008, Oracle. All rights reserved.