Clustering models uncover natural groupings (clusters) in the data. Members of the same cluster are more similar to ("closer to") each other than they are to members of other clusters.
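As an illustration of what "closer to" can mean, the following minimal sketch compares the average distance between points inside a group with the average distance between points in different groups. It assumes Euclidean distance and uses made-up two-dimensional points; the names `cluster_a`, `cluster_b`, `mean_within`, and `mean_between` are hypothetical and are not part of Oracle Data Mining.

```python
import math
from itertools import combinations, product

# Hypothetical toy data: two groups of 2-D points (not Oracle Data Mining output).
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]


def mean_within(points):
    """Average Euclidean distance over all pairs of points in one group."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def mean_between(left, right):
    """Average Euclidean distance over all pairs that span the two groups."""
    pairs = list(product(left, right))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


within_a = mean_within(cluster_a)
within_b = mean_within(cluster_b)
between = mean_between(cluster_a, cluster_b)

# For well-separated groups, both within-group averages are smaller
# than the between-group average.
print(within_a, within_b, between)
```

Euclidean distance is only one possible similarity measure; the choice here is an assumption made for the example.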
Clustering can be used to explain the common characteristics of members of a cluster, and also to determine what distinguishes members of one cluster from members of another cluster.
Clustering can be a useful data-preprocessing step to identify homogeneous groups on which to build predictive models.
In Oracle Data Mining, a cluster is characterized by its centroid, its attribute histograms, and its place in the clustering model's hierarchical tree. Oracle Data Mining performs hierarchical clustering using one of the following algorithms:
The clusters discovered by these algorithms are used to create rules that capture the main characteristics of the data assigned to each cluster.
The clusters are also used to generate a Bayesian probability model that is used during scoring for assigning data points to clusters.
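The idea of scoring with a probability model can be sketched as follows. This is not Oracle Data Mining's implementation: it assumes a simple naive Bayes model in which each attribute of each cluster is summarized by a Gaussian (mean and variance), and the cluster prior is the fraction of build data assigned to that cluster. The data and the names `clusters`, `summarize`, and `score` are hypothetical.

```python
import math

# Hypothetical cluster assignments produced by an earlier clustering step.
clusters = {
    "A": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "B": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
}


def summarize(points):
    """Per-attribute (mean, variance) pairs for the points of one cluster."""
    n = len(points)
    stats = []
    for column in zip(*points):
        mean = sum(column) / n
        # A small floor keeps the variance strictly positive.
        var = sum((v - mean) ** 2 for v in column) / n + 1e-6
        stats.append((mean, var))
    return stats


def log_gaussian(x, mean, var):
    """Log of the Gaussian density with the given mean and variance at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


total = sum(len(pts) for pts in clusters.values())
# Assumed model: prior = share of build data, likelihood = independent Gaussians.
model = {name: (len(pts) / total, summarize(pts)) for name, pts in clusters.items()}


def score(point):
    """Return the posterior probability of each cluster for one data point."""
    log_post = {}
    for name, (prior, stats) in model.items():
        log_post[name] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, (m, v) in zip(point, stats)
        )
    top = max(log_post.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(lp - top) for name, lp in log_post.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


probs = score((1.2, 1.4))
best = max(probs, key=probs.get)
print(best, probs)
```

The point is assigned to the cluster with the highest posterior probability; the probabilities themselves indicate how confident the assignment is.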
After you build a clustering model, you can apply it to new data. For a brief overview of the apply process, see Apply a Model.
Clustering can also be used to solve anomaly detection problems. To do so, build a clustering model, apply it to the data, and then look for items that do not fit well in any cluster.
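One way to read "do not fit in any cluster" is sketched below, under the assumption that fit is measured by Euclidean distance to the nearest centroid and that anything beyond a fixed cutoff counts as an anomaly. The centroids, the cutoff `MAX_DISTANCE`, and the helper `nearest` are hypothetical; a probability-based criterion (a low best-cluster probability) would be an equally plausible choice.

```python
import math

# Hypothetical centroids taken from a previously built clustering model.
centroids = {"A": (1.5, 1.5), "B": (8.5, 8.5)}

# Hypothetical cutoff: points farther than this from every centroid are flagged.
MAX_DISTANCE = 2.0


def nearest(point):
    """Return (cluster name, distance) for the centroid closest to the point."""
    name = min(centroids, key=lambda c: math.dist(point, centroids[c]))
    return name, math.dist(point, centroids[name])


# New data to which the model is applied.
new_points = [(1.2, 1.4), (8.4, 8.6), (5.0, 0.0)]

# Keep only the points that are not close to any centroid.
anomalies = [p for p in new_points if nearest(p)[1] > MAX_DISTANCE]
print(anomalies)
```

In practice the cutoff would be chosen from the data, for example from the distribution of distances observed on the build set.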
For more information about clustering, see Where to Find More Information.
Copyright © 2006, 2008, Oracle. All rights reserved.