The k-Means algorithm is a distance-based clustering algorithm that partitions the data into a predetermined number of clusters. It works with both categorical and numerical attributes. Distance-based algorithms rely on a distance metric (function) to measure the similarity ("closeness") between data points.
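As an illustration of what a distance metric does, the following minimal sketch computes the Euclidean distance between records that have only numerical attributes. The choice of Euclidean distance, the function name euclidean_distance, and the sample records are assumptions made for this example; they are not taken from the Oracle Data Mining implementation.

```python
import math

def euclidean_distance(a, b):
    """Return the Euclidean distance between two equal-length numeric points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical records, each with two numerical attributes.
p = (1.0, 2.0)
q = (4.0, 6.0)
r = (1.5, 2.5)

d_pq = euclidean_distance(p, q)  # distance from p to q
d_pr = euclidean_distance(p, r)  # distance from p to r

# A smaller distance means the records are "closer", that is, more similar.
print("p is closer to r than to q:", d_pr < d_pq)
```

Under this metric, record r is more similar to p than q is, because its distance to p is smaller.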
Each cluster has a centroid (center of gravity); records in a cluster are close to the centroid (and to each other). When a new cluster is split from an existing one (that is, a new centroid is defined), each record is assigned to the cluster whose centroid is closest to the record. Oracle Data Mining's version of k-Means goes beyond the classical implementation by defining a hierarchical parent-child relationship of clusters.
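The assignment of records to the closest centroid can be sketched as follows. This is a sketch of one step of classical k-Means, not of the hierarchical Oracle Data Mining version; the function names centroid and assign, the use of Euclidean distance, and the sample data are all assumptions for illustration.

```python
import math

def centroid(records):
    """Return the per-attribute mean of the records (the center of gravity)."""
    n = len(records)
    return tuple(sum(values) / n for values in zip(*records))

def assign(records, centroids):
    """Return, for each record, the index of the closest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(rec, centroids[i]))
        for rec in records
    ]

# Hypothetical numerical records and two starting centroids.
records = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids = [(1.0, 1.0), (9.0, 8.0)]

# Assign each record to the cluster whose centroid is closest to it.
labels = assign(records, centroids)

# Recompute each centroid from its assigned records.
# This sketch assumes every cluster receives at least one record.
new_centroids = [
    centroid([rec for rec, label in zip(records, labels) if label == k])
    for k in range(len(centroids))
]

print(labels)
print(new_centroids)
```

Classical k-Means repeats these two steps until the assignments stop changing. The sketch uses math.dist, which requires Python 3.8 or later.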
Oracle Data Mining implements an enhanced version of the k-Means algorithm with the following features:
You can use k-Means for clustering where the input table has text columns. For details, see Text Mining.
For more information, see the Oracle Data Mining documentation in Where to Find More Information.
Copyright © 2006, 2008, Oracle. All rights reserved.