k-Means Algorithm

The k-Means algorithm is a distance-based clustering algorithm that partitions the data into a predetermined number of clusters. It works with both categorical and numerical attributes. Distance-based algorithms rely on a distance metric (function) to measure the similarity ("closeness") between data points.
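As an illustration of what a distance metric does, the following minimal sketch computes Euclidean distance between two records. It assumes purely numerical attributes and a Euclidean metric; the function name euclidean_distance is hypothetical and is not part of any Oracle Data Mining API, and the metric actually used by a given implementation may differ.

```python
import math


def euclidean_distance(a, b):
    """Return the straight-line distance between two numeric records.

    Illustrative assumption: both records are equal-length sequences
    of numbers (categorical attributes are not handled here).
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# A smaller distance means the two records are "closer" (more similar).
print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))  # 5.0
```

Records separated by a small distance are treated as similar and tend to end up in the same cluster.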

Each cluster has a centroid (center of gravity); records in a cluster are close to the centroid (and to each other). When a new cluster is split from an existing one (that is, a new centroid is defined), each record is assigned to the cluster whose centroid is closest to the record. Oracle Data Mining's version of k-Means goes beyond the classical implementation by defining a hierarchical parent-child relationship of clusters.
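The assignment of records to the nearest centroid can be sketched as follows. This is a simplified illustration of the classical (flat) k-Means step, assuming numerical attributes and squared Euclidean distance; it does not model the hierarchical parent-child splitting of the Oracle Data Mining implementation, and the names assign_to_nearest and recompute_centroids are hypothetical.

```python
def assign_to_nearest(records, centroids):
    """Return, for each record, the index of its closest centroid."""
    assignments = []
    for record in records:
        # Squared Euclidean distance preserves the ordering of distances,
        # so the square root is not needed to find the nearest centroid.
        distances = [
            sum((x - c) ** 2 for x, c in zip(record, centroid))
            for centroid in centroids
        ]
        assignments.append(distances.index(min(distances)))
    return assignments


def recompute_centroids(records, assignments, centroids):
    """Move each centroid to the mean (center of gravity) of its records."""
    updated = []
    for index, old_centroid in enumerate(centroids):
        members = [r for r, a in zip(records, assignments) if a == index]
        if not members:
            # Assumption for this sketch: an empty cluster keeps its centroid.
            updated.append(tuple(old_centroid))
            continue
        updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return updated


records = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]

assignments = assign_to_nearest(records, centroids)
print(assignments)  # [0, 0, 1, 1]
print(recompute_centroids(records, assignments, centroids))
# [(1.0, 1.5), (9.5, 9.0)]
```

Repeating these two steps until the assignments stop changing yields the classical algorithm; a hierarchical variant would additionally choose a cluster to split and record the parent-child relationship.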

Algorithm Details

Oracle Data Mining implements an enhanced version of the k-Means algorithm with the following features:

You can use k-Means for clustering where the input table has text columns. For details, see Text Mining.

For more information, see the Oracle Data Mining documentation in Where to Find More Information.