The Naive Bayes algorithm supports two thresholds: the Singleton threshold and the Pairwise threshold. Both thresholds are used to eliminate rare and possibly noisy cases. Setting the thresholds closer to 0 may result in a more accurate model, but the model will take longer to build. Setting the thresholds closer to 1 may result in models that build faster but are less accurate.
The default settings are designed to work for most cases.
The singleton threshold is a threshold on the count of the frequent items. An item is a frequent item if it is included in a sufficiently large number of transactions. The singleton threshold is expressed as a fraction of the number of profiles. Suppose that the total number of transactions in which an item appears is k, the total number of profiles is P, and t is the singleton threshold expressed as a fraction of P (0.01 indicates 1 percent of P). Then the item is a frequent item if k >= t*P.
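The singleton test can be sketched in a few lines of Python. This is only an illustration of the k >= t*P rule described above, not the product's implementation; the function name frequent_items and the sample profiles are hypothetical, and each profile is assumed to be a set of items.

```python
from collections import Counter

def frequent_items(profiles, t):
    """Return the items whose count k satisfies k >= t * P."""
    P = len(profiles)  # total number of profiles
    # Count each item at most once per profile.
    counts = Counter(item for profile in profiles for item in set(profile))
    return {item for item, k in counts.items() if k >= t * P}

# Hypothetical sample data: four profiles.
profiles = [
    {"a", "b"},
    {"a", "c"},
    {"a", "b"},
    {"d"},
]

# With t = 0.5 and P = 4, an item must appear in at least 2 profiles.
print(sorted(frequent_items(profiles, 0.5)))
```

In this sample, raising t toward 1 leaves fewer frequent items, which matches the trade-off described earlier: less work to build the model, but less information retained.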
The pairwise threshold is a threshold on the count of the frequent item pairs. An item pair is a frequent item pair if the items in the pair are frequent items that occur together in a sufficiently large number of profiles. Suppose that two distinct items occur together in k profiles, the total number of profiles that include the frequent items is P, and t is the pairwise threshold expressed as a fraction of P (0.01 indicates 1 percent of P). Then the pair is a frequent item pair if k > t*P.
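A similar sketch illustrates the pairwise test. It is not the product's implementation: the function name frequent_pairs and the sample data are hypothetical, and P is assumed here to be the number of profiles that contain at least one frequent item, which is one possible reading of the definition above. Note that the comparison is strict (k > t*P), as stated in the text.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(profiles, frequent, t):
    """Return pairs of frequent items whose joint count k satisfies k > t * P."""
    # Keep only the frequent items in each profile.
    filtered = [set(profile) & set(frequent) for profile in profiles]
    # Assumption: P counts the profiles that contain at least one frequent item.
    P = sum(1 for profile in filtered if profile)
    counts = Counter()
    for profile in filtered:
        # Sort so that each pair has a single canonical ordering.
        for pair in combinations(sorted(profile), 2):
            counts[pair] += 1
    return {pair for pair, k in counts.items() if k > t * P}

# Hypothetical sample data; "a" and "b" are taken to be the frequent items.
profiles = [
    {"a", "b"},
    {"a", "c"},
    {"a", "b"},
    {"d"},
]

# P = 3 (profiles containing "a" or "b"); the pair ("a", "b") occurs in k = 2 profiles.
print(frequent_pairs(profiles, {"a", "b"}, 0.5))
```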
Click OK to continue.
Copyright © 2006, 2008, Oracle. All rights reserved.