Calculating Statistics

Oracle Data Miner calculates all statistics from a sample of the data rather than from the full data set, which improves performance. Even simple statistics, such as the mean of a numerical attribute, are the mean of the sample, not the "real" mean computed over all values of the attribute. There are several consequences of this:

It is important to choose a sample size that is neither too small nor too large. A very small sample may produce statistics that do not represent the data well, while a very large sample may result in very slow calculations. The default sample size is 1000 rows. If you have large data sets (more than 10,000 rows), you should increase this number.
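The effect of sample size on a simple statistic can be illustrated outside Data Miner. The following Python sketch is only an illustration: it uses simple random sampling without replacement, which is an assumption and not necessarily the method Data Miner uses, and the function name sample_mean and the generated data are hypothetical.

```python
import random
import statistics


def sample_mean(values, sample_size, seed=0):
    """Return the mean of a simple random sample of `values`.

    Illustrative only; this is not Data Miner's sampling algorithm.
    """
    rng = random.Random(seed)
    # Use every row when the data set is smaller than the requested sample.
    k = min(sample_size, len(values))
    return statistics.fmean(rng.sample(values, k))


# Hypothetical numerical attribute with 50,000 rows.
data_rng = random.Random(42)
values = [data_rng.gauss(100.0, 15.0) for _ in range(50_000)]

# The "real" mean, computed over all values.
full_mean = statistics.fmean(values)

for size in (100, 1000, 10_000):
    estimate = sample_mean(values, size)
    print(f"sample size {size:>6}: mean = {estimate:.2f} (full mean = {full_mean:.2f})")
```

Larger samples generally give estimates closer to the full mean, at the cost of reading and processing more rows.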

To view or change the sample size, select Tools | Preferences, and click the Sampling tab.