The settings for the Support Vector Machine (SVM) algorithm depend on the kernel that you select. This topic describes the settings for SVM classification and for One-Class SVM, the algorithm used for Anomaly Detection.
The default settings are designed to work well in most cases.
Click Restore to restore the default values.
When you are done specifying algorithm settings, click OK to continue.
SVM supports two kernel functions: Linear and Gaussian. You can pick one of the kernel functions or you can let the system determine the kernel function. The default is to let the system determine the kernel function.
For more information, see SVM Kernel Functions.
If you specify the Linear Kernel or if you let the system determine the kernel, you can change the following settings:
If you specify the Gaussian kernel, you can change the following settings:
SVM models grow as the size of the training data set increases. This property limits SVM models to small and medium size build data sets (less than 100,000 cases). Active learning provides a way to deal with large build data sets.
Active learning is a methodology that optimizes the selection of a subset of the support vectors, maintaining accuracy while improving the speed of the model.
Active learning increases performance for a linear kernel. For a Gaussian kernel, active learning both increases performance and reduces the size of the model; this is an important consideration if memory and temporary disk space are issues.
Active learning forces the SVM algorithm to restrict learning to the most informative examples and not to attempt to use the entire body of data. In most cases, the resulting models have predictive accuracy comparable to that of the standard (exact) SVM model.
In most cases, you should not disable this setting.
Active learning is on by default. To turn it off, answer No to the question Do you Want Active Learning?
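The idea of restricting learning to the most informative examples can be sketched as follows. This is a minimal illustration of one common heuristic (prefer the examples closest to the current decision boundary), assuming a linear decision function; the function names are hypothetical, and this topic does not state how the product itself selects examples.

```python
def decision_value(weights, bias, x):
    """Value of a linear decision function w . x + b for one example."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def most_informative(examples, weights, bias, k):
    """Return the k examples closest to the current decision boundary.

    Assumption: "informative" is approximated by a small absolute
    decision value, a common active-learning heuristic.
    """
    ranked = sorted(examples, key=lambda x: abs(decision_value(weights, bias, x)))
    return ranked[:k]


if __name__ == "__main__":
    examples = [[3.0], [0.1], [-2.0], [-0.2]]
    # With weights [1.0] and bias 0.0 the boundary is at x = 0.
    print(most_informative(examples, [1.0], 0.0, k=2))
```

Training on such a subset rather than on every case is what keeps the build time and the model size down on large data sets.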
If you select the Gaussian kernel, you can specify the size for the cache used for storing computed kernels during the build operation. The default size is 50 megabytes.
The most expensive operation in building a Gaussian SVM model is the computation of kernels. The general approach is to converge within one chunk of data at a time, and then to test for violators outside the chunk. The build is complete when there are no more violators within tolerance. The size of the chunk is chosen so that the associated kernels can be held in memory in a "Kernel Cache". The larger the chunk, the better it represents the population of training data and the fewer times new chunks need to be created. Generally, larger caches imply faster builds.
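To illustrate why a larger cache tends to speed up the build, here is a minimal sketch of a bounded kernel cache. It is an assumption-laden toy: the class name KernelCache is hypothetical, the cache is sized in entries rather than megabytes, and least-recently-used eviction is chosen for simplicity; none of this is stated to be the product's actual implementation.

```python
import math
from collections import OrderedDict


class KernelCache:
    """Bounded cache of Gaussian kernel values with LRU eviction (toy sketch)."""

    def __init__(self, data, std_dev, max_entries):
        self.data = data
        self.std_dev = std_dev
        self.max_entries = max_entries
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _kernel(self, i, j):
        # Textbook Gaussian kernel between rows i and j of the data.
        sq_dist = sum((a - b) ** 2 for a, b in zip(self.data[i], self.data[j]))
        return math.exp(-sq_dist / (2.0 * self.std_dev ** 2))

    def get(self, i, j):
        key = (i, j) if i <= j else (j, i)  # the kernel is symmetric
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        self.misses += 1
        value = self._kernel(*key)
        self._cache[key] = value
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return value
```

Every miss costs a full kernel computation, so a cache large enough to hold the kernels of the current chunk avoids recomputing them on each pass.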
A model is said to be overfit or overtrained if it works well on the build data, but is not general enough to deal with new data. The complexity factor prevents overfitting by finding the best tradeoff between simplicity and complexity. The algorithm will calculate and optimize this value if you do not specify a value. If the model skews its predictions in favor of one class, you may choose to rebuild with a manually-entered complexity factor higher than the one calculated by the algorithm.
You can specify a complexity factor for classification and regression models. Click the radio button next to Yes in answer to the question Do you want to specify the complexity factors?
The complexity factor determines the trade-off between minimizing model error on the training data and minimizing model complexity. Its purpose is to avoid both overfitting (an overly complex model that fits noise in the training data) and underfitting (a model that is too simple).
A very large value of the complexity factor places an extreme penalty on errors, forcing SVM to seek a perfect separation of target classes. A small value for the complexity factor places a low penalty on errors and high constraints on the model parameters, which can lead to under-fit.
The default is to specify no complexity factor, in which case the system calculates a complexity factor. If you do specify a complexity factor, specify a positive number.
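The trade-off described above can be made concrete with the textbook soft-margin objective for a linear SVM, in which the complexity factor multiplies the total training error (hinge loss). The sketch below assumes that standard formulation and labels of +1 and -1; this topic does not give the exact objective used by the product, and the function name is hypothetical.

```python
def soft_margin_objective(weights, bias, examples, labels, complexity_factor):
    """Textbook objective: 0.5 * ||w||^2 + C * sum of hinge losses.

    Assumption: labels are +1 or -1 and the model is linear.
    """
    if complexity_factor <= 0:
        raise ValueError("complexity_factor must be a positive number")
    regularizer = 0.5 * sum(w * w for w in weights)
    hinge_total = 0.0
    for x, y in zip(examples, labels):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        hinge_total += max(0.0, 1.0 - y * score)  # zero when the margin is met
    return regularizer + complexity_factor * hinge_total


if __name__ == "__main__":
    examples = [[2.0], [0.5]]
    labels = [1, -1]
    # The second example is misclassified, so its error is penalized.
    print(soft_margin_objective([1.0], 0.0, examples, labels, 1.0))
    print(soft_margin_objective([1.0], 0.0, examples, labels, 10.0))
```

With a large complexity factor the error term dominates, which pushes the optimizer toward separating the training classes perfectly; with a small one the regularizer dominates and the model stays simple.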
Outlier Rate is the approximate rate of outliers (negative predictions) produced by a one-class SVM model on the training data. Outlier Rate is a number > 0 and <= 1; the default value is 0.05. This rate indicates the proportion of suspicious records; for example, 0.05 corresponds to 5 percent of the records.
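As a small illustration of how to read this setting, the sketch below validates an outlier rate against the stated range and computes the observed rate from a list of per-record outlier flags. Both function names are hypothetical, and the boolean flag representation is an assumption made for the example.

```python
def validate_outlier_rate(rate):
    """Check that the rate is > 0 and <= 1, as the setting requires."""
    if not (0 < rate <= 1):
        raise ValueError("outlier rate must be > 0 and <= 1")
    return rate


def observed_outlier_rate(outlier_flags):
    """Fraction of training records flagged as outliers (True = outlier)."""
    if not outlier_flags:
        raise ValueError("at least one record is required")
    return sum(1 for flag in outlier_flags if flag) / len(outlier_flags)


if __name__ == "__main__":
    flags = [True] + [False] * 19  # 1 outlier among 20 records
    print(observed_outlier_rate(flags))
```

Because the setting is approximate, the observed rate on the training data may differ somewhat from the value that was requested.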
If you select the Gaussian kernel, you can specify the standard deviation of the Gaussian kernel. This value must be a positive number. The default is to not specify the standard deviation.
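To show what the standard deviation controls, here is a minimal sketch of the Gaussian kernel in its common textbook form, exp(-||x - y||^2 / (2 * sigma^2)). The parameterization is an assumption, since this topic does not give the formula, and the function name is hypothetical.

```python
import math


def gaussian_kernel(x, y, std_dev):
    """Textbook Gaussian kernel between two equal-length vectors."""
    if std_dev <= 0:
        raise ValueError("std_dev must be a positive number")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * std_dev ** 2))


if __name__ == "__main__":
    x, y = [1.0, 0.0], [0.0, 0.0]
    # A larger standard deviation makes the same pair of points look more similar.
    print(gaussian_kernel(x, y, 0.5))
    print(gaussian_kernel(x, y, 2.0))
```

A small standard deviation makes the kernel very local, so only near neighbors influence each other, while a large one smooths the model across more distant cases.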
Tolerance is a stopping mechanism: it indicates when the algorithm should be satisfied with the result and consider the building process complete. The default is 0.001; a higher value gives a faster build but possibly a less accurate model.
Technically, the Tolerance value is the maximum size of a violation of the convergence criteria at which the model is still considered to have converged. Tolerance must be positive and <= 0.1.
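The effect of the tolerance on build time can be illustrated with a toy loop. The sketch assumes, purely for illustration, that the largest violation shrinks by a fixed factor on each iteration; a real SVM build does not behave this regularly, and the function names are hypothetical.

```python
def validate_tolerance(tolerance):
    """Check that the tolerance is positive and <= 0.1, as the setting requires."""
    if not (0 < tolerance <= 0.1):
        raise ValueError("tolerance must be positive and <= 0.1")
    return tolerance


def iterations_until_converged(initial_violation, tolerance, shrink=0.5):
    """Count iterations until the largest violation is within tolerance.

    Assumption: each iteration multiplies the violation by `shrink`.
    """
    validate_tolerance(tolerance)
    violation = initial_violation
    iterations = 0
    while violation > tolerance:
        violation *= shrink
        iterations += 1
    return iterations


if __name__ == "__main__":
    print(iterations_until_converged(1.0, 0.1))    # loose tolerance
    print(iterations_until_converged(1.0, 0.001))  # default tolerance
```

In this toy model the looser tolerance stops after fewer iterations, which mirrors the trade-off between build speed and accuracy described above.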
Copyright © 2006, 2008, Oracle. All rights reserved.