The Stratified Sample Transformation wizard creates a stratified sample of a table or view; the sample can be either a table or a view. You can specify the sample size either as a number of records or as a percentage of records. You can also specify a random seed.
Note: It is not always necessary to create a stratified sample. If you specify Maximum Average Accuracy for a classification or regression model, it may not be necessary to stratify the build data set. For information about accuracy, see Accuracy Type.
In a stratified sample, the population is divided into separate groups according to values of an attribute; each group is randomly sampled separately. For example, you might have an attribute SEX with two values "Male" and "Female"; a stratified sample would be a random sample of all cases where SEX="Male" combined with a random sample of all cases where SEX="Female".
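The idea in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration of stratified sampling in general, not the wizard's actual implementation; the function name `stratified_sample`, its parameters, and the sample rows are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=None):
    """Randomly sample each stratum separately, using the same fraction.

    Hypothetical helper for illustration; not the wizard's algorithm.
    """
    rng = random.Random(seed)          # a fixed seed makes the sample repeatable
    strata = defaultdict(list)
    for row in rows:                   # divide the population by attribute value
        strata[row[key]].append(row)
    sample = []
    for value in sorted(strata):       # sample every group on its own
        group = strata[value]
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# Made-up data: 25 cases with SEX="Male" and 75 with SEX="Female".
rows = [{"ID": i, "SEX": "Female" if i % 4 else "Male"} for i in range(100)]
sample = stratified_sample(rows, "SEX", 0.2, seed=42)
```

With a fraction of 0.2, the sample draws 5 of the 25 "Male" cases and 15 of the 75 "Female" cases, so each group is represented in proportion to its size.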
Stratified sampling is useful when the values of an attribute are skewed. For example, suppose that 99% of the cases in the example of the previous paragraph have SEX="Female". If you build a model using SEX as the target, the model is likely to always predict SEX="Female". If you stratify the build data set, the model can discover situations where the prediction is SEX="Male".
You could use stratified sampling to create a new data set where the number of cases with SEX="Female" is approximately the same as the number of cases with SEX="Male".
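A minimal sketch of such a balanced data set follows. It assumes that balancing means drawing the same number of cases from every group, limited by the size of the smallest group; the wizard may balance the groups differently. The name `balanced_sample`, its parameters, and the data are hypothetical.

```python
import random
from collections import defaultdict

def balanced_sample(rows, key, per_group=None, seed=None):
    """Draw the same number of cases from every stratum.

    Hypothetical helper for illustration; not the wizard's algorithm.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    # No group can contribute more cases than the smallest group contains.
    smallest = min(len(group) for group in strata.values())
    k = smallest if per_group is None else min(per_group, smallest)
    sample = []
    for value in sorted(strata):
        sample.extend(rng.sample(strata[value], k))
    return sample

# Made-up skewed data: 99% of the cases have SEX="Female".
rows = [{"ID": i, "SEX": "Male" if i % 100 == 0 else "Female"} for i in range(1000)]
balanced = balanced_sample(rows, "SEX", seed=1)
```

Here the input has 10 "Male" and 990 "Female" cases, and the result contains 10 of each. The rare group is kept whole and the common group is sampled down to match it.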
Note: You may stratify the Build data set for an activity; the Test data set should not be stratified.
For more information, see How to Use Stratified Sampling.
The output of the wizard is described in Output.
If you invoke this wizard from an activity, the activity has already set values in the wizard. Some values in the wizard, such as the attribute for stratification, cannot be changed. Other values, such as sample size and sample count, can be changed.
After the wizard creates the new table or view, it displays the new table or view.
If you do not want to see this page the next time that you launch the wizard, check the Skip this Page Next Time box.
Click Next to continue.
Copyright © 2006, 2008, Oracle. All rights reserved.