Redistribute Cases

Use this dialog to define the sample by specifying the number of cases in each subclass.

Total Number of Cases is the total number of cases in the original table or view before you do any sampling.

When you first view this page, the grid displays the values that the stratification attribute takes on and how those values are distributed in the input table before any sampling takes place. For example, the attribute SEX might take on the values "Male" and "Female", with 5% of the cases having the value "Female".

To redistribute the number of cases, click in the Sample Count column for the attribute value and type a new integer value. You can specify any value less than or equal to the number of cases that have that attribute value in the original table. Change the Sample Count for the other attribute values as desired. Press Enter after you have made all changes. The changed percentages and sample size are displayed.

For example, suppose that the stratification attribute is SEX, and that it takes on two values, "Female" and "Male". Suppose that there are 2000 cases in the original table and that 5% of them have SEX="Female"; that is, 100 cases have SEX="Female" and the remaining 1900 cases have SEX="Male". You can set Sample Count for the value "Female" to any value less than or equal to 100.
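The rule for Sample Count can be sketched in a few lines of Python. This is only an illustration of the constraint described above, not the wizard's own code; the function name `redistribute` and the dictionary layout are hypothetical.

```python
def redistribute(original_counts, sample_counts):
    """Validate requested sample counts and return (sample size, percentages).

    Both arguments map an attribute value (for example "Female") to a count.
    Hypothetical helper for illustration; not part of the product.
    """
    for value, requested in sample_counts.items():
        available = original_counts[value]
        if not isinstance(requested, int) or requested < 0:
            raise ValueError(f"Sample Count for {value!r} must be a non-negative integer")
        if requested > available:
            raise ValueError(
                f"Sample Count for {value!r} is {requested}, "
                f"but only {available} cases exist in the original table"
            )
    sample_size = sum(sample_counts.values())
    # Percentage of the sample that each attribute value accounts for.
    percentages = {
        value: (100.0 * count / sample_size if sample_size else 0.0)
        for value, count in sample_counts.items()
    }
    return sample_size, percentages


original = {"Female": 100, "Male": 1900}
sample_size, percentages = redistribute(original, {"Female": 100, "Male": 300})
print(sample_size, percentages)  # 400 {'Female': 25.0, 'Male': 75.0}
```

Requesting 101 cases for "Female" in this example would raise an error, because only 100 such cases exist in the original table.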

If you want the sample to have the same number of cases for each value of the attribute, click Equal Distribution. For example, click Equal Distribution if you want 50% of the cases in the sample to have SEX="Female" and 50% of the cases in the sample to have SEX="Male". The grid is updated with new values. The new values in the grid are calculated as follows: 100 cases in the original table have SEX="Female". If you specified a sample size of 200, you specified a sample size that is 10% of the size of the original table. Then 10% of 100 cases, that is, 10 cases, in the generated sample will have SEX="Female". Since the values of the attribute SEX are equally distributed in the sample, the number of cases with SEX="Male" will also be 10, so the calculated sample contains 20 cases in total.
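The Equal Distribution arithmetic in this example can be reproduced with a short Python sketch. Two points are assumptions rather than documented behavior: that the calculation is based on the least frequent attribute value (which matches the "Female" example), and that fractional results are rounded down. The function name `equal_distribution` is hypothetical.

```python
def equal_distribution(original_counts, requested_sample_size):
    """Return an equal Sample Count for every attribute value.

    Assumption: the basis is the least frequent value, scaled by the
    sampling fraction (requested sample size / total number of cases).
    Assumption: fractional counts are rounded down.
    """
    total_cases = sum(original_counts.values())
    if not 0 < requested_sample_size <= total_cases:
        raise ValueError("sample size must be between 1 and the total number of cases")
    basis = min(original_counts.values())
    # Integer arithmetic avoids floating-point rounding surprises.
    per_value = basis * requested_sample_size // total_cases
    return {value: per_value for value in original_counts}


counts = equal_distribution({"Female": 100, "Male": 1900}, 200)
print(counts)  # {'Female': 10, 'Male': 10}
```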

Note: The calculation described above for Equal Distribution is the calculation that takes place if you run the Sample step in a mining activity.

You can redistribute the number of cases after you select equal distribution; you may wish to do this if the calculated sample size is not the one you want for your sample. You can change the sample count for each value of the attribute as long as you do not specify more cases than actually exist in the original table. In this example, the count of cases with SEX="Female" can be any number less than or equal to 100. To change the value, click in the Sample Count column for the attribute value and type a new value, such as 50. If you want to keep an equal distribution, change the Sample Count for the other attribute values to the same number. Press Enter after you have made all changes. The changed percentages and sample size are displayed.

Click Restore to return to the original distribution.

Total Sample Count is the number of cases in the sample. It is always less than or equal to the Total Number of Cases.

Percentage of Original is the sample size expressed as a percentage of the number of cases in the original table; it is always less than or equal to 100%.
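As a rough illustration of how these two summary values relate to the grid, the following Python sketch derives them from the per-value sample counts. The function name `sample_summary` is hypothetical, and the numbers continue the SEX example with 50 cases per value.

```python
def sample_summary(total_number_of_cases, sample_counts):
    """Return (Total Sample Count, Percentage of Original) for the given counts."""
    total_sample_count = sum(sample_counts.values())
    if total_sample_count > total_number_of_cases:
        raise ValueError("the sample cannot contain more cases than the original table")
    percentage_of_original = 100.0 * total_sample_count / total_number_of_cases
    return total_sample_count, percentage_of_original


summary = sample_summary(2000, {"Female": 50, "Male": 50})
print(summary)  # (100, 5.0)
```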

The wizard uses a random seed to generate the sample. To view or change the random seed, click Advanced.
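The following Python sketch shows why a fixed seed matters: drawing the requested number of cases per attribute value with the same seed yields the same sample each time. It does not reproduce the wizard's actual sampling algorithm; the function `stratified_sample`, the row layout, and the seed value 42 are all assumptions made for illustration.

```python
import random


def stratified_sample(rows, attribute, sample_counts, seed):
    """Draw sample_counts[value] rows at random for each attribute value."""
    rng = random.Random(seed)  # private generator, so the draw is repeatable
    rows_by_value = {}
    for row in rows:
        rows_by_value.setdefault(row[attribute], []).append(row)
    sample = []
    for value, count in sample_counts.items():
        # rng.sample raises ValueError if count exceeds the available rows.
        sample.extend(rng.sample(rows_by_value.get(value, []), count))
    return sample


# 2000 cases, of which 100 (5%) have SEX="Female", as in the example.
rows = [{"ID": i, "SEX": "Female" if i < 100 else "Male"} for i in range(2000)]
requested = {"Female": 10, "Male": 10}

first = stratified_sample(rows, "SEX", requested, seed=42)
second = stratified_sample(rows, "SEX", requested, seed=42)
print(first == second)  # True
```

Changing the seed generally produces a different set of cases with the same per-value counts.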

Click Next to continue.

The wizard checks that the percentages in the Sample Distribution column add up to 100%. If they do not, you cannot go to the next step until you correct the problem.
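A check of this kind might look like the Python sketch below. The function name `distribution_is_valid` and the tolerance of 0.01 are assumptions; the source does not say how the wizard handles rounded percentages.

```python
import math


def distribution_is_valid(percentages, tolerance=0.01):
    """Return True if the percentages add up to 100 within a small tolerance.

    The tolerance is an assumed allowance for displayed values that have
    been rounded, such as 33.33 + 33.33 + 33.34.
    """
    return math.isclose(sum(percentages), 100.0, abs_tol=tolerance)


print(distribution_is_valid([50.0, 50.0]))  # True
print(distribution_is_valid([50.0, 45.0]))  # False
```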