Split

The Split step of an activity divides the input table into two subsets: a build data set and a test data set.

Classification Build Activities use Stratified Split, a variant of Split that divides a data set into a build data set and a test data set while trying to preserve the target distribution in both.
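As an illustration only, the idea behind a stratified split can be sketched in Python. This is not the product's implementation: the function name `stratified_split`, the `build_fraction` parameter, and its 0.6 default are assumptions made for the example. The sketch groups records by target value and splits each group separately, so that both output sets keep roughly the same target proportions as the input.

```python
import random
from collections import defaultdict


def stratified_split(rows, target, build_fraction=0.6, seed=0):
    """Split rows (a list of dicts) into build and test lists.

    Hypothetical helper: build_fraction and its default are assumptions
    for this sketch, not the tool's documented defaults.
    """
    rng = random.Random(seed)

    # Group the records by the value of the target column.
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    build, test = [], []
    for group in by_class.values():
        # Shuffle within each class, then cut the class at the same fraction.
        rng.shuffle(group)
        cut = round(len(group) * build_fraction)
        build.extend(group[:cut])
        test.extend(group[cut:])
    return build, test
```

Because every target value is cut at the same fraction, a rare target value that makes up 20 percent of the input also makes up about 20 percent of the build set and of the test set.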

The default splits depend on the number of cases (records) as follows:

By default, the input table is randomized before it is split. Two new tables are created.
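The randomize-then-split behavior can be pictured with a short Python sketch. It is a simplified illustration, not the product's code: the name `random_split`, the `build_fraction` parameter, and the `seed` argument are hypothetical. The records are shuffled into a new list, which is then cut into two new lists, leaving the input unchanged.

```python
import random


def random_split(rows, build_fraction, seed=None):
    """Shuffle a copy of rows, then cut it into (build, test) lists.

    Hypothetical helper for illustration; build_fraction is the share
    of records that goes to the build list.
    """
    shuffled = list(rows)              # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling first matters when the input is ordered, for example by date or by target value: without it, a positional cut would put systematically different records in the two outputs.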

The Total Case Count, that is, the number of records in the input table, is displayed.

You can change the following:

Click OK to apply your changes. Click Restore to reset the options to their default values.