Split
The Split step of an activity divides the input table into two subsets:
- The Build Table, the table used to build the model
- The Test Table, the table used to test the model
Classification Build Activities use Stratified Split, a variant of Split that divides a data set into a build data set and a test data set while trying to preserve the target distribution in both.
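To illustrate what preserving the target distribution means, here is a minimal Python sketch of a stratified split. It is not the product's implementation: the function name stratified_split, the target_of callback, and the seed parameter are all hypothetical, and the sketch assumes the input fits in memory as a list of rows.

```python
import random
from collections import defaultdict


def stratified_split(rows, target_of, build_percent=60, seed=0):
    """Split rows into (build, test), keeping each target value's share.

    Hypothetical helper for illustration only. `target_of` is a function
    that returns the target value of a row.
    """
    rng = random.Random(seed)

    # Group the rows by their target value.
    by_target = defaultdict(list)
    for row in rows:
        by_target[target_of(row)].append(row)

    build, test = [], []
    for members in by_target.values():
        # Shuffle within each group, then cut the group at build_percent.
        rng.shuffle(members)
        cut = round(len(members) * build_percent / 100)
        build.extend(members[:cut])
        test.extend(members[cut:])
    return build, test


# 10 cases with target "yes" and 90 with target "no".
cases = [("yes", i) for i in range(10)] + [("no", i) for i in range(90)]
build, test = stratified_split(cases, target_of=lambda row: row[0])
print(len(build), len(test))  # 60 40
```

Because each target value is split separately, the build set in this example holds 6 "yes" cases and 54 "no" cases, the same 10% to 90% ratio as the input.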
The default splits depend on the number of cases (records) as follows:
- If there are fewer than 10,000 cases in the input table, 60% of cases are in the build table, and 40% are in the test table.
- If there are 10,000 cases or more in the input table, 50% of cases are in the build table, and 50% are in the test table.
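The default rule above can be written as a short Python function. This is only a sketch of the rule as stated here; the function name default_split_percentages is hypothetical and is not part of the product.

```python
from typing import Tuple


def default_split_percentages(total_case_count: int) -> Tuple[int, int]:
    """Return (build_percent, test_percent) for a given number of cases.

    Hypothetical helper that mirrors the defaults described above.
    """
    if total_case_count < 0:
        raise ValueError("total_case_count must be non-negative")
    if total_case_count < 10_000:
        return 60, 40  # fewer than 10,000 cases
    return 50, 50      # 10,000 cases or more


print(default_split_percentages(9_999))   # (60, 40)
print(default_split_percentages(10_000))  # (50, 50)
```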
By default, the input table is randomized before it is split. Two new tables are created.
The Total Case Count, that is, the number of records in the input table, is displayed.
You can change the following:
- Create As: The default is to create two new tables; you can create views instead.
- Build Table: The percent of cases in the build table; the default is 60 if there are fewer than 10,000 cases, and 50 otherwise.
- Test Table: The percent of cases in the test table; the default is 40 if there are fewer than 10,000 cases, and 50 otherwise.
- Randomize Before Split: The default is Yes, which randomizes the records in the input table before the table is split. If you have already applied the sample or stratified sample transformation to the table, or have otherwise randomized the table, specify No.
Click OK to apply your changes. Click Restore to reset the options to their default values.
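The effect of the Build Table percentage and the Randomize Before Split option can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function split_cases and its parameters are hypothetical, and the sketch assumes that the test table receives every case not placed in the build table.

```python
import random


def split_cases(rows, build_percent, randomize=True, seed=0):
    """Return (build, test) lists from rows.

    Hypothetical helper for illustration only. Assumes the test set is
    whatever remains after the build set is taken.
    """
    if not 0 <= build_percent <= 100:
        raise ValueError("build_percent must be between 0 and 100")

    ordered = list(rows)
    if randomize:
        # Corresponds to Randomize Before Split = Yes.
        random.Random(seed).shuffle(ordered)

    cut = round(len(ordered) * build_percent / 100)
    return ordered[:cut], ordered[cut:]


# With randomize=False the existing row order is kept, which is what you
# want when the input has already been randomized.
build, test = split_cases(range(10), build_percent=60, randomize=False)
print(build)  # [0, 1, 2, 3, 4, 5]
print(test)   # [6, 7, 8, 9]
```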
Copyright © 2006, 2008, Oracle. All rights reserved.