The Decision Tree (DT) algorithm provides a fast, scalable non-parametric means of extracting predictive information from a database with respect to a user-supplied target. Decision trees extract predictive information in the form of human-understandable rules. The rules are in the form "IF predictive information THEN target", as in "IF income is greater than $70,000 and household size is greater than 3 THEN the probability of Churn is 0.075."
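A rule of this form can be sketched as a simple predicate in Python. This is an illustration of the rule's logic only; the function name and arguments are invented for the example, and this is not an ODM API:

```python
def churn_rule(income, household_size):
    """Illustrative rule from the text: IF income > $70,000 AND
    household size > 3 THEN the probability of Churn is 0.075."""
    if income > 70_000 and household_size > 3:
        return 0.075   # predicted probability of Churn
    return None        # rule does not apply to this case
```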
Decision tree rules provide model transparency so that a business user, marketing user, or business analyst can understand the basis of the model's predictions.
In addition to transparency, decision trees provide speed and scalability. The build algorithm scales linearly with the number of predictor attributes. Scoring is very fast. Both build and apply can be parallelized.
The Decision Tree algorithm builds predictive models for binary and multi-class targets. It has reasonable defaults for splitting and termination criteria, performs automatic pruning, and handles missing values automatically. Decision Tree automatically selects the important attributes and uses them to build the model.
You can specify costs and priors.
The Decision Tree algorithm creates branches by creating splits at nodes. DT performs internal optimization to decide which attributes to use at each branching split. At each split, a homogeneity metric is used to determine the attribute values that ensure that the cases satisfying each splitting criterion are predominantly of one target value. For example, it might be determined that most customers over the age of 35 are high-value customers, while those below 35 are low-value customers. There are two homogeneity metrics: Gini and Entropy. Gini is the default. The building of the tree by creating branches continues until one of several user-defined stopping rules is met. A node is said to contain N records if N cases of the source data satisfy the branching rules to that point.
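The two homogeneity metrics can be sketched in plain Python. This is a hedged illustration of the standard Gini and entropy impurity formulas, not ODM's internal implementation:

```python
import math

def gini(counts):
    """Gini impurity: 1 - sum(p_i^2) over class proportions.
    0 for a pure (homogeneous) node; larger when classes are mixed."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)).
    Empty classes contribute nothing and are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)
```

A pure node (all cases of one target value) scores 0 under either metric, while an evenly mixed binary node scores 0.5 under Gini and 1.0 under entropy; the split chosen is the one whose children are most homogeneous.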
With the algorithm's default values, branching stops once the tree has 7 levels of branches. A node is not split further if one of the following conditions is true:
A split is rolled back if it produces a node with one of the following characteristics:
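The growth-and-rollback behavior described above can be sketched as a minimal recursive builder. This is an assumed, simplified model of the process, not the ODM implementation: the `grow`, `label_of`, and `best_split` names are invented for the example, pruning and cost/prior handling are omitted, and rollback is reduced to rejecting a split that leaves a child empty:

```python
def grow(records, label_of, best_split, depth=0, max_depth=7):
    """Sketch of depth-limited tree growth (default 7 levels).
    best_split returns a boolean predicate over a record, or None."""
    labels = [label_of(r) for r in records]
    # Stop: depth limit reached, or node already homogeneous.
    if depth >= max_depth or len(set(labels)) <= 1:
        return {"leaf": True, "labels": labels}
    split = best_split(records)
    if split is None:
        return {"leaf": True, "labels": labels}
    left = [r for r in records if split(r)]
    right = [r for r in records if not split(r)]
    # Roll back a split that produces a degenerate (empty) node.
    if not left or not right:
        return {"leaf": True, "labels": labels}
    return {"leaf": False,
            "left": grow(left, label_of, best_split, depth + 1, max_depth),
            "right": grow(right, label_of, best_split, depth + 1, max_depth)}
```

In the real algorithm, `best_split` would choose the attribute and value minimizing Gini (or entropy) impurity in the children.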
Copyright © 2006, 2008, Oracle. All rights reserved.