Attribute Importance

Note: Attribute Importance does not work with sparse data, that is, data in which attributes have primarily NULL values. Remove attributes that have mostly NULL values before you build an Attribute Importance model.
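The following Python sketch illustrates one way to screen out mostly-NULL attributes before building the model. It is not part of Oracle Data Mining: the function name drop_sparse_attributes, the sample attribute names, and the 0.5 cutoff are all assumptions made for illustration, and None stands in for NULL.

```python
def drop_sparse_attributes(rows, max_null_fraction=0.5):
    """Return (kept_rows, dropped_names) after removing mostly-NULL attributes.

    rows: list of dicts mapping attribute name -> value; None stands for NULL.
    max_null_fraction: hypothetical cutoff; an attribute is dropped when more
    than this fraction of its values are NULL (or missing from the row).
    """
    if not rows:
        return [], []
    names = sorted({name for row in rows for name in row})
    total = len(rows)
    dropped = []
    for name in names:
        nulls = sum(1 for row in rows if row.get(name) is None)
        if nulls / total > max_null_fraction:
            dropped.append(name)
    kept_rows = [
        {name: value for name, value in row.items() if name not in dropped}
        for row in rows
    ]
    return kept_rows, dropped


# Illustrative data only: fax_number is NULL in 3 of 4 rows.
rows = [
    {"age": 34, "income": 52000, "fax_number": None},
    {"age": 41, "income": None, "fax_number": None},
    {"age": 29, "income": 48000, "fax_number": "555-0100"},
    {"age": 50, "income": 61000, "fax_number": None},
]
cleaned, dropped = drop_sparse_attributes(rows)
print(dropped)  # ['fax_number']
```

The appropriate cutoff depends on the data; the value used here is only a placeholder.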

If a data set has many attributes, it is likely that not all attributes will contribute to a predictive model. Indeed, some attributes may simply add noise, that is, they actually detract from the model's predictive value. Oracle Data Mining provides Attribute Importance (AI), which uses the Minimum Description Length (MDL) algorithm to rank attributes by their significance in determining the target value.
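To give a feel for how a description-length criterion can separate useful attributes from noise, here is a simplified two-part MDL sketch in Python. It is an illustration under stated assumptions, not Oracle Data Mining's implementation: it assumes categorical attributes, charges a hypothetical cost of 0.5 * log2(n) bits per model parameter, and scores an attribute by how many bits it saves when encoding the target compared with using no attribute at all. The function names and sample data are invented for the example.

```python
import math
from collections import Counter, defaultdict


def data_cost_bits(labels):
    """Bits needed to encode the labels using their own empirical distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c * math.log2(c / n) for c in counts.values())


def mdl_importance(values, targets):
    """Bits saved by encoding the target with the attribute versus without it.

    Total cost = model cost + data cost. The model cost charges an assumed
    0.5 * log2(n) bits for each free parameter of the probability table.
    """
    n = len(targets)
    n_classes = len(set(targets))
    param_bits = 0.5 * math.log2(n)

    # Baseline: one target distribution for the whole data set.
    baseline = (n_classes - 1) * param_bits + data_cost_bits(targets)

    # With the attribute: one target distribution per attribute value.
    groups = defaultdict(list)
    for value, target in zip(values, targets):
        groups[value].append(target)
    model_cost = len(groups) * (n_classes - 1) * param_bits
    data_cost = sum(data_cost_bits(group) for group in groups.values())

    return baseline - (model_cost + data_cost)


# Illustrative data only.
targets = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
predictive = ["a", "a", "a", "a", "b", "b", "b", "b"]  # determines the target
noise = ["x", "y", "x", "y", "x", "y", "x", "y"]       # unrelated to the target

print(mdl_importance(predictive, targets))  # 6.5
print(mdl_importance(noise, targets))       # -1.5
```

In this sketch the noise attribute receives a negative score because the extra model cost is not repaid by any saving in encoding the target.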

The time required to build Oracle Data Mining classification models increases with the number of attributes. Attribute Importance identifies a proper subset of the attributes that are most relevant to predicting the target. Model building can proceed using the selected attributes only.

Using fewer attributes does not necessarily result in lost predictive accuracy. Using too many attributes (especially those that add noise) can degrade the model's performance and accuracy. Mining with only the most relevant attributes can save significant computing time and may produce better models.

Decision Tree and Adaptive Bayes Network algorithms do internal feature reduction, that is, they determine which attributes are important to build the model and use only those attributes. For these kinds of models, it is not necessary to create an AI model to reduce the number of features. However, even for these algorithms, reducing the number of features beforehand may result in better performance.

An Attribute Importance model calculates rank and importance for each attribute. The rank of an attribute is an integer. Importance is a real number that may be negative. The rank or importance of an attribute allows you to select the attributes to be used in building models. The correct way to interpret attribute importance is that attributes with a greater numeric value for importance are relatively more important; the most important attribute has rank equal to 1. Importance values are meaningful only for ordering: if the importance of attribute A is 10 times greater than the importance of attribute B, you cannot assume that attribute A is 10 times more important than attribute B; all that you can assume is that attribute A is more important than attribute B. If the importance of an attribute is a negative number, then that attribute is not correlated with the target. See the description of the Minimum Description Length algorithm for more details.
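The interpretation rules above can be sketched in Python as follows. The importance values and attribute names are made up for illustration, and the helper rank_attributes is hypothetical; it only shows how rank follows from importance and how negative-importance attributes might be excluded before model building.

```python
def rank_attributes(importance):
    """Return (rank, name, importance) tuples; rank 1 is the most important."""
    ordered = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(ordered, start=1)]


# Hypothetical importance values, as an Attribute Importance model might report.
importance = {
    "income": 0.30,
    "age": 0.03,
    "household_size": 0.12,
    "customer_id": -0.02,
}

ranked = rank_attributes(importance)
for rank, name, score in ranked:
    print(rank, name, score)

# Keep only attributes with positive importance; a negative value means the
# attribute is not correlated with the target.
selected = [name for rank, name, score in ranked if score > 0]
print(selected)  # ['income', 'household_size', 'age']
```

Here income has 10 times the importance value of age, but the only safe conclusion is that income ranks above age.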

Oracle Data Mining Attribute Importance models use the Minimum Description Length algorithm.