The Minimum Description Length (MDL) algorithm is used to build Attribute Importance models. MDL is an information-theoretic model selection principle. It assumes that the simplest, most compact representation of the data is the best and most probable explanation of the data.
MDL ranks attributes by treating each attribute as a simple predictive model of the target class. These single-predictor models are compared and ranked with respect to the MDL metric. MDL penalizes model complexity to avoid overfitting. It is a principled approach that takes the cardinality of each predictor into account so that the comparisons are fair. MDL calculates an importance value for each attribute and then ranks the attributes by that value. The attribute with the largest importance value has rank 1.
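The ranking idea described above can be sketched in a few lines of Python. This is an illustration only, not Oracle's implementation: it assumes categorical attributes, a two-part code in which each attribute value costs (number of classes - 1) / 2 * log2(number of rows) bits of model description, and it defines importance as the number of bits saved relative to encoding the target with no predictor. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def description_length(groups, n_classes, n_rows):
    """Two-part code length: model cost plus cost of the targets given the model.

    ASSUMPTION: each group (attribute value) costs (n_classes - 1) / 2 * log2(n_rows)
    bits. This is a common textbook penalty, not necessarily the one Oracle uses.
    """
    model_bits = len(groups) * (n_classes - 1) / 2 * math.log2(n_rows)
    data_bits = sum(sum(g.values()) * entropy_bits(list(g.values())) for g in groups)
    return model_bits + data_bits

def rank_attributes(rows, target):
    """Return (rank, attribute, importance) tuples, most important first."""
    n_rows = len(rows)
    target_counts = Counter(row[target] for row in rows)
    n_classes = len(target_counts)
    # Baseline: encode the target without using any predictor.
    baseline = description_length([target_counts], n_classes, n_rows)
    scores = {}
    for attr in rows[0]:
        if attr == target:
            continue
        # One class-count table per distinct attribute value.
        groups = defaultdict(Counter)
        for row in rows:
            groups[row[attr]][row[target]] += 1
        # Importance = bits saved by the one-predictor model; may be negative.
        scores[attr] = baseline - description_length(
            list(groups.values()), n_classes, n_rows
        )
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, attr, score) for rank, (attr, score) in enumerate(ranked, start=1)]

# Hypothetical sample: "member" determines the target, "coin" is unrelated,
# and "row_id" is unique per row (high cardinality).
sample = [
    {"row_id": i, "member": m, "coin": c, "buy": m}
    for i, (m, c) in enumerate(
        [("yes", "H"), ("yes", "T"), ("yes", "H"), ("yes", "T"),
         ("no", "H"), ("no", "T"), ("no", "H"), ("no", "T")],
        start=1,
    )
]
ranking = rank_attributes(sample, "buy")
for rank, attr, score in ranking:
    print(rank, attr, round(score, 2))
```

In this sample, "member" receives rank 1 with a positive score, while "coin" and "row_id" both score below zero. "row_id" splits the target perfectly, but under the assumed penalty its eight distinct values cost more to describe than they save, which is the cardinality adjustment at work.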
MDL can assign negative importance to an attribute. Negative importance indicates that the attribute, taken by itself, is not correlated with the target. However, the MDL algorithm measures only univariate correlation with the target; that is, each attribute is considered as a one-predictor model of the target. An attribute with negative importance might still be useful in an interaction model, such as a decision tree. When you reduce the number of attributes for model build, an attribute with negative importance is a prime candidate for exclusion. Algorithms such as Decision Tree and Adaptive Bayes Network (ABN) do their own internal attribute reduction, so with those algorithms there is no need to exclude an attribute with negative importance, other than for performance reasons (to make model build and apply faster).
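The caveat about interaction models can be illustrated with a small, hypothetical Python example. It does not compute the MDL metric itself; it uses conditional entropy (the average number of bits needed to encode the target once the predictors are known) as a stand-in for how informative a predictor is. The target is constructed as the exclusive OR of two binary attributes, so neither attribute tells you anything about the target alone, yet together they determine it completely. All names and data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy_bits(rows, predictors, target):
    """Average bits per row needed to encode the target given the predictors."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[p] for p in predictors)
        groups[key][row[target]] += 1
    total = len(rows)
    bits = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for c in counts.values():
            # Weight each group's entropy by the share of rows in that group.
            bits -= (size / total) * (c / size) * math.log2(c / size)
    return bits

# Hypothetical data: target "t" is the XOR of binary attributes "a" and "b".
xor_rows = [{"a": a, "b": b, "t": a ^ b} for a in (0, 1) for b in (0, 1)] * 4

h_target = conditional_entropy_bits(xor_rows, [], "t")             # no predictor
h_given_a = conditional_entropy_bits(xor_rows, ["a"], "t")         # "a" alone
h_given_b = conditional_entropy_bits(xor_rows, ["b"], "t")         # "b" alone
h_given_both = conditional_entropy_bits(xor_rows, ["a", "b"], "t") # interaction

print(h_target, h_given_a, h_given_b, h_given_both)
```

Knowing "a" alone or "b" alone leaves the uncertainty about the target unchanged, so a univariate measure gives neither attribute any credit (and a complexity penalty would push the score below zero). Knowing both removes the uncertainty entirely, which is the kind of structure a decision tree can exploit.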
For more information about the MDL algorithm, see Oracle Data Mining Concepts in Where to Find More Information.
Copyright © 2006, 2008, Oracle. All rights reserved.