 Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data

 Jun Zhang     Vasant Honavar

Artificial Intelligence Research Laboratory

Iowa State University, USA

 

Abstract

We consider the problem of learning to classify partially specified instances, i.e., instances that are described in terms of attribute values at different levels of precision, using user-supplied attribute value taxonomies (AVTs). We formalize the problem of learning from AVTs and data and present an AVT-guided decision tree learning algorithm (AVT-DTL) that learns classification rules at multiple levels of abstraction. The proposed approach generalizes existing techniques for dealing with missing values so that it can also handle instances with partially missing values. We present experimental results demonstrating that AVT-DTL is able to effectively learn robust, high-accuracy classifiers from partially specified examples. Our experiments also demonstrate that AVT-DTL outperforms standard decision tree algorithms (C4.5 and its variants) when applied to data with missing attribute values, and that it produces substantially more compact decision trees than those obtained by the standard approach.
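To make the notions of an AVT and a partially specified value concrete, here is a minimal Java sketch (Java because the AVT-DTL implementation on this page is in Java). It is not taken from the AVT-DTL 2.0 source: the class AvtSketch, its methods, the toy "odor" taxonomy, and the counts are all hypothetical. It assumes a C4.5-style fractional-count scheme, in which an instance whose value is an internal AVT node contributes weight to each primitive (leaf) value below that node in proportion to the counts observed on fully specified instances. Treating a totally missing value as the AVT root shows how such a scheme reduces to ordinary missing-value handling.

```java
import java.util.*;

/**
 * Hypothetical sketch, not the AVT-DTL 2.0 source. An attribute value taxonomy
 * (AVT) is stored as a tree of value names; a partially specified value is an
 * internal node, and a totally missing value is the root.
 */
public class AvtSketch {
    // Maps each internal node to its children; leaves (primitive values) have no entry.
    private final Map<String, List<String>> children = new HashMap<>();

    public void addChild(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    /** Returns the primitive (leaf) values under a node; a leaf returns itself. */
    public List<String> leavesUnder(String node) {
        List<String> leaves = new ArrayList<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(node);
        while (!stack.isEmpty()) {
            String n = stack.pop();
            List<String> kids = children.get(n);
            if (kids == null) {
                leaves.add(n);
            } else {
                kids.forEach(stack::push);
            }
        }
        return leaves;
    }

    /**
     * Assumed C4.5-style fractional counting: one instance whose observed value is
     * `value` is split over the leaves under `value`, in proportion to the leaf
     * counts seen on fully specified instances (uniformly if none were seen).
     */
    public Map<String, Double> fractionalCounts(String value, Map<String, Integer> leafCounts) {
        List<String> leaves = leavesUnder(value);
        double total = 0;
        for (String leaf : leaves) {
            total += leafCounts.getOrDefault(leaf, 0);
        }
        Map<String, Double> weights = new TreeMap<>();
        for (String leaf : leaves) {
            double w = total > 0
                    ? leafCounts.getOrDefault(leaf, 0) / total
                    : 1.0 / leaves.size();
            weights.put(leaf, w);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Toy "odor" taxonomy, invented for illustration only.
        AvtSketch avt = new AvtSketch();
        avt.addChild("odor", "pleasant");
        avt.addChild("odor", "none");
        avt.addChild("odor", "unpleasant");
        avt.addChild("pleasant", "almond");
        avt.addChild("pleasant", "anise");
        avt.addChild("unpleasant", "foul");
        avt.addChild("unpleasant", "fishy");

        // Leaf counts from fully specified training instances (made-up numbers).
        Map<String, Integer> counts =
                Map.of("almond", 30, "anise", 10, "none", 40, "foul", 15, "fishy", 5);

        // Partially specified value: only "pleasant" is known.
        System.out.println(avt.fractionalCounts("pleasant", counts)); // {almond=0.75, anise=0.25}
        // Totally missing value: treated as the root, so weight spreads over all leaves.
        System.out.println(avt.fractionalCounts("odor", counts));
    }
}
```

In this sketch the fractional weights would feed the class-frequency counts used when scoring a candidate split, so a partially specified instance is neither discarded nor forced onto a single branch.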

 

Full Paper: [pdf][ps]

Presentation Slides: [ppt]

 

 

AVT-guided Decision Tree Learning Algorithm in Java

 

Source Code Download: [AVT-DTL 2.0]

 

 

AVTs and Data

Two data sets from the UCI Repository, Mushroom Toxicology and Nursery, were used in the performance evaluations reported in the paper.

Attribute Value Taxonomies for the two data sets were submitted to the UCI Repository. You can download them here:

AVTs Download: [avts]

 

To evaluate the algorithm on data sets with different percentages of totally missing or partially missing values, we generated partially specified data from the original data and the AVTs. You can download these data sets here:

Data Sets Download: [data1][data2]
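The exact procedure used to produce the downloadable data sets is not described on this page. One plausible procedure is sketched below in Java purely as an assumption: the class PartialDataSketch, the corrupt method, the probabilities, and the toy taxonomy are all hypothetical. A fully specified value is replaced either by the AVT root (a totally missing value) or by one of its non-root ancestors (a partially missing value).

```java
import java.util.*;

/**
 * Hypothetical sketch of deriving partially specified data from fully
 * specified data and an AVT. The actual procedure and percentages used
 * for the downloadable data sets may differ.
 */
public class PartialDataSketch {
    private final Map<String, String> parent = new HashMap<>(); // child -> parent
    private final String root;

    public PartialDataSketch(String root) {
        this.root = root;
    }

    public void addChild(String p, String c) {
        parent.put(c, p);
    }

    /** Proper ancestors of v, nearest first, excluding the root. */
    List<String> innerAncestors(String v) {
        List<String> out = new ArrayList<>();
        for (String a = parent.get(v); a != null && !a.equals(root); a = parent.get(a)) {
            out.add(a);
        }
        return out;
    }

    /**
     * With probability pTotal the value becomes totally missing (the root).
     * With probability pPartial it is replaced by a randomly chosen non-root
     * ancestor (partially missing), if it has one. Otherwise it is kept.
     */
    public String corrupt(String leaf, double pTotal, double pPartial, Random rng) {
        double r = rng.nextDouble();
        if (r < pTotal) {
            return root;
        }
        if (r < pTotal + pPartial) {
            List<String> anc = innerAncestors(leaf);
            if (!anc.isEmpty()) {
                return anc.get(rng.nextInt(anc.size()));
            }
        }
        return leaf;
    }

    public static void main(String[] args) {
        // Toy taxonomy, invented for illustration only.
        PartialDataSketch avt = new PartialDataSketch("odor");
        avt.addChild("odor", "pleasant");
        avt.addChild("pleasant", "almond");
        avt.addChild("pleasant", "anise");
        avt.addChild("odor", "none");

        Random rng = new Random(42); // seeded so the output is reproducible
        String[] column = {"almond", "anise", "none", "almond", "anise"};
        for (String v : column) {
            System.out.println(v + " -> " + avt.corrupt(v, 0.2, 0.3, rng));
        }
    }
}
```

Because the generator is seeded, main prints the same corrupted column on every run; varying pTotal and pPartial would yield data sets with different percentages of totally and partially missing values.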

 

 

 

 

 

Copyright(c) 2003 Jun Zhang. All rights reserved.