Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data
Jun Zhang Vasant Honavar
Artificial Intelligence Research Laboratory
Iowa State University, USA
Abstract
We consider the problem of learning to classify partially specified instances, i.e.,
instances that are described in terms of attribute values at different levels
of precision, using user-supplied attribute value taxonomies (AVT). We
formalize the problem of learning from AVT and data and present an AVT-guided
decision tree learning algorithm (AVT-DTL) to learn classification rules at
multiple levels of abstraction. The proposed approach generalizes existing
techniques for dealing with missing values to handle instances with partially
missing values. We present experimental results demonstrating that AVT-DTL
effectively learns robust, high-accuracy classifiers from partially specified
examples. Our experiments also demonstrate that AVT-DTL outperforms the standard
decision tree algorithm (C4.5 and its variants) when applied to data with
missing attribute values, and produces substantially more compact decision
trees than those obtained by the standard approach.
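To make the idea of a partially specified instance concrete, the following Java sketch stores a taxonomy as child-to-parent links. It is an illustration only, not code from AVT-DTL 2.0: the class `AvtSketch`, its methods, and the small color taxonomy mentioned below are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an attribute value taxonomy (AVT) stored as child -> parent links. */
class AvtSketch {
    private final Map<String, String> parent = new HashMap<>();
    private final Set<String> internal = new HashSet<>();

    /** Registers child as a direct specialization of parentValue. */
    void addEdge(String parentValue, String child) {
        parent.put(child, parentValue);
        internal.add(parentValue);
    }

    /** True if value equals ancestor or lies below it in the taxonomy. */
    boolean isA(String value, String ancestor) {
        for (String v = value; v != null; v = parent.get(v)) {
            if (v.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    /** A value counts as fully specified when it has no children (values never registered also count). */
    boolean isFullySpecified(String value) {
        return !internal.contains(value);
    }

    /** Number of edges between value and the root; a larger depth means a more precise value. */
    int depth(String value) {
        int d = 0;
        for (String v = parent.get(value); v != null; v = parent.get(v)) {
            d++;
        }
        return d;
    }
}
```

With edges color -> dark, color -> light, dark -> black, and dark -> brown, an instance whose value is "black" is fully specified, one whose value is "dark" is partially specified, and one whose value is the root "color" carries no information, which corresponds to an ordinary missing value.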
Full Paper:
[pdf][ps]
Presentation Slides:
[ppt]
AVT-guided Decision Tree Learning Algorithm in Java
Source Code Download:
[AVT-DTL 2.0]
AVTs and Data
Two data sets, Mushroom Toxicology and
Nursery (both from the UCI Repository), were used in
the performance evaluations in the paper.
Attribute value taxonomies for the two
data sets were submitted to the UCI Repository. You
can download them here:
AVTs Download: [avts]
To explore the algorithm on data sets with
different percentages of totally missing or partially
missing values, we generated partially specified data
from the original data and the AVTs. You can download
the data sets here:
Data Sets Download:
[data1][data2]
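This page does not spell out how the partially specified data were produced. One plausible procedure, sketched below purely as an assumption, replaces a fully specified value by one of its ancestors in the taxonomy with a chosen probability. The class `PartialDataSketch`, its method names, and the `steps` and `rate` parameters are hypothetical and are not taken from AVT-DTL 2.0.

```java
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: derive partially specified values from a child -> parent taxonomy map. */
class PartialDataSketch {

    /** Climbs at most steps levels up the taxonomy, stopping early at a value with no parent. */
    static String generalize(String value, Map<String, String> parent, int steps) {
        String v = value;
        for (int i = 0; i < steps && parent.containsKey(v); i++) {
            v = parent.get(v);
        }
        return v;
    }

    /** With probability rate, returns the generalized value; otherwise returns value unchanged. */
    static String maybeGeneralize(String value, Map<String, String> parent,
                                  int steps, double rate, Random rng) {
        // nextDouble() is in [0.0, 1.0), so rate 0.0 never generalizes and rate 1.0 always does.
        return rng.nextDouble() < rate ? generalize(value, parent, steps) : value;
    }
}
```

Under this assumption, climbing one level yields a partially missing value, while climbing all the way to the root yields a totally missing one, so varying `rate` and `steps` would give data sets with different percentages of each.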

Copyright(c) 2003 Jun Zhang. All rights reserved.