Lab 2: Implementing a Decision Tree Based Learning Agent in Java
http://www.cs.iastate.edu/~cs572/labs/lab2/lab2.html
Out: Nov 11, 2002
Due: Dec 2, 2002
ComS 572 Principles of Artificial Intelligence
Dimitris Margaritis
Department of Computer Science
Iowa State University
TA, Jie Bao
Dept of Computer Science
Iowa State University
baojie@cs.iastate.edu
http://www.cs.iastate.edu/~baojie
Nov 11, 2002
1. Problem restatement
You are required to implement the decision tree algorithm and test it on THREE datasets.
Be careful about the following things:
More details will be discussed in the next help session.
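The heart of the algorithm is choosing, at each node, the attribute whose split best separates the classes. Below is a minimal sketch of the ID3-style information-gain criterion. This is one common choice; C4.5 and C5.0 refine it with the gain ratio. The class and method names are hypothetical. Each example is assumed to be a String array whose last element is the class label, and a "?" is treated here simply as one more attribute value.

```java
import java.util.*;

/**
 * Hypothetical helper: ID3-style entropy and information gain for nominal attributes.
 * Each example is a String[] whose last element is the class label.
 */
public class InfoGain {

    /** Entropy (in bits) of a class distribution given as a count per class. */
    public static double entropy(Collection<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;                    // 0 * log(0) is taken to be 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);      // log base 2
        }
        return h;
    }

    /** Counts how many examples carry each class label (last column). */
    public static Map<String, Integer> classCounts(List<String[]> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] r : rows) counts.merge(r[r.length - 1], 1, Integer::sum);
        return counts;
    }

    /** Information gain of splitting the rows on the attribute in column attr. */
    public static double gain(List<String[]> rows, int attr) {
        // Partition the examples by their value of the chosen attribute.
        Map<String, List<String[]>> parts = new HashMap<>();
        for (String[] r : rows) parts.computeIfAbsent(r[attr], k -> new ArrayList<>()).add(r);
        // Weighted average entropy of the partitions.
        double remainder = 0.0;
        for (List<String[]> part : parts.values()) {
            remainder += (double) part.size() / rows.size() * entropy(classCounts(part).values());
        }
        return entropy(classCounts(rows).values()) - remainder;
    }

    public static void main(String[] args) {
        // Toy data: one attribute that separates the two classes perfectly.
        List<String[]> rows = Arrays.asList(
            new String[]{"y", "democrat"},
            new String[]{"y", "democrat"},
            new String[]{"n", "republican"},
            new String[]{"n", "republican"});
        System.out.println(gain(rows, 0)); // a perfect split: 1 bit, up to rounding
    }
}
```

In this ID3-style scheme, building the tree means picking the attribute with the largest gain and partitioning the examples on its values. You then recurse on each partition until a node is pure or no attributes remain.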
2. About the dataset
The UC Irvine (UCI) archive of machine learning datasets is a well-known repository for machine learning.
Each dataset consists of two files: a names file and a data file. The names file gives the domain (the set of possible values) of each attribute. The data file is a table with attributes as columns and examples as rows. (You can open the .data file in Excel and view it as a table.)
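Below is a minimal sketch of reading the two files. It assumes the C4.5 conventions. In the names file, the first non-comment line lists the class values. Each later line has the form "attribute: value1, value2, ..." and may end in a period, and "|" starts a comment. In the data file, each line holds comma-separated attribute values followed by the class label, with "?" for a missing value. Check these assumptions against the actual files. The class and field names are hypothetical, and only nominal (discrete-valued) attributes are handled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;

/** Hypothetical reader for C4.5-style .names / .data files (format assumed, nominal attributes only). */
public class DatasetReader {
    public final List<String> classValues = new ArrayList<>();
    public final List<String> attributeNames = new ArrayList<>();
    public final List<List<String>> attributeValues = new ArrayList<>();
    public final List<String[]> examples = new ArrayList<>();

    /** Drops a '|' comment and a trailing period, then trims whitespace. */
    static String clean(String line) {
        int bar = line.indexOf('|');
        if (bar >= 0) line = line.substring(0, bar);
        line = line.trim();
        if (line.endsWith(".")) line = line.substring(0, line.length() - 1);
        return line.trim();
    }

    /** Splits "a, b, c" into [a, b, c]. */
    static List<String> splitValues(String s) {
        List<String> out = new ArrayList<>();
        for (String v : s.split(",")) out.add(v.trim());
        return out;
    }

    /** First non-empty line: class values; each later line: "attribute: v1, v2, ...". */
    public void readNames(List<String> lines) {
        for (String raw : lines) {
            String line = clean(raw);
            if (line.isEmpty()) continue;
            if (classValues.isEmpty()) {
                classValues.addAll(splitValues(line));
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) throw new IllegalArgumentException("Expected 'name: values' but got: " + raw);
            attributeNames.add(line.substring(0, colon).trim());
            attributeValues.add(splitValues(line.substring(colon + 1)));
        }
    }

    /** Each line: attribute values, then the class label, comma-separated; "?" marks a missing value. */
    public void readData(List<String> lines) {
        for (String raw : lines) {
            String line = clean(raw);
            if (!line.isEmpty()) examples.add(splitValues(line).toArray(new String[0]));
        }
    }

    public static void main(String[] args) throws IOException {
        DatasetReader d = new DatasetReader();
        d.readNames(Files.readAllLines(Paths.get(args[0])));
        d.readData(Files.readAllLines(Paths.get(args[1])));
        System.out.println(d.attributeNames.size() + " attributes, " + d.examples.size() + " examples");
    }
}
```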
For example, the Voting dataset can be understood as:
handicapped-infants: y,n | water- | adoption- | physician- | el- | religious- | Other Attributes | ClassName
------------------------ | ------ | --------- | ---------- | --- | ---------- | ---------------- | ----------
n                        | y      | n         | y          | y   | y          | ...              | republican
n                        | y      | n         | y          | y   | y          | ...              | republican
?                        | y      | y         | ?          | y   | y          | ...              | democrat
n                        | y      | y         | n          | ?   | y          | ...              | democrat
y                        | y      | y         | n          | y   | y          | ...              | democrat
n                        | y      | y         | n          | y   | y          | ...              | democrat
n                        | y      | n         | y          | y   | y          | ...              | democrat
n                        | y      | n         | y          | y   | y          | ...              | republican
n                        | y      | n         | y          | y   | y          | ...              | republican
y                        | y      | y         | n          | n   | n          | ...              | democrat
n                        | y      | n         | y          | y   | n          | ...              | republican
n                        | y      | n         | y          | y   | y          | ...              | republican
n                        | y      | y         | n          | n   | n          | ...              | democrat
y                        | y      | y         | n          | n   | y          | ...              | democrat
n                        | y      | n         | y          | y   | y          | ...              | republican
...                      | ...    | ...       | ...        | ... | ...        | ...              | ...
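Several rows above contain "?", which marks a missing vote. One simple way to handle these is sketched below, as an assumption rather than a requirement. Each "?" is replaced with the most common known value of the same attribute, ideally computed from the training set only. C4.5 itself uses a more elaborate scheme that distributes such examples fractionally across branches, and treating "?" as an ordinary value is another option. The class name is hypothetical. Rows are assumed to be String arrays with the class label last.

```java
import java.util.*;

/** Hypothetical helper: replaces "?" with the most common known value of the same attribute. */
public class MissingValues {

    /** Most frequent non-"?" value in column col (ties broken by first seen), or null if none. */
    public static String mostCommon(List<String[]> rows, int col) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] r : rows) {
            if (!r[col].equals("?")) counts.merge(r[col], 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
        }
        return best;
    }

    /** Fills missing attribute values in place; the last column (class label) is left alone. */
    public static void fill(List<String[]> rows) {
        if (rows.isEmpty()) return;
        int numAttrs = rows.get(0).length - 1;
        for (int col = 0; col < numAttrs; col++) {
            String mode = mostCommon(rows, col);
            if (mode == null) continue;              // every value in this column is missing
            for (String[] r : rows) if (r[col].equals("?")) r[col] = mode;
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[]{"n", "y", "republican"});
        rows.add(new String[]{"?", "y", "democrat"});
        rows.add(new String[]{"n", "?", "democrat"});
        fill(rows);
        for (String[] r : rows) System.out.println(String.join(",", r));
    }
}
```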
The three datasets:
3. What to turn in
How to turn in: suppose your files are stored in the subdirectory lab2. From WITHIN lab2, execute the following:
/home/course/cs572/public/Utilities/turnin lab2
Note: if you run turnin in your root directory, all of your files (even your emails) will be submitted!
Many students were unclear about the grading policy and about where they lost points on Lab 1. This time, therefore, a HARDCOPY of your lab report is required so that the TA can write comments on it. It is also preferable to write all your code in C5.java, taking these input parameters:
C5 <namesFile> <trainsetFilename> <testsetFilename>
This will greatly simplify the TA's work.
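Below is a minimal sketch of an entry point for that command line. Only the argument handling is shown, and the learning steps are left as comments. The helper name checkArgs is hypothetical.

```java
/**
 * Hypothetical skeleton of C5.java for the required command line:
 *   java C5 <namesFile> <trainsetFilename> <testsetFilename>
 * Only argument handling is shown; the learning steps are left as comments.
 */
public class C5 {

    /** Returns a usage message if the argument count is wrong, or null if it is right. */
    public static String checkArgs(String[] args) {
        if (args.length != 3) {
            return "Usage: java C5 <namesFile> <trainsetFilename> <testsetFilename>";
        }
        return null;
    }

    public static void main(String[] args) {
        String error = checkArgs(args);
        if (error != null) {
            System.err.println(error);
            System.exit(1);
        }
        String namesFile = args[0];  // attribute domains
        String trainFile = args[1];  // examples used to build the tree
        String testFile = args[2];   // examples used to measure accuracy
        System.out.println("names=" + namesFile + " train=" + trainFile + " test=" + testFile);
        // 1. Read the attribute domains from namesFile.
        // 2. Read the training examples and build the decision tree.
        // 3. Print the tree, classify the test examples, and report the accuracy.
    }
}
```

For example, compile with "javac C5.java" and run with "java C5 vote.names vote.train vote.test". These file names are made up for illustration.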
Grading policy:
Code: 60%
- A readme file explaining how to compile and run the program: 5%
- The code to randomly split the datasets: 5%
- Compiles and runs with no errors or exceptions: 25%
- Gives results on all datasets: 15% (5% for each dataset)
- Clear comments in the source code: 10%
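Below is a minimal sketch of the random split. It shuffles a copy of the examples with a seeded java.util.Random, so the split can be reproduced, and cuts the list at a chosen fraction. The split ratio is not specified here, so trainFraction is left as a parameter. The class name RandomSplit is hypothetical.

```java
import java.util.*;

/** Hypothetical helper: randomly splits examples into a training set and a test set. */
public class RandomSplit {

    /**
     * Shuffles a copy of the examples with a seeded Random, then puts the first
     * trainFraction of them in the training set (index 0) and the rest in the test set (index 1).
     */
    public static <T> List<List<T>> split(List<T> examples, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(examples);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(trainFraction * shuffled.size());
        List<List<T>> result = new ArrayList<>();
        result.add(new ArrayList<>(shuffled.subList(0, cut)));               // training set
        result.add(new ArrayList<>(shuffled.subList(cut, shuffled.size()))); // test set
        return result;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) ids.add(i);
        List<List<Integer>> parts = split(ids, 0.7, 42L);
        System.out.println("train=" + parts.get(0) + " test=" + parts.get(1));
    }
}
```

Using a fixed seed means the same train/test sets are produced on every run. Changing the seed gives a different random split.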
Report: 40%
- A list of your classes and functions, each with a short description: 5%
- Printed decision trees: 5%
- Do the induced decision trees make sense given what you know about the domains we used? 5%
- Are you able to obtain any interesting insights about these two test domains by looking at the induced trees? 5%
- Do you see any places where pruning might have been useful had we implemented it? Please explain pruning first. 10%
- How do the learned trees compare to a very simple learning algorithm that determines the majority class in the training set and then always guesses that class on the test set? (For this last question, compute the accuracy of this simple algorithm and compare it to the accuracy of C5.0.) 10%
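The majority-class baseline in the last question can be computed as sketched below. The names are hypothetical. The label lists are assumed to be the class column of the training and test sets.

```java
import java.util.*;

/** Hypothetical helper: the majority-class baseline from the last report question. */
public class MajorityBaseline {

    /** Most frequent class label among the training labels (ties broken by first seen). */
    public static String majorityClass(List<String> trainLabels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : trainLabels) counts.merge(label, 1, Integer::sum);
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
        }
        return best;
    }

    /** Fraction of test labels equal to the training set's majority class. */
    public static double accuracy(List<String> trainLabels, List<String> testLabels) {
        String guess = majorityClass(trainLabels);
        int correct = 0;
        for (String label : testLabels) if (label.equals(guess)) correct++;
        return (double) correct / testLabels.size();
    }

    public static void main(String[] args) {
        // Toy labels, not real results.
        List<String> train = Arrays.asList("democrat", "democrat", "republican");
        List<String> test = Arrays.asList("democrat", "republican", "democrat", "democrat");
        System.out.println(majorityClass(train) + " " + accuracy(train, test)); // democrat 0.75
    }
}
```

The accuracy printed for the learned tree on the same test set can then be compared directly with this number.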
Example output 1 (partial):
adoption-of-the-budget-resolution=y
Example output 2:
physician-fee-freeze = y
Number of Leaves : 5
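One way to produce output of this kind is sketched below with a hypothetical TreeNode class. An internal node tests one attribute and has one child per attribute value. The tree is printed recursively as "attribute = value" lines indented by depth, with a leaf shown as ": label". The indentation and leaf format are assumptions, since the sample outputs are only fragments. The tree in main is a hand-built toy with made-up names, not a learned result.

```java
import java.util.*;

/** Hypothetical decision tree node: internal nodes test one attribute, leaves hold a class label. */
public class TreeNode {
    public final String attribute;   // attribute tested here; null for a leaf
    public final String label;       // class label; used only by leaves
    public final Map<String, TreeNode> children = new LinkedHashMap<>(); // attribute value -> subtree

    private TreeNode(String attribute, String label) {
        this.attribute = attribute;
        this.label = label;
    }

    public static TreeNode leaf(String label) { return new TreeNode(null, label); }
    public static TreeNode test(String attribute) { return new TreeNode(attribute, null); }

    public boolean isLeaf() { return attribute == null; }

    /** Appends one "attribute = value" line per branch, indented by depth (a lone leaf prints nothing). */
    public void print(StringBuilder out, int depth) {
        for (Map.Entry<String, TreeNode> e : children.entrySet()) {
            for (int i = 0; i < depth; i++) out.append("|   ");
            out.append(attribute).append(" = ").append(e.getKey());
            TreeNode child = e.getValue();
            if (child.isLeaf()) {
                out.append(": ").append(child.label).append('\n');
            } else {
                out.append('\n');
                child.print(out, depth + 1);
            }
        }
    }

    /** Number of leaves in the subtree rooted here. */
    public int countLeaves() {
        if (isLeaf()) return 1;
        int n = 0;
        for (TreeNode c : children.values()) n += c.countLeaves();
        return n;
    }

    public static void main(String[] args) {
        // Hand-built toy tree with made-up names, not a learned result.
        TreeNode root = test("attr1");
        root.children.put("n", leaf("classA"));
        TreeNode sub = test("attr2");
        sub.children.put("y", leaf("classA"));
        sub.children.put("n", leaf("classB"));
        root.children.put("y", sub);
        StringBuilder sb = new StringBuilder();
        root.print(sb, 0);
        System.out.print(sb);
        System.out.println("Number of Leaves : " + root.countLeaves());
    }
}
```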
4. Related Resources
Sample decision tree programs (free, with source code):
- A decision tree in Java by Frans Coenen: http://www.csc.liv.ac.uk/~frans/COMP101/AdditionalStuff/javaDecTree.html
  A good starting point for implementing a decision tree.
- Weka: Machine Learning Software in Java
- MLC++ (Machine Learning Library in C++): http://www.kddresearch.org/Groups/Machine-Learning/Docs/overview-summary.html
- YALE: Yet Another Learning Environment (in Java): http://yale.cs.uni-dortmund.de/javadoc/edu/udo/cs/yale/operator/learner/package-summary.html
- C4.5 (in C), the "classic" decision-tree tool developed by J. R. Quinlan (restricted distribution)
- Decision Tree Java Applet by Pierre Geurts: http://www.montefiore.ulg.ac.be/~geurts/dtapplet/dtexplication.html
- See5 (Windows 98/Me/2000/XP) and C5.0 (Unix) from RULEQUEST: http://www.rulequest.com/see5-info.html
  See5 and C5.0 are sophisticated data mining tools for discovering patterns that delineate categories, assembling them into classifiers, and using them to make predictions.
- OC1 (in C) by Steven Salzberg: http://www.cs.jhu.edu/~salzberg/announce-oc1.html
  OC1 (Oblique Classifier 1) is a decision tree system for continuous feature values. It builds decision trees with linear combinations of attributes at each internal node, so the trees partition the space of examples with both oblique and axis-parallel hyperplanes.
- Classification Tree in Excel, from Angshuman Saha
- YaDT (Yet another Decision Tree builder) and Efficient C4.5 (EC4.5) by Salvatore Ruggieri: http://www.di.unipi.it/~ruggieri/software.html
  YaDT is a new from-scratch implementation of the entropy-based tree construction algorithm. EC4.5 is a more efficient version of C4.5 that uses the best of three strategies at each node construction.
- IND: provides Gini and C4.5-style decision trees and more. Publicly available from NASA, but with export restrictions.
- LMDT: builds Linear Machine Decision Trees (based on papers by Brodley and Utgoff).
- ODBCMINE: analyzes ODBC databases using C4.5 and outputs simple IF..ELSE decision rules in ASCII.
- PC4.5: a parallel version of C4.5 built with the Persistent Linda (PLinda) system.
You can read them for inspiration, but NEVER copy them! Write every line yourself.