Laboratory Assignment 1
Laboratory Assignment 2
Due February 14, 2007
In this assignment, you will experiment with the decision tree learner.
Data
- Breast Cancer data file (already in the needed format). It has 9 numeric
attributes and 2 types of cancer to be predicted. Refer to
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/breast-cancer-wisconsin/
for the complete description of the data.
- House-votes dataset (the task is to predict whether the voter is a
Republican or a Democrat based on their votes; the data description file can
be found here). It has 16 binary attributes and 2 classes. You will need to
convert this data into a Weka-readable format.
- Reuters Data - one of the benchmark datasets for text categorization and
natural language processing. It is a collection of articles from the top 10
categories. The task is to assign a topic to a new article. This dataset has
been preprocessed to reduce the vocabulary size to 300 words using mutual
information. The original data can be found here.
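The conversion of the house-votes data into a Weka-readable format can be sketched as follows. This is only one possible approach, not a required solution. It assumes the raw file stores one record per line as a class label followed by 16 comma-separated values drawn from y, n and ? (check the data description file before relying on this). The function name votes_to_arff and the attribute names vote_1 ... vote_16 and party are hypothetical placeholders, not names taken from the description file.

```python
def votes_to_arff(lines, relation="house-votes-84", n_votes=16):
    """Convert comma-separated vote records into ARFF text.

    Assumption: each record is a class label followed by n_votes values,
    each one of y, n or ? (? marks a missing value, which ARFF also uses).
    """
    header = [f"@relation {relation}", ""]
    # Placeholder attribute names; replace them with the real vote names
    # from the data description file if you want readable trees.
    for i in range(1, n_votes + 1):
        header.append(f"@attribute vote_{i} {{y,n}}")
    header.append("@attribute party {democrat,republican}")
    header.extend(["", "@data"])

    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != n_votes + 1:
            raise ValueError(
                f"expected {n_votes + 1} fields, got {len(fields)}: {line!r}"
            )
        label, votes = fields[0], fields[1:]
        # Move the class to the end, since Weka's Explorer treats the last
        # attribute as the class by default.
        rows.append(",".join(votes + [label]))
    return "\n".join(header + rows) + "\n"
```

For example, passing the lines of the raw data file to votes_to_arff and writing the returned string to a file with the .arff extension should give a file that Weka can open, provided the assumptions above hold for your copy of the data.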
Tasks
- Estimate the accuracy of the decision tree classifier on the house-votes-84 data set using 5-fold cross-validation. (Note: both the house-votes and breast cancer datasets have missing values. You may use Weka's missing values filter to fill them in or not. How does filling in missing values affect the performance of the classifier?)
- Estimate the accuracy of the decision tree classifier on the breast cancer data set using 5-fold cross-validation. The breast cancer dataset has numeric attributes. (Again, you can handle missing values as in the previous task.)
- Estimate the precision, recall, accuracy, and F-measure of the decision tree classifier on the text classification task for each of the 10 categories using 10-fold cross-validation.
- Repeat each of the above experiments with the pruning option enabled. How do the results with pruning compare with the results without pruning?
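As a reminder of how the per-category measures in the text classification task are defined, the sketch below computes them from lists of actual and predicted labels. Weka reports these figures itself, so this is only an illustration for checking your understanding. The function name per_class_metrics is hypothetical, and per-category accuracy is interpreted here as one-vs-rest accuracy, which is an assumption about what the task intends.

```python
from collections import Counter


def per_class_metrics(actual, predicted):
    """Return overall accuracy and a per-class dict of measures.

    For each class c, treated as "c versus everything else":
      precision = TP / (TP + FP)
      recall    = TP / (TP + FN)
      F-measure = 2 * precision * recall / (precision + recall)
      accuracy  = (TP + TN) / N   (one-vs-rest; an assumed interpretation)
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("no examples given")

    n = len(actual)
    pairs = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    labels = sorted(set(actual) | set(predicted))
    overall_accuracy = sum(pairs[(c, c)] for c in labels) / n

    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(pairs[(a, c)] for a in labels if a != c)
        fn = sum(pairs[(c, p)] for p in labels if p != c)
        tn = n - tp - fp - fn
        # Ratios with an empty denominator are reported as 0.0 here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        report[c] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
            "accuracy": (tp + tn) / n,
        }
    return overall_accuracy, report
```

With 10-fold cross-validation, the predictions from all ten held-out folds can be pooled into one pair of lists before calling the function, so that every article is counted exactly once.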
What to Turn In
Turn in via the turnin script (see the instructions on the
laboratory assignment page):
- A report of the results obtained, with answers to the questions in the Tasks section.
- Any source code that you may have written.