ComS 472/572:
Principles of Artificial Intelligence
Department of Computer Science
Iowa State University
Lab 3
Due: Friday, November 18, 2011, 11:00am.
In this assignment, you will experiment with the Naive Bayes classifier.
- Download and install Weka.
Data
- Breast Cancer data file
(already in the Weka format). It has 9 numeric attributes and 2 types of
cancer to be predicted. Refer to
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/breast-cancer-wisconsin/
for the complete description of the data.
- House-votes dataset (the task is to
predict whether the voter is a Republican or a Democrat based on their votes;
the data description file can be found here).
It has 16 binary attributes and 2 classes. One of your tasks is to convert
this data into the Weka-readable ARFF format.
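The conversion to ARFF can be done by hand or with a short script. The following is a minimal sketch, not a required method. It assumes the raw file follows the usual UCI layout: one record per line, the party label first, then 16 comma-separated votes coded `y`, `n`, or `?`. The function name `votes_to_arff`, the attribute names `vote1` to `vote16`, and the class name `party` are placeholders chosen for illustration; you may prefer the names given in the data description file. The class is written last because the Weka Explorer treats the last attribute as the class by default.

```python
def votes_to_arff(lines, relation="house-votes", n_votes=16):
    """Convert comma-separated vote records into ARFF text.

    Assumed input layout (not verified against your copy of the file):
    <party>,<vote 1>,...,<vote 16> with votes coded y, n, or ?.
    """
    out = ["@relation " + relation, ""]
    # Placeholder attribute names; each vote is nominal with values y and n.
    for i in range(1, n_votes + 1):
        out.append("@attribute vote%d {y,n}" % i)
    out.append("@attribute party {democrat,republican}")
    out.append("")
    out.append("@data")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != n_votes + 1:
            raise ValueError("expected %d fields, got %d: %r"
                             % (n_votes + 1, len(fields), line))
        label, votes = fields[0], fields[1:]
        # Move the class label to the end; '?' is kept as the ARFF
        # missing-value marker.
        out.append(",".join(votes + [label]))
    return "\n".join(out) + "\n"


# Two sample records in the assumed layout.
sample = [
    "republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y",
    "democrat,?,y,y,?,y,y,n,n,n,n,y,n,y,y,n,n",
]
arff_text = votes_to_arff(sample)
```

To convert the full file, read its lines with `open(...)`, pass them to `votes_to_arff`, and write the returned text to a file with the `.arff` extension.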
Tasks
- Estimate the accuracy of the Naive Bayes classifier on the
House-votes dataset using 5-fold cross-validation.
- Estimate the accuracy of the Naive Bayes classifier on the
Breast Cancer dataset using 5-fold cross-validation. The dataset has numeric
values. You can use Weka's filter to discretize the data into ten bins. Compare the results with
and without discretizing the data.
- Note that both datasets have missing values (denoted by "?"). First
use Weka's missing values filter to fill them in, then redo tasks 1 and 2. How
does filling in the missing values affect the performance of the classifiers?
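To make it easier to interpret the results of tasks 2 and 3, here is a rough sketch of what the two preprocessing steps do to a single attribute. It is an illustration only, not Weka's implementation. It assumes equal-width binning for discretization and mean (numeric) or mode (nominal) substitution for missing values; check the documentation of the filters you actually select in Weka, since their options may differ. The names `fill_missing` and `equal_width_bins` are hypothetical.

```python
from collections import Counter

MISSING = "?"


def fill_missing(column, numeric):
    """Replace '?' entries with the mean (numeric attribute) or the
    most frequent value (nominal attribute) of the observed entries."""
    observed = [v for v in column if v != MISSING]
    if not observed:
        raise ValueError("column has no observed values")
    if numeric:
        mean = sum(float(v) for v in observed) / len(observed)
        return [mean if v == MISSING else float(v) for v in column]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == MISSING else v for v in column]


def equal_width_bins(values, n_bins=10):
    """Map each numeric value to a bin index in 0..n_bins-1, using
    n_bins intervals of equal width between the minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: a single bin
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


votes = fill_missing(["y", "n", "?", "y"], numeric=False)
sizes = fill_missing(["1", "?", "3"], numeric=True)
bins = equal_width_bins([0.0, 5.0, 10.0], n_bins=10)
```

Note that filling in a missing vote with the most frequent value changes the counts Naive Bayes uses to estimate its conditional probabilities, which is one reason the accuracies in task 3 may differ from those in tasks 1 and 2.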
What to Turn In
Turn in via email to the TA a compressed file
(.zip, .rar, or .tar.gz) containing the following:
- A report (in electronic form) of the results obtained with answers to the questions in the
Tasks section.
You should specify the parameters
of every experiment in such a way that they can be replicated by the TA (e.g., indicate whether you used Weka's filter to discretize
the data), and justify your decision about the use (or not) of these
parameters.
- Any source code that you may have written.
- The House-votes data file in ARFF format.
Note that the electronic items must be submitted by 11:00 a.m. You should receive email
confirmation of your submitted file before 11:15 a.m. If you don't receive such an
email, there was a problem with the submission and you should contact
the TA immediately for help (via email).