ComS 472/572: Principles of Artificial Intelligence
Department of Computer Science
Iowa State University
 


Lab 3

Due: Friday, November 18, 2011, 11:00am.

In this assignment, you will experiment with the Naive Bayes classifier.

  1. Download and install Weka.

Data

  1. Breast Cancer data file (already in the Weka format). It has 9 numeric attributes and 2 classes (the types of cancer to be predicted). Refer to ftp://ftp.ics.uci.edu/pub/machine-learning-databases/breast-cancer-wisconsin/ for the complete description of the data.
  2. House-votes dataset (the task is to predict whether a voter is a Republican or a Democrat based on their votes; the data description file can be found here). It has 16 binary attributes and 2 classes. One of your tasks is to convert this data into a Weka-readable format.
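
The conversion of the House-votes data can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: it assumes the raw file is comma-separated with the class label in the first column and votes coded as y, n, or ? (check the data description file), and it uses placeholder attribute names vote1, vote2, ... rather than the real vote names. The function name votes_to_arff is hypothetical.

```python
def votes_to_arff(lines, relation="house-votes"):
    """Convert comma-separated vote records (class label first) to ARFF text."""
    rows = [ln.strip().split(",") for ln in lines if ln.strip()]
    n_votes = len(rows[0]) - 1
    class_list = ",".join(sorted({r[0] for r in rows}))

    out = ["@relation " + relation, ""]
    # Placeholder attribute names; the real names are in the description file.
    for i in range(1, n_votes + 1):
        out.append("@attribute vote%d {y,n}" % i)
    # The class attribute goes last, which is where Weka looks for it by default.
    out.append("@attribute class {%s}" % class_list)
    out += ["", "@data"]
    for r in rows:
        # "?" is kept as-is: ARFF uses it to mark a missing value.
        out.append(",".join(r[1:] + [r[0]]))
    return "\n".join(out) + "\n"


# Tiny made-up sample with 4 votes per record (the real data has 16).
sample = [
    "republican,n,y,n,y",
    "democrat,?,y,y,n",
]
arff_text = votes_to_arff(sample)
print(arff_text)
```

To convert the real file, pass an open file object instead of the sample list and write the returned text to a file with the .arff extension.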

Tasks

  1. Estimate the accuracy of the Naive Bayes classifier on the House-votes dataset using 5-fold cross-validation.
  2. Estimate the accuracy of the Naive Bayes classifier on the Breast Cancer dataset using 5-fold cross-validation. The dataset has numeric values. You can use Weka's filter to discretize the data into ten bins. Compare the results with and without discretizing the data.
  3. Note that both datasets have missing values (denoted by "?"). First use Weka's missing values filter to fill them in, then redo tasks 1 and 2. How does filling in the missing values affect the performance of the classifiers?
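
To make the preprocessing steps in the tasks above concrete, here is a small Python sketch of what each one does conceptually. It is not Weka's implementation and is not a substitute for running the filters in Weka. It assumes equal-width binning for the discretization, and mean (numeric) or mode (nominal) replacement for missing values; all function names are hypothetical.

```python
import random
from collections import Counter

MISSING = "?"


def equal_width_bin(values, bins=10):
    """Map each numeric value to a bin index 0..bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def fill_with_mode(column):
    """Replace "?" in a nominal column with the most common observed value."""
    observed = [v for v in column if v != MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == MISSING else v for v in column]


def fill_with_mean(column):
    """Replace "?" in a numeric column (given as strings) with the observed mean."""
    observed = [float(v) for v in column if v != MISSING]
    mean = sum(observed) / len(observed)
    return [mean if v == MISSING else float(v) for v in column]


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


print(equal_width_bin([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
print(fill_with_mode(["y", "?", "y", "n"]))
print(fill_with_mean(["1", "?", "3"]))
print([len(fold) for fold in kfold_indices(12)])
```

In 5-fold cross-validation, each fold serves once as the test set while the other four are used for training, and the reported accuracy is computed over all five test folds.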

 

What to Turn In

Turn in via email to the TA a compressed file (.zip, .rar, or .tar.gz) containing the following:

Note that the electronic items must be submitted by 11:00 a.m. You should receive an email confirmation of your submitted file before 11:15 a.m. If you don't receive such an email, there was a problem with the submission and you should contact the TA immediately for help (via email).