Iowa State University

Machine Learning: Laboratory Assignment 2

Department of Computer Science

Laboratory Assignment 2

Due March 2, 2007

In this assignment, you will experiment with Logistic Regression and compare it with the Naive Bayes classifier.

    Data

    1. Breast Cancer data file (already in the needed format). It has 9 numeric attributes and 2 classes to be predicted (benign or malignant). Refer to ftp://ftp.ics.uci.edu/pub/machine-learning-databases/breast-cancer-wisconsin/ for the complete description of the data.
    2. House-votes dataset (the task is to predict whether a member of the U.S. House of Representatives is a Republican or a Democrat from their voting record; the data description file can be found here). It has 16 binary attributes (yes/no votes) and 2 classes. You will need to convert this data into a Weka-readable format (ARFF).
    3. Reuters data: one of the benchmark datasets for text categorization and natural language processing. It is a collection of news articles; this version contains the articles from the top 10 categories. The task is to assign a topic to a new article.

      This dataset has been preprocessed to reduce the vocabulary size to 300 words using mutual information. The original data can be found here.
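
Before Weka can read the house-votes data, it has to be converted to ARFF. Below is a minimal Python sketch of that conversion. It assumes the UCI file layout: one record per line, with the class label first, followed by 16 comma-separated votes written as `y`, `n`, or `?` (missing). The function name `to_arff` and the attribute names `vote1` to `vote16` are hypothetical placeholders. They are not the official attribute names from the data description file.

```python
"""Sketch: convert UCI house-votes-84 records to Weka's ARFF format.

Assumed input line format (class label first, then 16 votes):
    republican,n,y,n,y,y,y,n,n,n,y,?,y,y,y,n,y
The attribute names vote1..vote16 are placeholders, not the official names.
"""

NUM_VOTES = 16


def to_arff(lines, relation="house-votes-84"):
    out = [f"@relation {relation}", ""]
    # Each vote is a nominal attribute with the two values y and n.
    for i in range(1, NUM_VOTES + 1):
        out.append(f"@attribute vote{i} {{y,n}}")
    # The class goes last; Weka's Explorer treats the last attribute as the class by default.
    out.append("@attribute party {democrat,republican}")
    out += ["", "@data"]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(",")
        if len(fields) != NUM_VOTES + 1:
            raise ValueError(f"expected {NUM_VOTES + 1} fields, got {len(fields)}: {line!r}")
        label, votes = fields[0], fields[1:]
        # '?' is also ARFF's missing-value marker, so votes are copied unchanged.
        out.append(",".join(votes + [label]))
    return "\n".join(out) + "\n"
```

For example, `open("house-votes-84.arff", "w").write(to_arff(open("house-votes-84.data")))` would write the converted file. This assumes the raw file is named `house-votes-84.data`.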
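
The provided Reuters file has already been reduced to 300 words. To show what selecting a vocabulary by mutual information can look like, here is a Python sketch. It scores each word by the mutual information between the word's presence in a document and the document's category, then keeps the k highest-scoring words. The exact procedure used to build the provided file is not stated. This binary-presence formulation and the names `mutual_information` and `select_vocabulary` are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(docs, labels, word):
    """I(W; C) in nats, where W indicates whether `word` occurs in a document
    and C is the document's category. `docs` is a list of sets of words."""
    n = len(docs)
    joint = Counter((word in doc, label) for doc, label in zip(docs, labels))
    word_marginal, class_marginal = Counter(), Counter()
    for (w, c), count in joint.items():
        word_marginal[w] += count
        class_marginal[c] += count
    # p(w,c) / (p(w) p(c)) simplifies to count * n / (count_w * count_c).
    # Unobserved cells contribute 0 (0 * log 0 = 0), so only observed cells are summed.
    return sum(
        (count / n) * math.log(count * n / (word_marginal[w] * class_marginal[c]))
        for (w, c), count in joint.items()
    )


def select_vocabulary(docs, labels, k=300):
    """Return the k words with the highest mutual information (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda w: (-mutual_information(docs, labels, w), w))[:k]
```

With `k=300` this matches the vocabulary size of the provided data. Scoring every word with a full pass over the documents is simple but slow. A real preprocessing script would more likely gather all counts in one pass.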

    Tasks

    1. Estimate the accuracy of the Naive Bayes and Logistic Regression classifiers on the house-votes-84 data set using 5-fold cross-validation. (Note: both the house-votes and breast cancer datasets have missing values. You may fill them in with Weka's missing-values filter or leave them as they are. How does filling in missing values affect the performance of the classifiers?)
    2. Estimate the accuracy of the Naive Bayes and Logistic Regression classifiers on the breast cancer data set using 5-fold cross-validation. The breast cancer dataset has numeric attributes. (Again, you may handle missing values as in the previous task.)
    3. Estimate the precision, recall, accuracy, and F-measure of the Naive Bayes and Logistic Regression classifiers on the text classification task for each of the 10 categories using 10-fold cross-validation.
    4. Repeat each of the above experiments using regularized logistic regression. How do the results with regularization compare with the results without regularization?
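
Weka runs cross-validation for you, but it can help to see what Tasks 1 and 2 involve. The following from-scratch Python sketch estimates 5-fold cross-validated accuracy for a Naive Bayes classifier on nominal data such as house-votes-84. It can optionally fill each missing value (`?`) with the attribute's most frequent value in the training folds. This is not Weka's implementation:

- the folds come from a plain shuffled split, not a stratified one;
- without imputation, the classifier simply skips missing values;
- all function names are hypothetical.

The numeric breast cancer attributes would need a continuous model in place of the per-value counts, for example a Gaussian per class.

```python
import math
import random
from collections import Counter, defaultdict

MISSING = "?"  # missing-value marker, as in the house-votes data (assumed)


def impute_with_mode(train, test):
    """Replace each missing value by that attribute's most frequent value in `train`.

    Rows are (attribute_list, label) pairs. Modes come from the training rows
    only, so nothing leaks from the test fold.
    """
    modes = []
    for j in range(len(train[0][0])):
        counts = Counter(x[j] for x, _ in train if x[j] != MISSING)
        # If an attribute is missing in every training row, leave it missing.
        modes.append(counts.most_common(1)[0][0] if counts else MISSING)

    def fill(rows):
        return [([modes[j] if v == MISSING else v for j, v in enumerate(x)], y)
                for x, y in rows]

    return fill(train), fill(test)


def train_naive_bayes(rows):
    """Naive Bayes for nominal attributes with Laplace (add-one) smoothing.

    Missing values are skipped both when counting and when predicting.
    Returns a function that maps an attribute list to a predicted label.
    """
    class_counts = Counter(y for _, y in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    seen_values = defaultdict(set)       # attribute index -> values seen in training
    for x, y in rows:
        for j, v in enumerate(x):
            if v != MISSING:
                value_counts[j, y][v] += 1
                seen_values[j].add(v)

    def log_score(x, c):
        score = math.log(class_counts[c] / len(rows))
        for j, v in enumerate(x):
            if v == MISSING:
                continue
            counts = value_counts[j, c]
            # Count v itself as a possible value so the denominator is never zero.
            n_values = len(seen_values[j] | {v})
            score += math.log((counts[v] + 1) / (sum(counts.values()) + n_values))
        return score

    return lambda x: max(class_counts, key=lambda c: log_score(x, c))


def cross_validate(rows, train_fn, k=5, impute=False, seed=0):
    """Accuracy pooled over the k test folds of a shuffled split."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    correct = 0
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        if impute:
            train, test = impute_with_mode(train, test)
        predict = train_fn(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / len(rows)
```

Calling `cross_validate(rows, train_naive_bayes)` once with `impute=False` and once with `impute=True` gives the comparison Task 1 asks about. Any other training function with the same signature, such as a logistic regression trainer, can be used in place of `train_naive_bayes`.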
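
For Task 3, each category is scored one-vs-rest: the category is the positive class and all other categories are negative. The standard definitions are:

- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- F-measure (with β = 1) = 2PR / (P + R), the harmonic mean of precision and recall
- accuracy = (TP + TN) / N

The Python sketch below computes these from true and predicted labels. It assumes each article has exactly one topic. It also assumes the predictions from all 10 folds are pooled before computing the metrics, which is one common convention. The function name is hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Precision, recall, F-measure (F1) and accuracy for a single category.

    `positive` is treated as the positive class; every other category counts
    as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    # By convention, report 0.0 when a ratio is undefined (empty denominator).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(pairs)}
```

Calling this once per category, for example over `sorted(set(y_true))`, produces the per-category table the task asks for.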
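
Regularized logistic regression adds a penalty on the size of the weights to the log-likelihood objective. In Weka, this is controlled by the ridge parameter of the `Logistic` classifier. To illustrate the L2 (ridge) variant, here is a small Python sketch trained by batch gradient descent. `lam` is the regularization strength, and `lam=0.0` gives ordinary logistic regression. The following are all assumptions of this sketch, not settings prescribed by the assignment:

- labels are 0/1 and attributes are numeric;
- the bias is not penalized;
- the learning rate and epoch count are fixed at the defaults shown.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow in math.exp for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lam=0.0, lr=0.1, epochs=500):
    """Batch gradient descent on the L2-regularized average log-loss

        (1/n) * sum_i logloss(y_i, sigmoid(w . x_i + b)) + (lam / 2) * ||w||^2

    X is a list of numeric feature lists and y a list of 0/1 labels.
    The bias b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # The log-loss derivative w.r.t. the linear score is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The penalty (lam / 2) * ||w||^2 contributes lam * w to the gradient.
        for j in range(d):
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predict 1 if P(y = 1 | x) >= 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0
```

Training on the same data with `lam=0.0` and with a positive `lam` shows the penalty shrinking the weights toward zero. Whether this helps accuracy under cross-validation is the question in Task 4.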

     

    What to Turn In

    Turn in via the turnin script (see the instructions on laboratory assignment page):

    • A report of the results obtained with answers to the questions in the Tasks section.
    • Any source code that you may have written.