ComS 573: Machine Learning
Department of Computer Science
Iowa State University
Spring 2012
STUDY GUIDE
The materials to be covered each week and the assigned readings along with
lecture slides are included on this page. The assigned
readings are divided into required and recommended readings. You will be
responsible for the materials covered in the lectures and the assigned required
readings. You are strongly encouraged to explore the recommended readings.
Survival Tips
- Keep up with the assigned readings.
- If you don't understand something in class or in the
assigned problems, labs, or readings, ask questions!
- Do not postpone working on assignments.
- Before you come to class, review the materials from the
previous lecture and come prepared to ask any questions that you might
have.
Week 1 (starting January 9, 2012)
Overview of the course
Review of probability theory and information theory.
Bayesian Decision Theory
Required Readings
Assignments
Recommended Readings
- Chapter 2 from: Pattern Classification (2nd ed., 2001) by
Richard O. Duda, Peter E. Hart and David G. Stork
Recommended Readings for those unfamiliar with probability theory
Recommended Java Readings for those unfamiliar with Java
Week 2 (starting January 16, 2012)
Bayesian Decision Theory
Probability parameter estimation problem. Maximum-likelihood estimation
Required Readings
Assignments
Recommended Readings
- Chapters 2 and 3 from: Pattern Classification (2nd ed., 2001) by
Richard O. Duda, Peter E. Hart and David G. Stork
Week 3 (starting January 23, 2012)
Bayesian parameter estimation, parameter estimation for discrete
variables.
Naive Bayes Classifier, classifying text documents.
Required Readings
Assignments
Recommended Readings
- Chapter 3 from: Pattern Classification (2nd ed., 2001) by
Richard O. Duda, Peter E. Hart and David G. Stork.
- Chapter 6 from: Mitchell, T. 1997. Machine Learning. New
York: McGraw Hill.
- P. Domingos and M. Pazzani.
On the optimality of the
simple Bayesian classifier under zero-one loss. Machine Learning,
29:103--130, 1997.
- Rish, I. et al. An analysis of
data characteristics that affect naive Bayes performance. In: Proc.
ICML 2001.
- D. D. Lewis. Naive Bayes at
forty: The independence assumption in information retrieval. In
ECML-98: Proceedings of the Tenth European Conference on Machine
Learning, pages 4--15, Chemnitz, Germany, April 1998. Springer.
- Zhang, J., Kang, D-K., Silvescu, A. and Honavar, V. (2006). Extended
version of Learning Compact and Accurate Naive Bayes Classifiers from
Attribute Value Taxonomies and Data. Journal of Knowledge and
Information Systems.
- McCallum, A. and Nigam, K.
A Comparison of Event Models
for Naive Bayes Text Classification. In AAAI/ICML-98 Workshop on
Learning for Text Categorization, pp. 41-48. Technical Report WS-98-05.
AAAI Press. 1998.
- Jason D. M. Rennie, Lawrence Shih, Jaime Teevan and David R. Karger.
Tackling the Poor Assumptions of
Naive Bayes Text Classifiers. Proceedings of the Twentieth
International Conference on Machine Learning. 2003.
- Kang, D-K., Silvescu, A. and Honavar, V. (2006).
RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification.
In: Proceedings of the Tenth Pacific-Asia Conference on Knowledge
Discovery and Data Mining (PAKDD 2006). Lecture Notes in Computer
Science. Berlin: Springer-Verlag.
- Yang, Y. and G. I. Webb (2003).
On Why Discretization Works for
Naive-Bayes Classifiers. In Proceedings of the 16th Australian
Conference on AI (AI 03), Lecture Notes in AI 2903, pages 440-452. Berlin:
Springer-Verlag.
- George H. John and Pat Langley.
Estimating Continuous Distributions in Bayesian Classifiers.
Proceedings of the Conference on Uncertainty in Artificial Intelligence,
1995.
- Susana Eyheramendy, David Lewis and David Madigan.
On the Naive Bayes Model for Text Categorization. In:
Proceedings of the Ninth International Workshop on Artificial
Intelligence and Statistics, Bishop, C.M. and Frey, B. (Eds.). 2003.
Week 4 (starting January 30, 2012)
Evaluation of classifiers. Performance measures, ROC curves.
Estimation of performance measures, cross-validation.
Hypothesis testing; comparing two learning algorithms.
Required Readings
Assignments
Recommended Readings
- Chapter 5 from: Mitchell, T. 1997. Machine Learning. New York:
McGraw Hill.
- Baldi, P., Brunak, S., Chauvin, Y. and Nielsen, H. (2000)
Assessing the
accuracy of prediction algorithms for classification: an overview.
Bioinformatics Vol. 16. pp. 412-424.
- F Provost, T Fawcett, R Kohavi (1998).
A case against accuracy estimation
of machine learning algorithms. In: Proceedings of the Fifteenth
International Conference on Machine Learning.
- Hand, D. (2009), Measuring classifier
performance: a coherent alternative to the area under the ROC curve.
Machine Learning, Vol. 77, pp. 103-123.
- Stark, Philip B.
Statistics Tools for Internet and Classroom Instruction with a Graphical
User Interface, Chapters 25-27.
- Dietterich, T. (1998). Approximate
Statistical Tests for Comparing Supervised Classification Algorithms.
Neural Computation, 10(7):1895-1923.
- Salzberg, S. (1997). On Comparing
Classifiers: Pitfalls to Avoid and a Recommended Approach. Data Mining
and Knowledge Discovery, Vol. 1, pp. 317-328.
Week 5 (starting February 6, 2012)
Decision tree learning algorithm, overfitting, missing data.
Linear models for classification: Linear Discriminant Functions, Perceptrons,
Perceptron Learning algorithm, Multi-category classification.
Required Readings
Recommended Readings
- Chapter 3 from: Mitchell, T. 1997. Machine Learning. New York:
McGraw Hill. Alternatively, Section 18.3 of Artificial Intelligence: A
Modern Approach by Russell and Norvig.
- Fayyad, U. and Irani, K.B. (1992). On the handling of
continuous valued attributes in decision tree generation. Machine
Learning vol. 8. pp. 87-102.
- Domingos, P. (1999). The Role of Occam's
Razor in Knowledge Discovery. Data Mining and Knowledge Discovery, Vol.
3, no. 4., pp. 409-425.
- Quinlan, R. Induction of Decision
Trees. Machine Learning, Vol. 1, pp. 81-106, 1986.
- Quinlan, R. C4.5: Programs for Machine Learning. Morgan Kaufmann
Publishers, Inc., 1993.
- Chapter 5 from: Pattern Classification (2nd ed., 2001) by Richard
O. Duda, Peter E. Hart and David G. Stork.
- Elizondo, D.
The linear separability problem:
some testing methods. IEEE Transactions on Neural Networks, Vol. 17,
no. 2, pp. 330-344.