ComS 573: Machine Learning
Department of Computer Science
Iowa State University
Spring 2008
STUDY GUIDE
The material to be covered each week and the assigned readings (along with
online lecture notes, if available) are listed on this page. Links to lecture
notes are usually not in place until a week after the lecture. The assigned
readings are divided into required and recommended readings. You are
responsible for the material covered in the lectures and in the assigned
required readings. You are strongly encouraged to explore the recommended
readings.
Survival Tips
- Keep up with the assigned readings.
- If you don't understand something in class or in the assigned
problems/labs/readings, ask questions!
- Do not postpone working on assignments.
- Before you come to class, review the materials from the previous lecture
and come prepared to ask any questions that you might have.
Week 1 (starting January 14, 2008)
Overview of the course
Review of probability theory, Information theory.
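As a quick refresher for the information theory review, the sketch below computes the Shannon entropy (in bits) of the empirical distribution of a list of outcomes. The function name `entropy` and the coin-flip data are assumptions made for this illustration; they are not taken from the course materials.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = len(labels)
    # H = -sum_x p(x) * log2 p(x), with p(x) estimated by relative frequency.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical toy data: a balanced and an unbalanced sequence of coin flips.
fair_coin = ["H", "T", "H", "T"]
biased_coin = ["H", "H", "H", "T"]

print(entropy(fair_coin))
print(entropy(biased_coin))
```

The balanced sequence attains the maximum of one bit for a two-outcome variable, while the unbalanced sequence gives a smaller value because its outcome is more predictable.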
Required Readings
Assignments
Recommended Readings for those unfamiliar with probability theory
Recommended Java Readings for those unfamiliar with Java
Week 2 (starting January 21, 2008)
Review of Information theory (continued).
Bayesian Decision Theory.
Required Readings
Recommended Readings
- Chapter 2 from: Pattern Classification (2nd ed., 2001) by
Richard O. Duda, Peter E. Hart and David G. Stork
Week 3 (starting January 28, 2008)
Probability density estimation: parameter estimation problem.
Maximum-likelihood parameter estimation, Bayesian parameter estimation,
parameter estimation for normal density, parameter estimation for
discrete variables.
Naive Bayes Classifier: classifying text documents.
Required Readings
Assignments
Recommended Readings
- Chapter 3 from: Pattern Classification (2nd ed., 2001) by
Richard O. Duda, Peter E. Hart and David G. Stork.
- Chapter 6 from: Mitchell, T. 1997. Machine Learning. New
York: McGraw Hill.
- P. Domingos and M. Pazzani.
On the optimality of the simple Bayesian
classifier under zero-one loss. Machine
Learning, 29:103--130, 1997.
- Rish, I.
An Empirical Study of Naive Bayes Classifier,
In: Proc. ICML 2001.
- D. D. Lewis.
Naive Bayes at forty: The independence
assumption in information retrieval. In
ECML-98: Proceedings of the Tenth European
Conference on Machine Learning, pages 4--15,
Chemnitz, Germany, April 1998. Springer.
- Zhang, J., Kang, D-K., Silvescu, A. and
Honavar, V. (2006). Extended version of
Learning Compact and Accurate Naive Bayes
Classifiers from Attribute Value Taxonomies
and Data. Journal of Knowledge and
Information Systems.
- McCallum, A. and Nigam, K.
A Comparison of Event Models for Naive Bayes
Text Classification. In AAAI/ICML-98
Workshop on Learning for Text
Categorization, pp. 41-48. Technical Report
WS-98-05. AAAI Press. 1998.
- Jason D. M. Rennie, Lawrence Shih, Jaime
Teevan and David R. Karger.
Tackling the Poor Assumptions of Naive Bayes
Text Classifiers. Proceedings of the
Twentieth International Conference on
Machine Learning. 2003.
- Kang, D-K., Silvescu, A. and Honavar, V.
(2006).
RNBL-MN: A Recursive Naive Bayes Learner for
Sequence Classification. In:
Proceedings of the Tenth Pacific-Asia
Conference on Knowledge Discovery and Data
Mining (PAKDD 2006). Lecture Notes in
Computer Science. Berlin: Springer-Verlag.
- Yang, Y. and G. I. Webb (2003).
On Why Discretization Works for Naive-Bayes Classifiers.
In Proceedings of the 16th Australian Conference on AI (AI 03).
Lecture Notes AI 2903, pages 440-452. Berlin: Springer-Verlag.
- George H. John, Pat Langley.
Estimating Continuous Distributions in Bayesian Classifiers.
Proceedings of the 1995 Conference on Machine Learning.
- Susana Eyheramendy, David Lewis, David Madigan.
On the Naive Bayes Model for Text Categorization.
In: Proceedings of the Ninth International Workshop on
Artificial Intelligence and Statistics, Bishop, C.M. and
Frey, B. (Eds.). 2003.
Week 4 (starting February 4, 2008)
Evaluation of classifiers. Performance measures: Accuracy, Precision,
Recall, ROC curves.
Evaluation of classifiers -- estimation of performance measures;
confidence interval calculation for estimates; cross-validation;
comparing two hypotheses; hypothesis testing; comparing two learning
algorithms.
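The performance measures named above can be illustrated with a short Python sketch that derives accuracy, precision, and recall from the four confusion-matrix counts of a binary classifier. The function names, the toy labels, and the convention of returning 0.0 when a denominator is zero are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    # Fraction of predicted positives that are truly positive.
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 is an assumed convention


def recall(tp, fn):
    # Fraction of true positives that the classifier recovers.
    return tp / (tp + fn) if (tp + fn) else 0.0  # 0.0 is an assumed convention


# Hypothetical labels: 4 positive and 6 negative examples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
```

On this toy data the three measures differ (accuracy 0.7, precision 2/3, recall 0.5), which is one reason accuracy alone can be misleading when classes are unbalanced.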
Required Readings
Assignments
Recommended Readings
Week 5 (starting February 11, 2008)
The decision tree classifier: decision tree learning algorithm
(Quinlan's ID3 algorithm and its extension, C4.5); the problem of
overfitting; missing data.
Linear models for classification: Linear Discriminant Functions, Perceptrons
Required Readings
Recommended Readings
- Fayyad, U. and Irani, K.B. (1992).
On the handling of continuous valued attributes in
decision tree generation.
Machine Learning vol. 8. pp. 87-102.
- Domingos, P. (1999).
The Role of Occam's Razor in Knowledge Discovery.
Vol. 3, no. 4, pp. 409-425.
- Brodley, C. and Utgoff, P. (1995).
Multi-variate Decision Trees.
Machine Learning 19: 45-77.
- Caragea, D., Silvescu, A.,
and Honavar, V. (2004).
A Framework for Learning from
Distributed Data Using
Sufficient Statistics and its
Application to Learning Decision
Trees. International Journal
of Hybrid Intelligent Systems.
Invited Paper. Vol. 1.
pp. 80-89.
- Johannes E. Gehrke, Raghu Ramakrishnan, and Venkatesh Ganti.
RAINFOREST - A Framework for Fast Decision Tree Construction of Large Datasets.
Data Mining and Knowledge Discovery, Volume 4, Issue 2/3,
July 2000, pp 127-162.
- Quinlan, R.
Induction of Decision Trees, Machine Learning, Vol. 1, pp. 81-106, 1986.
- Quinlan, R. C4.5: Programs for
Machine Learning. Morgan Kaufmann
Publishers, Inc., 1993.
Week 6 (starting February 18, 2008)
Linear models for classification: Linear Discriminant Functions, Perceptrons,
Perceptron Learning algorithm, Multi-category classification, Winner-Take-All (WTA)
networks (linear machines).
Regression: Linear regression, Least Mean Squared (LMS) Error Criterion, Gradient
descent algorithm, LMS solution for classification.
Fisher Linear discriminant
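To make the perceptron learning algorithm from this week's topics concrete, here is a minimal Python sketch of the classic mistake-driven update rule on a linearly separable toy problem (the logical AND function with labels in {-1, +1}). The function names, learning rate, epoch limit, and data are assumptions for illustration and are not taken from the lecture notes.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Perceptron learning rule; labels must be -1 or +1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the boundary)
                # Move the hyperplane toward the misclassified example.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # a full pass with no mistakes: training has converged
            break
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical toy data: logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]

w, b = train_perceptron(samples, labels)
print(w, b, [predict(w, b, x) for x in samples])
```

Because the data are linearly separable, the updates stop after finitely many mistakes; on non-separable data this sketch would simply run until the epoch limit.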
Required Readings
Assignments
Recommended Readings
Week 7 (starting February 25, 2008)
Probabilistic generative models and linear classifiers
Discriminative models for classification, logistic regression models,
training logistic regression, regularized logistic regression
Instance-Based Learning, k-Nearest neighbor learning
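A brief illustration of k-nearest neighbor learning: the Python sketch below classifies a query point by majority vote among its k closest training points under Euclidean distance. The function name `knn_predict`, the choice k=3, and the toy points are assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in the plane.
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (0.5, 0.5)))
print(knn_predict(train_points, train_labels, (5.5, 5.5)))
```

No model is fit in advance: all computation happens at query time, which is the defining trait of instance-based learning.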
Required Readings
- Lecture slides
- Chapter 2.5 from C. Bishop (2006), Pattern
Recognition and Machine Learning.
- Chapter 8 from: Mitchell, T. 1997. Machine Learning.
New York: McGraw Hill.
Recommended Readings
- Minka, T.P. (2004)
A Comparison of Numerical Optimizers for Logistic Regression.
- Raina, R., Shen, Y., Ng, A., and McCallum, A. (2003).
Classification with Hybrid Generative/Discriminative Models.
In Proceedings of the Conference on Neural Information
Processing Systems (NIPS 2003).
- Lasserre, J., C. M. Bishop, and
T. Minka (2006).
Principled hybrids of generative and
discriminative models. In:
Proceedings 2006 IEEE Conference on
Computer Vision and Pattern
Recognition, New York.
- Ng, A. and Jordan, M. (2002)
On Discriminative vs. Generative
Classifiers: A comparison of
logistic regression and Naive Bayes,
Proceedings of the Conference
on Neural Information Processing Systems (NIPS
2002).
Week 8 (starting March 3, 2008)
Multilayer neural networks: feedforward operations, expressive
power of multilayer networks, Backpropagation algorithm,
Regularization, stopping criterion, Practical techniques for
improving backpropagation, error functions for classification.
Function optimization algorithms: conjugate gradient and
quasi-Newton algorithms.
Required Readings
Assignments
Recommended Readings
Week 9 (starting March 10, 2008)
Support Vector Machines: Dual Representation of the Perceptron
Algorithm, Learning in feature spaces, Kernel Functions,
Kernel-Induced Feature Spaces, Maximum Margin Classifiers,
constrained optimization problem, Learning as optimization, Support
Vectors, Non-Separable Case, The Soft-Margin Classifier
Required Readings
- Lecture slides
- Chapter 6.1, 6.2, Appendix E, 7.1 from C. Bishop (2006), Pattern
Recognition and Machine Learning.
- A Tutorial on Support Vector Machines. Nello Cristianini,
International Conference on Machine Learning (ICML 2001).
- M. A. Hearst, B. Schölkopf, S.
Dumais, E. Osuna, and J. Platt.
Trends and controversies - support
vector machines. IEEE Intelligent
Systems, 13(4):18-28, 1998.
- Brown, M. P., Grundy, W. N.,
Lin, D., Cristianini, N., Sugnet, C.
W., Furey, T. S., Ares, M. Jr, and
Haussler, D.
Knowledge-based analysis of
microarray gene expression data by
using support vector machines.
Proc. Natl. Acad. Sci. USA 97:
262-267: 2000.
- T. Joachims.
Text categorization with support
vector machines: Learning with many
relevant features. In European
Conference on Machine Learning
(ECML-98), 1998.
Assignments
Recommended Readings
- J.C. Burges.
A tutorial on support vector
machines for pattern recognition.
Data Mining and Knowledge Discovery,
2(2):121--167, 1998.
- Chapters 1-7 from Mathematical methods for economic
theory: a tutorial by Martin J. Osborne
- Platt, J.
Fast training of support vector
machines using sequential minimal
optimization. In B. Scholkopf,
C. J. C. Burges, and A. J. Smola,
editors, Advances in Kernel Methods
--- Support Vector Learning, pages
185-208, Cambridge, MA, 1999. MIT
Press.
- Laskov, P., Gehl, C., Kruger,
S., Muller, K-R. (2006)
Incremental Support Vector Learning:
Analysis, Implementation and
Applications, Journal of
Machine Learning Research, Vol. 7.
pp. 1909-1936.
- Hsu, C-W., and Lin, C-J. (2002).
A Comparison of methods for multi-class Support Vector Machines,
IEEE Transactions on Neural Networks, Vol. 13, pp. 415-425.
- Duan, K-B., and Keerthi, S. (2005).
Which is the best multiclass SVM Method? An Empirical Study,
Springer-Verlag Lecture Notes in Computer Science Vol. 3541,
pp. 278-285.
Additional Information
- Cristianini, N. and Shawe-Taylor, J.
(2000). An Introduction to Support Vector Machines.
Cambridge: Cambridge University Press.
- Shawe-Taylor, J. and Cristianini, N.
(2004). Kernel Methods for Pattern
Analysis. Cambridge: Cambridge University
Press.
Spring Break
Week 10 (starting March 24, 2008)
Bayesian Networks: Syntax and Semantics, D-separation.
Required Readings
Additional Information
- Judea Pearl, Probabilistic Reasoning in Intelligent
Systems: Networks of Plausible Inference. (1988).
- Cowell, R. G., Lauritzen, S. L., and Spiegelhalter,
D. J. Probabilistic Networks and Expert Systems. Berlin:
Springer (1999).
- Korb, K.B., and Nicholson, A.E., Bayesian Artificial
Intelligence, Chapman and Hall (2004).
- Richard E. Neapolitan,
Learning Bayesian Networks, Prentice Hall, 2004.
Week 11 (starting March 31, 2008)
Bayesian Networks: modeling
Required Readings
Assignments
Additional Information
- Judea Pearl, Probabilistic Reasoning in Intelligent
Systems: Networks of Plausible Inference. (1988).
- Cowell, R. G., Lauritzen, S. L., and Spiegelhalter,
D. J. Probabilistic Networks and Expert Systems. Berlin:
Springer (1999).
- Korb, K.B., and Nicholson, A.E., Bayesian Artificial
Intelligence, Chapman and Hall (2004).
- Richard E. Neapolitan,
Learning Bayesian Networks, Prentice Hall, 2004.
Week 12 (starting April 7, 2008)
Bayesian Networks: inference, learning
Required Readings
Recommended Readings
- R. Dechter, "Bucket
Elimination: A Unifying Framework
for Probabilistic Inference".
In Uncertainty in Artificial
Intelligence (UAI) 1996.
- Cecil Huang and Adnan Darwiche.
Inference in belief networks: A
procedural guide. In
International Journal of Approximate
Reasoning, 15(3):225-263, October,
1996
- Heckerman, D.,
Geiger, D., and Chickering, D.
(1995).
Learning Bayesian networks: The
combination of knowledge and
statistical data. Machine
Learning, 20(3):197--243.
- Friedman and Koller,
Being Bayesian about Network
Structure: A Bayesian Approach to
Structure Discovery in Bayesian
Networks, Machine Learning,
50:95-126, 2003
- D. Chickering 2003,
Optimal structure identification
with greedy search, the Journal
of Machine Learning Research
- Tractable Learning of Large Bayes Net Structures from Sparse Data,
Goldenberg, A. and Moore, A. (2004).
In Proceedings of the International Conference on
Machine Learning, 2004.
- Exact Bayesian Structure Discovery in Bayesian Networks,
Mikko Koivisto, Kismat Sood,
the Journal of Machine Learning Research, 2004.
- Nir Friedman, D. Geiger, and M. Goldszmidt,
Bayesian network classifiers. In
Machine Learning 29:131--163, 1997.
- Using Bayesian Networks to Analyze
Expression Data. N.
Friedman, M. Linial, I. Nachman,
and D. Pe'er. Journal of
Computational Biology, 7:601--620,
2000.
- Inferring cellular networks using
probabilistic graphical models.
N. Friedman, Science.
303:799-805, 2004.
- S. L. Lauritzen and N. A.
Sheehan.
Graphical models for genetic
analyses. Statistical Science,
18, 489-514, 2003
Additional Information
- Judea Pearl, Probabilistic Reasoning in Intelligent
Systems: Networks of Plausible Inference. (1988).
- Cowell, R. G., Lauritzen, S. L., and Spiegelhalter,
D. J. Probabilistic Networks and Expert Systems. Berlin:
Springer (1999).
- Korb, K.B., and Nicholson, A.E., Bayesian Artificial
Intelligence, Chapman and Hall (2004).
- Richard E. Neapolitan,
Learning Bayesian Networks, Prentice Hall, 2004.
Week 13 (starting April 14, 2008)
Bayesian Networks: learning
Ensemble Classifiers: Bagging, The Adaboost Algorithm
Required Readings
- Lecture slides
- Chapter 14.1-3, from C. Bishop (2006), Pattern
Recognition and Machine Learning.
- Thomas G. Dietterich,
Ensemble methods
in machine learning, 2000
- Breiman, L. (1994).
Bagging Predictors. Tech. Rep.
421, Department of Statistics,
University of California, Berkeley,
CA.
- Freund, Y. and Schapire, R. (1999)
A Short Introduction to Boosting. Journal of the Japanese
Society for Artificial Intelligence, Vol 14, pp. 771-780.
- Bauer, E., and Kohavi, R. (1999)
An Empirical Comparison of Voting
Classification Algorithms: Bagging,
Boosting, and Variants. Machine
Learning. Vol. 36. pp. 105-142.
- Dietterich, T. G., (2000).
An experimental
comparison of three methods for constructing ensembles of
decision trees: Bagging, boosting, and randomization.
Machine Learning, 40 (2) 139-158
Recommended Readings
- Robert E. Schapire.
The boosting approach to machine learning: An overview.
In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu,
editors, Nonlinear Estimation and Classification.
Springer, 2003.
- Friedman, J., Hastie, T., and Tibshirani, R. (2000).
A Statistical View of Boosting,
Annals of Statistics, Vol. 35, pp. 337-407.
- Meir, R. and Ratsch, G. (2002).
An Introduction to Boosting and Leveraging.
Advanced Lectures on Machine Learning.
Lecture Notes in Computer Science, pp. 118-183,
Berlin: Springer-Verlag.
Week 14 (starting April 21, 2008)
Ensemble Classifiers: Error correcting output coding
Mixture Models: Clustering, the EM algorithm, the K-means clustering
algorithm
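As a small illustration of the K-means clustering algorithm listed above, the following Python sketch alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points. The function name `kmeans`, the explicitly supplied initial centroids, the handling of empty clusters, and the toy data are assumptions for this example; it is a sketch of Lloyd's iteration, not the formulation used in the lecture slides.

```python
import math


def kmeans(points, centroids, max_iters=100):
    """Lloyd's iteration for K-means, starting from the given centroids.

    Assumes max_iters >= 1. Returns (centroids, clusters).
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(coord) / len(members) for coord in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # assumed choice: keep an empty cluster's centroid
        if new_centroids == centroids:  # no change: the iteration has converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data: two compact groups and two initial guesses.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, [(1, 1), (9, 8)])
print(centroids)
print(clusters)
```

The result depends on the initial centroids; in practice the iteration is often restarted from several random initializations and the best clustering is kept.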
Required Readings
- Lecture slides
- Chapter 9, from C. Bishop (2006), Pattern
Recognition and Machine Learning.
Recommended Readings