ComS 573: Machine Learning
Department of Computer Science
Iowa State University
Spring 2009
STUDY GUIDE
The material to be covered each
week and the assigned readings (along with online lecture notes, if available)
are included on this page. The assigned readings are divided into
required and recommended readings. You
will be responsible for the material covered in the lectures and the assigned
required readings. You are strongly encouraged to explore the recommended
readings.
Survival Tips
- Keep up with the assigned readings.
- If you don't understand something in class, or the assigned
problems/labs/readings, ask questions!
- Do not postpone working on assignments.
- Before you come to class, review the materials from the previous lecture
and come prepared to ask any questions that you might have.
Week 1 (starting January 12, 2009)
Overview of the course
Review of probability theory.
Required Readings
Assignments
Recommended Readings for those unfamiliar with
probability theory
Recommended Java Readings for those
unfamiliar with Java.
Week 2 (starting January 19, 2009)
Information theory.
Bayesian Decision Theory.
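As a warm-up for the information theory material, here is a minimal sketch (not taken from the course notes) of Shannon entropy computed from a list of observed labels. The function name `entropy` and the choice of base-2 logarithms (bits) are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    counts = Counter(labels)
    n = len(labels)
    # Sum -p * log2(p) over the observed values; unseen values contribute 0.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    print(entropy(["heads", "tails"]))        # a fair coin
    print(entropy(["a", "a", "a", "b"]))      # a skewed distribution
```

A uniform distribution over two outcomes carries one bit of uncertainty; a distribution concentrated on a single outcome carries none.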
Required Readings
Recommended Readings
Week 3 (starting January 26, 2009)
Probability density estimation: parameter estimation problem.
Maximum-likelihood parameter estimation, Bayesian parameter estimation,
parameter estimation for normal density, parameter estimation for
discrete variables.
Naive Bayes Classifier
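To make the parameter estimation topic concrete, the following sketch computes the maximum-likelihood estimates of the mean and variance of a univariate normal density. The function name `normal_mle` is hypothetical; the only point illustrated is that the ML variance estimate divides by n rather than n - 1.

```python
def normal_mle(xs):
    """Maximum-likelihood estimates (mean, variance) for a univariate normal."""
    n = len(xs)
    mean = sum(xs) / n
    # The ML estimate of the variance divides by n (it is biased);
    # the unbiased sample variance would divide by n - 1 instead.
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var


if __name__ == "__main__":
    print(normal_mle([1.0, 2.0, 3.0]))
```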
Required Readings
Assignments
Recommended Readings
- Chapter 3 from: Pattern Classification (2nd ed., 2001) by
Richard O. Duda, Peter E. Hart and David G. Stork.
- Chapter 6 from: Mitchell, T. 1997. Machine Learning. New
York: McGraw Hill.
- P. Domingos and M. Pazzani.
On the optimality of the simple Bayesian
classifier under zero-one loss. Machine
Learning, 29:103--130, 1997.
- Rish, I.
An Empirical Study of Naive Bayes Classifier,
In: Proc. ICML 2001.
- D. D. Lewis.
Naive Bayes at forty: The independence
assumption in information retrieval. In
ECML-98: Proceedings of the Tenth European
Conference on Machine Learning, pages 4--15,
Chemnitz, Germany, April 1998. Springer.
- Zhang, J., Kang, D-K., Silvescu, A. and
Honavar, V. (2006). Extended version of
Learning Compact and Accurate Naive Bayes
Classifiers from Attribute Value Taxonomies
and Data. Journal of Knowledge and
Information Systems.
- McCallum, A. and Nigam, K.
A Comparison of Event Models for Naive Bayes
Text Classification. In AAAI/ICML-98
Workshop on Learning for Text
Categorization, pp. 41-48. Technical Report
WS-98-05. AAAI Press. 1998.
- Jason D. M. Rennie, Lawrence Shih, Jaime
Teevan and David R. Karger.
Tackling the Poor Assumptions of Naive Bayes
Text Classifiers. Proceedings of the
Twentieth International Conference on
Machine Learning. 2003.
- Kang, D-K., Silvescu, A. and Honavar, V.
(2006).
RNBL-MN: A Recursive Naive Bayes Learner for
Sequence Classification. In:
Proceedings of the Tenth Pacific-Asia
Conference on Knowledge Discovery and Data
Mining (PAKDD 2006). Lecture Notes in
Computer Science. Berlin: Springer-Verlag.
- Yang, Y. and G. I. Webb (2003).
On Why Discretization Works for Naive-Bayes Classifiers.
In Proceedings of the 16th Australian Conference on AI
(AI 03), Lecture Notes AI 2903, pages 440-452.
Berlin: Springer-Verlag.
- George H. John, Pat Langley.
Estimating Continuous Distributions in Bayesian Classifiers.
Proceedings of the 1995 Conference on Machine Learning.
- Susana Eyheramendy, David Lewis, David Madigan.
On the Naive Bayes Model for Text Categorization.
In: Proceedings of the Ninth International Workshop on
Artificial Intelligence and Statistics,
Bishop, C.M. and Frey, B. (Ed). 2003.
Week 4 (starting February 2, 2009)
Naive Bayes Classifier: classifying text documents.
Evaluation of classifiers. Performance measures: Accuracy, Precision,
Recall, ROC curves.
Evaluation of classifiers -- estimation of performance measures;
confidence interval calculation for estimates.
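The performance measures listed above can be sketched in a few lines. This is an illustration under stated assumptions rather than the course's reference code: the function names are hypothetical, labels are assumed to be binary with `1` as the positive class, and the confidence interval uses the usual normal approximation to the binomial (reasonable only when the test set is fairly large).

```python
import math


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, correct / len(pairs)


def accuracy_confidence_interval(acc, n, z=1.96):
    """Approximate interval for the true accuracy given `acc` measured on
    n independent test examples; z = 1.96 corresponds to roughly 95%."""
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc - half_width, acc + half_width


if __name__ == "__main__":
    print(precision_recall_accuracy([1, 1, 0, 0], [1, 0, 1, 0]))
    print(accuracy_confidence_interval(0.8, 100))
```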
Required Readings
Assignments
Recommended Readings
Week 5 (starting February 9, 2009)
Evaluation of classifiers -- cross-validation;
comparing two hypotheses; hypothesis testing; comparing two learning
algorithms.
The decision tree classifier: decision tree learning algorithm (Quinlan's ID3
algorithm).
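ID3 chooses the attribute to split on by information gain. The sketch below computes that quantity for one attribute; the data layout (a list of dictionaries mapping attribute names to values) and the function names are assumptions made for this example, not part of the assigned readings.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in label entropy obtained by splitting on `attribute`.

    rows   -- list of dicts, one per example (hypothetical layout)
    labels -- class label of each example, in the same order
    """
    partitions = defaultdict(list)
    for row, label in zip(rows, labels):
        partitions[row[attribute]].append(label)
    n = len(labels)
    # Expected entropy remaining after the split, weighted by partition size.
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    rows = [{"windy": 0, "hot": 0}, {"windy": 0, "hot": 1},
            {"windy": 1, "hot": 0}, {"windy": 1, "hot": 1}]
    labels = ["play", "play", "stay", "stay"]
    print(information_gain(rows, labels, "windy"))
    print(information_gain(rows, labels, "hot"))
```

In this toy data set `windy` determines the label exactly, so it yields the full one bit of gain, while `hot` yields none.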
Required Readings
Recommended Readings
- Fayyad, U. and Irani, K.B. (1992).
On the handling of continuous valued attributes in
decision tree generation.
Machine Learning vol. 8. pp. 87-102.
- Domingos, P. (1999).
The Role of Occam's Razor in Knowledge Discovery.
Data Mining and Knowledge Discovery, Vol. 3, no. 4,
pp. 409-425.
- Brodley, C. and Utgoff, P. (1995).
Multi-variate Decision Trees.
Machine Learning 19: 45-77.
- Caragea, D., Silvescu, A.,
and Honavar, V. (2004).
A Framework for Learning from
Distributed Data Using
Sufficient Statistics and its
Application to Learning Decision
Trees. International Journal
of Hybrid Intelligent Systems.
Invited Paper. Vol. 1.
pp. 80-89.
- Johannes E. Gehrke, Raghu Ramakrishnan, and Venkatesh Ganti.
RAINFOREST - A Framework for Fast Decision Tree Construction of Large Datasets.
Data Mining and Knowledge Discovery, Volume 4, Issue 2/3,
July 2000, pp 127-162.
- Quinlan, R.
Induction of Decision Trees, Machine Learning, Vol. 1, pp. 81-106, 1986.
- Quinlan, R. C4.5: Programs for
Machine Learning. Morgan Kaufmann
Publishers, Inc., 1993.
Week 6 (starting February 16, 2009)
Decision tree learning algorithm: the problem of overfitting, missing data.
Linear models for classification: Linear Discriminant Functions, Perceptrons,
Perceptron Learning algorithm, Multi-category classification, Winner-Take-All (WTA)
networks (linear machines).
Regression: Linear regression, Least Mean Squared (LMS) Error Criterion, Gradient
descent algorithm, LMS solution for classification.
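For readers who want to see the perceptron learning algorithm in code, here is a minimal sketch. It assumes labels in {-1, +1}, folds the bias into the weight vector as `w[0]`, and stops after a full pass with no mistakes; the names `train_perceptron` and `predict` and the default parameters are illustrative choices, not the lecture's notation.

```python
def predict(w, x):
    """Classify x as +1 or -1 using weights w, where w[0] is the bias."""
    activation = sum(wi * xi for wi, xi in zip(w, [1.0] + list(x)))
    return 1 if activation > 0 else -1


def train_perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron learning rule; converges if the data are linearly separable."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(X, y):
            if predict(w, x) != target:
                # Move the weights toward (or away from) the misclassified point.
                augmented = [1.0] + list(x)
                w = [wi + lr * target * xi for wi, xi in zip(w, augmented)]
                mistakes += 1
        if mistakes == 0:
            break  # every training example is classified correctly
    return w


if __name__ == "__main__":
    # Logical AND, which is linearly separable.
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [-1, -1, -1, 1]
    w = train_perceptron(X, y)
    print(w, [predict(w, x) for x in X])
```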
Required Readings
- Lecture slides
- Chapter 3 from: Mitchell, T. 1997. Machine Learning.
New York: McGraw Hill.
- Lecture slides
- Lecture slides
- Chapter 4.1, 1.1, 1.2.5, 1.5.5, 3.1 from C. Bishop (2006), Pattern
Recognition and Machine Learning.
Assignments
Recommended Readings
Week 7 (starting February 23, 2009)
Probabilistic generative models and linear classifiers
Discriminative models for classification, logistic regression models,
training logistic regression, regularized logistic regression
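Training a (regularized) logistic regression model can be sketched with plain batch gradient descent. This is an illustrative sketch under assumptions: labels are 0/1, the bias is stored as `w[0]` and left unpenalized, and the learning rate, epoch count, and function names are arbitrary choices. The recommended readings discuss faster optimizers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """P(y = 1 | x) under weights w, where w[0] is the bias."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, [1.0] + list(x))))


def train_logistic_regression(X, y, lr=0.1, epochs=1000, l2=0.0):
    """Batch gradient descent on the mean cross-entropy loss plus an
    optional L2 penalty (l2 / 2) * ||w[1:]||^2."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, target in zip(X, y):
            error = predict_proba(w, x) - target
            for j, xj in enumerate([1.0] + list(x)):
                grad[j] += error * xj
        for j in range(d + 1):
            penalty = l2 * w[j] if j > 0 else 0.0  # do not penalize the bias
            w[j] -= lr * (grad[j] / n + penalty)
    return w


if __name__ == "__main__":
    X = [[0.0], [1.0], [3.0], [4.0]]
    y = [0, 0, 1, 1]
    w = train_logistic_regression(X, y, l2=0.01)
    print(w, predict_proba(w, [0.0]), predict_proba(w, [4.0]))
```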
Required Readings
Assignments
Recommended Readings
- Minka, T.P. (2004)
A Comparison of Numerical Optimizers for Logistic Regression.
- Raina, R., Shen, Y., Ng, A., and McCallum, A. (2003).
Classification with Hybrid Generative/Discriminative Models.
In Proceedings of the Conference on Neural Information
Processing Systems (NIPS 2003).
- Lasserre, J., C. M. Bishop, and
T. Minka (2006).
Principled hybrids of generative and
discriminative models. In:
Proceedings 2006 IEEE Conference on
Computer Vision and Pattern
Recognition, New York.
- Ng, A. and Jordan, M. (2002)
On Discriminative vs. Generative
Classifiers: A comparison of
logistic regression and Naive Bayes,
Proceedings of the Conference on
Neural Information Processing Systems
(NIPS 2002).
Week 8 (starting March 2, 2009)
Logistic regression models, training logistic regression
Multilayer neural networks: feedforward operations, expressive
power of multilayer networks, Backpropagation algorithm,
Regularization, stopping
criterion, Practical techniques for improving backpropagation
Required Readings
- Lecture slides
- Chapter 5.1-5.3, 5.5 from C. Bishop (2006), Pattern
Recognition and Machine Learning.
Assignments
Recommended Readings
Week 9 (starting March 9, 2009)
Neural networks: error functions for classification. Function optimization
algorithms---conjugate gradient and quasi-Newton algorithms
Instance-Based Learning, k-Nearest neighbor learning
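k-nearest neighbor learning is simple enough to sketch directly. The version below assumes numeric feature vectors, Euclidean distance, and an unweighted majority vote; the function name `knn_predict` is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    by_distance = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
    train_y = ["a", "a", "b", "b"]
    print(knn_predict(train_X, train_y, (0.2, 0.1), k=3))
    print(knn_predict(train_X, train_y, (5.5, 5.0), k=1))
```

Note that all computation happens at query time: there is no training step beyond storing the examples.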
Required Readings
- Lecture slides
- Lecture slides
- Chapter 2.5 from C. Bishop (2006), Pattern
Recognition and Machine Learning.
- Chapter 8 from: Mitchell, T. 1997. Machine Learning.
New York: McGraw Hill.
Recommended Readings
- Chapter 7 from C. Bishop (1995), Neural Networks for
Pattern Recognition.
- Chapter 10 from "Numerical
Recipes in C", William H. Press et al.
- Chapter 6 from:
Pattern Classification (2nd ed., 2001) by
Richard O. Duda, Peter E. Hart and David G. Stork.
Spring Break
Week 10 (starting March 23, 2009)
Support Vector Machines: Dual Representation of the Perceptron
Algorithm, Learning in feature spaces, Kernel Functions,
Kernel-Induced Feature Spaces, Maximum Margin Classifiers,
constrained optimization problem, Learning as optimization, Support
Vectors, Non-Separable Case, The Soft-Margin Classifier
Required Readings
- Lecture slides
- Chapter 6.1, 6.2, Appendix E, 7.1 from C. Bishop (2006), Pattern
Recognition and Machine Learning.
- A Tutorial on Support Vector Machines. Nello Cristianini,
International Conference on Machine Learning (ICML 2001).
- M. A. Hearst, B. Scholkopf, S.
Dumais, E. Osuna, and J. Platt.
Trends and controversies - support
vector machines. IEEE Intelligent
Systems, 13(4):18-28, 1998.
- Brown, M. P., Grundy, W. N.,
Lin, D., Cristianini, N., Sugnet, C.
W., Furey, T. S., Ares, M. Jr, and
Haussler, D.
Knowledge-based analysis of
microarray gene expression data by
using support vector machines.
Proc. Natl. Acad. Sci. USA 97:
262-267: 2000.
- T. Joachims.
Text categorization with support
vector machines: Learning with many
relevant features. In European
Conference on Machine Learning
(ECML-98), 1998.
Assignments
Recommended Readings
- C. J. C. Burges.
A tutorial on support vector
machines for pattern recognition.
Data Mining and Knowledge Discovery,
2(2):121--167, 1998.
- Chapters 1-7 from
Mathematical methods for economic theory:
a tutorial by Martin J. Osborne.
- Platt, J.
Fast training of support vector
machines using sequential minimal
optimization. In B. Scholkopf,
C. J. C. Burges, and A. J. Smola,
editors, Advances in Kernel Methods
--- Support Vector Learning, pages
185-208, Cambridge, MA, 1999. MIT
Press.
- Laskov, P., Gehl, C., Kruger,
S., Muller, K-R. (2006)
Incremental Support Vector Learning:
Analysis, Implementation and
Applications, Journal of
Machine Learning Research, Vol. 7.
pp. 1909-1936.
- Hsu, C-W., and Lin, C-J. (2002).
A Comparison of methods for multi-class Support
Vector Machines, IEEE Transactions on Neural
Networks, Vol. 13, pp. 415-425.
- Duan, K-B., and Keerthi, S. (2005).
Which is the best multiclass SVM Method? An
Empirical Study, Springer-Verlag Lecture Notes in
Computer Science Vol. 3541, pp. 278-285.
Additional Information
- Cristianini, N. and Shawe-Taylor, J.
(2000). An Introduction to Support Vector Machines. London:
Cambridge University Press.
- Shawe-Taylor, J. and Cristianini, N.
(2004). Kernel Methods for Pattern
Analysis. London: Cambridge University
Press.
Week 11 (starting March 30, 2009)
Bayesian Networks: Syntax and Semantics, D-separation. Markov
Random Fields.
Required Readings
Additional Information
- Judea Pearl, Probabilistic Reasoning in Intelligent
Systems: Networks of Plausible Inference. (1988).
- Cowell, R. G., Lauritzen, S. L., and Spiegelhalter,
D. J. Probabilistic Networks and Expert Systems. Berlin:
Springer (1999).
- Korb, K.B., and Nicholson, A.E., Bayesian Artificial
Intelligence, Chapman and Hall (2004).
- Richard E. Neapolitan,
Learning Bayesian Networks, Prentice Hall, 2004.
Week 12 (starting April 6, 2009)
Bayesian Networks: modeling, inference, learning
Required Readings
Assignments
Recommended Readings
- R. Dechter, "Bucket
Elimination: A Unifying Framework
for Probabilistic Inference".
In Uncertainty in Artificial
Intelligence (UAI) 1996.
- Cecil Huang and Adnan Darwiche.
Inference in belief networks: A
procedural guide. In
International Journal of Approximate
Reasoning, 15(3):225-263, October,
1996
- Heckerman, D.,
Geiger, D., and Chickering, D.
(1995).
Learning Bayesian networks: The
combination of knowledge and
statistical data. Machine
Learning, 20(3):197--243.
- Friedman and Koller,
Being Bayesian about Network
Structure: A Bayesian Approach to
Structure Discovery in Bayesian
Networks, Machine Learning,
50:95-126, 2003
- D. Chickering 2003,
Optimal structure identification
with greedy search, the Journal
of Machine Learning Research
- Tractable Learning of Large Bayes Net Structures
from Sparse Data, Goldenberg, A. and Moore, A.
(2004). In Proceedings of the International
Conference on Machine Learning, 2004.
- Exact Bayesian Structure Discovery in Bayesian
Networks, Mikko Koivisto, Kismat Sood, the Journal
of Machine Learning Research, 2004.
- Nir Friedman, D. Geiger, and M. Goldszmidt,
Bayesian network classifiers. In
Machine Learning 29:131--163, 1997.
- Using Bayesian Networks to Analyze
Expression Data. N.
Friedman, M. Linial, I. Nachman,
and D. Pe'er. Journal of
Computational Biology, 7:601--620,
2000.
- Inferring cellular networks using
probabilistic graphical models.
N. Friedman, Science.
303:799-805, 2004.
- S. L. Lauritzen and N. A.
Sheehan.
Graphical models for genetic
analyses. Statistical Science,
18, 489-514, 2003
Additional Information
- Software Packages for Graphical Models / Bayesian
Networks
- Judea Pearl, Probabilistic Reasoning in Intelligent
Systems: Networks of Plausible Inference. (1988).
- Cowell, R. G., Lauritzen, S. L., and Spiegelhalter,
D. J. Probabilistic Networks and Expert Systems. Berlin:
Springer (1999).
- Korb, K.B., and Nicholson, A.E., Bayesian Artificial
Intelligence, Chapman and Hall (2004).
- Richard E. Neapolitan,
Learning Bayesian Networks, Prentice Hall, 2004.
Week 13 (starting April 13, 2009)
Bayesian Networks: learning
Required Readings
Recommended Readings
- Heckerman, D.,
Geiger, D., and Chickering, D.
(1995).
Learning Bayesian networks: The
combination of knowledge and
statistical data. Machine
Learning, 20(3):197--243.
- Friedman and Koller,
Being Bayesian about Network
Structure: A Bayesian Approach to
Structure Discovery in Bayesian
Networks, Machine Learning,
50:95-126, 2003
- D. Chickering 2003,
Optimal structure identification
with greedy search, the Journal
of Machine Learning Research
- Tractable Learning of Large Bayes Net Structures
from Sparse Data, Goldenberg, A. and Moore, A.
(2004). In Proceedings of the International
Conference on Machine Learning, 2004.
- Silander and Myllymaki, A simple approach for
finding the globally optimal Bayesian network
structure, Proceedings of the Conference on
Uncertainty in Artificial Intelligence (UAI), 2006.
- Exact Bayesian Structure Discovery in Bayesian
Networks, Mikko Koivisto, Kismat Sood, the Journal
of Machine Learning Research, 2004.
- Mikko Koivisto, Advances in exact Bayesian
structure discovery in Bayesian networks,
Proceedings of the Conference on Uncertainty in
Artificial Intelligence (UAI), 2006.
- Nir Friedman, D. Geiger, and M. Goldszmidt,
Bayesian network classifiers. In
Machine Learning 29:131--163, 1997.
- Using Bayesian Networks to Analyze
Expression Data. N.
Friedman, M. Linial, I. Nachman,
and D. Pe'er. Journal of
Computational Biology, 7:601--620,
2000.
- Inferring cellular networks using
probabilistic graphical models.
N. Friedman, Science.
303:799-805, 2004.
- S. L. Lauritzen and N. A.
Sheehan.
Graphical models for genetic
analyses. Statistical Science,
18, 489-514, 2003
Additional Information
- Software Packages for Graphical Models / Bayesian
Networks
- Judea Pearl, Probabilistic Reasoning in Intelligent
Systems: Networks of Plausible Inference. (1988).
- Cowell, R. G., Lauritzen, S. L., and Spiegelhalter,
D. J. Probabilistic Networks and Expert Systems. Berlin:
Springer (1999).
- Korb, K.B., and Nicholson, A.E., Bayesian Artificial
Intelligence, Chapman and Hall (2004).
- Richard E. Neapolitan,
Learning Bayesian Networks, Prentice Hall, 2004.
Week 14 (starting April 20, 2009)
Ensemble Classifiers: Bagging, The Adaboost Algorithm, Error correcting
output coding
Required Readings
- Lecture slides
- Chapter 14.1-3 from C. Bishop (2006), Pattern
Recognition and Machine Learning.
- Thomas G. Dietterich,
Ensemble methods
in machine learning, 2000.
- Breiman, L. (1994).
Bagging Predictors. Tech. Rep.
421, Department of Statistics,
University of California, Berkeley,
CA.
- Freund, Y. and Schapire, R. (1999)
A Short Introduction to Boosting. Journal of the Japanese
Society for Artificial Intelligence, Vol 14, pp. 771-780.
- Bauer, E., and Kohavi, R. (1999)
An Empirical Comparison of Voting
Classification Algorithms: Bagging,
Boosting, and Variants. Machine
Learning. Vol. 36. pp. 105-142.
- Dietterich, T. G., (2000).
An experimental
comparison of three methods for constructing ensembles of
decision trees: Bagging, boosting, and randomization.
Machine Learning, 40 (2) 139-158
- Thomas G. Dietterich and G. Bakiri (1995),
Solving Multiclass
Learning Problems via Error-Correcting Output Codes,
Journal of Artificial Intelligence Research.
Recommended Readings
- Robert E. Schapire.
The boosting approach to machine learning: An overview.
In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick,
B. Yu, editors, Nonlinear Estimation and Classification.
Springer, 2003.
- Friedman, J., Hastie, T., and Tibshirani, R. (2000).
A Statistical View of Boosting,
Annals of Statistics, Vol. 28, pp. 337-407.
- Meir, R. and Ratsch, G. (2002).
An Introduction to Boosting and Leveraging.
Advanced Lectures on Machine Learning.
Lecture Notes in Computer Science, pp. 118-183,
Berlin: Springer-Verlag.
- Robert E. Schapire (1997)
Using output codes to boost multiclass learning problems.
Proc. of 14th International Conference on Machine Learning.
- Erin L. Allwein, Robert E. Schapire, and Yoram Singer (2000),
Reducing Multiclass to Binary: A Unifying Approach for
Margin Classifiers,
Journal of Machine Learning Research (JMLR).
Week 15 (starting April 27, 2009)
Student oral presentation of term projects.
-------------------------------------------
Monday April 27, 10-10:50am
NAM H. PHAM and Matt Herbst
Automatic Bug Identification in Code Repositories
Chao-Chun Chang
Design an efficient orthogonal momentum-type PSO algorithm to solve a
Large Parameter Optimization Problem
Shane Griffith
Investigating a Robot's Capability to Learn about Containers: can a
robot form a container object category by probing objects?
JungGyu Yang
Building a predictive model for Customer Relationships Management with
Machine Learning
-------------------------------------------
Wednesday April 29, 10-10:50am
Lavanya Ram
Learning the k-best Bayesian network structures
Harris Lin
Bayesian Classifier
Christopher Bruno
Reinforcement Learning and Blackjack
Yetian Chen
Supervised and Semi-Supervised Learning Applied to Classification
Problems in Physics Experiment
-------------------------------------------
Friday May 1, 10-10:50am
Joshua J Clausman
Time-series turning point prediction with flexible neural trees.
Sushain Pandit
On utilizing dynamically extracted RDF knowledge-bases to enable
context-sensitive classification.
Yang Liu
KDD cup
Hailin Tang
Netflix prize
Hyuntae Na
Netflix prize
KyoungTak Cho
Netflix prize
-------------------------------------------