 |
Machine Learning
Department of Computer Science
Iowa State University
|
Study Guide
Week 1 (January 13, 2003)
Overview of the course. Overview of machine learning. Why should machines learn?
Operational definition of learning. Taxonomy of machine learning.
Specification of a computational model of learning. Example: specification
and analysis of a model for conjunctive concept learning. Role of
representational and inferential bias in learning.
Inductive Learning of Pattern Classifiers. Decision Tree Induction from
Examples Occam's razor. Elements of Information Theory.
Quinlan's ID3 (Iterative Dichotomizer 3) algorithm and its variants.
Required readings
-
Overview of the Course.
-
Overview of Artificial Intelligence (PDF), (Especially if you have not taken ComS 572) Vasant Honavar.
-
Lecture slides
-
Overview of Machine Learning, Nils Nilsson.
- Chapters 1 and 3 from Mitchell's Machine Learning Textbook.
-
Automated Learning and Discovery: State-Of-The-Art and Research Topics in a Rapidly Growing Field
Sebastian Thrun, Christos Faloutsos, Tom Mitchell and Larry Wasserman. Technical Report CMU-CS-CALD-98-100. 1998.
-
Does Machine Learning Really Work?, Tom Mitchell.
-
A. Buja and Y-S. Lee.
Data Mining Criteria for Tree-Based Regression and Classification
-
Caragea, D., Silvescu, T., and Honavar, V. (2001).
Multi-Agent Decision Tree Learning from Distributed Autonomous Data Sources. ISU CS-TR 2001-05. Department of Computer Science, Iowa State University.
-
Graefe, G. Fayyad, U., Chaudhuri, S.
On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL databases., Microsoft Research.
-
Balakrishnan, K. and Honavar, V. (1998).
Intelligent Diagnosis Systems. Journal of Intelligent Systems. Vol. 8. pp. 239-290.
-
Wang, X., Schroeder, D., Dobbs, D., and Honavar, V. (2002).
Automated Data-Driven Discovery of Motif-Based Protein Function Classifiers . Information Sciences. In press. Earlier version appeared in: Proceedings of the Conference on Computational Biology and Genome Informatics (CBGI-02).
-
Silvescu, A., and Honavar, V. (2001). Temporal Boolean Network Models of Genetic Networks and Their Inference from Gene Expression Time Series. Complex Systems. Vol 13. No. 2.
Strongly Recommended Java Readings for those unfamiliar with Java.
Additional Information
- Rulequest.com
-
Honavar, V. (1997).
Computational Learning Theory (tutorial).
-
Honavar, V. (1994). Toward Learning Systems That Use Multiple Strategies and Representations.
In: Artificial Intelligence and Neural Networks: Steps Toward
Principled Integration. pp. 615-644.
-
Duda, R., Hart, R., And Stork, D. (2000). Pattern Recognition. Wiley.
-
Langley, P. (1995). Elements of Machine Learning. Palo Alto, CA: Morgan Kaufmann.
-
Quinlan, R. (1993). C4.5: Programs for Machine Learning. Palo Alto, CA: Morgan Kaufmann.
-
Sestito, S. and Dillon, T. (1994). Automated Knowledge Acquisition.
Sydney, Australia: Prentice-Hall.
-
Machine Learning Resources by David Aha.
-
Computational Models of Learning by V. Honavar
-
WEKA Machine Learning Algorithms in Java
Week 2 (January 20, 2003)
Algorithms for Learning Decision Trees. Evaluation of decision tree classifiers -- estimation of performance measures; confidence interval calculation for estimates; cross-validation based estimates of hypothesis performance; leave-one-out and bootstrap estimates of performance; comparing two hypotheses; hypothesis testing and null hypothesis; comparing two learning algorithms. Advantages and pitfalls of different procedures for estimating hypothesis and learning algorithms.
Dealing with categorical, numeric, and ordinal attributes. Dealing with missing attribute values during tree induction and instance classification; Learning decision trees from attribute-value hierarchies (attribute taxonomies) and data.
Required readings
- Chapter 3 from Mitchell's Machine Learning Textbook.
- Lecture Slides. (Note: Monday, Jan 20 was a holiday)
-
WEKA Machine Learning Algorithms in Java
-
A. Buja and Y-S. Lee.
Data Mining Criteria for Tree-Based Regression and Classification
-
Brodley, C. and Utgoff, P. (1995). Multi-variate Decision Trees. Machine Learning 19: 45-77.
-
Wang, H. and Zaniolo, C. (2000).
CMP: A Fast Decision Tree Classifier Using Multi-variate Predictions In: Proceedings of the International Conference on Data Engineering.
-
Domingos, P. (1999). The Role of Occam's Razor in Knowledge Discovery. Preprint.
-
Dietterich, T., Kong, E. (1995). Machine Learning Bias, Statistical Bias, Statistical Variance of Decision Tree Algorithms. Technical Report, Department of Computer Science, Oregon State University
Recommended Readings
Additional Information
- Cohen, P.R.Empirical Methods in Artificial Intelligence
- Rulequest.com
-
Duda, R., Hart, R., And Stork, D. (2000). Pattern Recognition. Wiley.
-
Langley, P. (1995). Elements of Machine Learning. Palo Alto, CA: Morgan Kaufmann.
-
Quinlan, R. (1993). C4.5: Programs for Machine Learning. Palo Alto, CA: Morgan Kaufmann.
-
Sestito, S. and Dillon, T. (1994). Automated Knowledge Acquisition.
Sydney, Australia: Prentice-Hall.
-
Machine Learning Resources by David Aha.
-
Aha, D. (1995).
Machine Learning (tutorial).
-
Additional AI Links by V. Honavar
- AI on the web by Russell and Norvig
Week 3 (January 27, 2003)
Algorithms for Learning Decision Trees (continued). Dealing with categorical, numeric, and ordinal attributes. Dealing with missing attribute values during tree induction and instance classification;
Overfitting and methods to avoid overfitting -- dealing with small sample sizes; prepruning and post-pruning. Alternative splitting strategies -- two-way versus multi-way splits; Alternative split criteria: Gini impurity, Entropy, etc. Cost-sensitive decision tree induction -- incorporating attribute measurement costs and misclassification costs into decision tree induction.
Introduction to Artificial Neural Networks. Threshold logic unit (perceptron) and the associated hypothesis space. Connection with Logic and Geometry. Weight space and pattern space representations of perceptrons. Linear separability and related concepts. Perceptron Learning algorithm and its variants. Convergence properties of perceptron algorithm. Multi-category perceptron.
Required readings
- Chapter 3 from Mitchell's Machine Learning Textbook.
- Lecture Slides. (decision trees). Vasant Honavar
- Lecture Slides. (perceptrons). Vasant Honavar.
- Chapter 4 from Mitchell's Machine Learning Textbook.
- Nils Nilsson. Neural Networks.
-
Honavar, V.
Threshold Logic Units.
-
Honavar, V.
Perceptron Learning Algorithm.
-
Honavar, V.
Multi-category Generalizations of Perceptron Algorithm.
-
Brodley, C. and Utgoff, P. (1995). Multi-variate Decision Trees. Machine Learning 19: 45-77.
-
Turney, P.D. (2000).
Types of cost in inductive concept learning.
Workshop on Cost-Sensitive Learning at the Seventeenth International
Conference on Machine Learning (WCSL at ICML-2000),
Stanford University, California, pp. 15-21.
-
Zhang, J., Silvescu, A., and Honavar, V. (2002). Ontology-Driven Induction of Decision Trees at Multiple Levels of Abstraction. In: Proceedings of Symposium on Abstraction, Reformulation, and Approximation. Berlin: Springer-Verlag.
-
Zhadrozny, B. and Elkan, C. (2001b). Obtaining calibrated probability estimates from decision trees and naive bayesian classifiers. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 609--616. Morgan Kaufmann Publishers, Inc.
Recommended Readings
-
Manish Mehta Jorma Rissanen Rakesh Agrawal (1995)
MDL-based Decision Tree Pruning.. In: Proceedings of KDD-95.
-
Bradford, C. Kunz, R. Kohavi, C. Brunk, and C.E. Brodley. Pruning decision trees with misclassification costs. In Proceedings of Tenth European Conference on Machine Learning (ECML-98), pages 131-136, Berlin, 1998. 95
-
Elomaa, T., and Rousu, J. (1997). On the well-behavedness of important attribute evaluation functions. In G. Grahne (Ed.), Proceedings of the Sixth Scandinavian Conference on Artificial
Intelligence (pp. 95--106). Frontiers in Artificial Intelligence and Applications (Vol. 40).
-
Elomaa, T. and Rousu, J. (1999). General and Efficient Multisplitting of Numerical Attributes.. Machine Learning. 1999.
-
W.-Y. Loh and Y.-S. Shih, Split selection methods for classification trees, Statistica Sinica, vol. 7, pp. 815--840, 1997.
- J. E. Gehrke, V. Ganti, R. Ramakrishnan, and W-Y Loh.
BOAT -- Optimistic Decision Tree Construction. In Proceedings of the 1999 SIGMOD Conference, Philadelphia, Pennsylvania, 1999.
-
Johannes E. Gehrke, Raghu Ramakrishnan, and Venkatesh Ganti. RAINFOREST - A Framework for Fast Decision Tree Construction of Large Datasets. Data Mining and Knowledge Discovery, Volume 4, Issue 2/3, July 2000, pp 127-162.
-
Leiva, H., Atramentov, A., and Honavar, V. (2002). Experiments with MRDTL -- A Multirelational Decision Tree Learning Algorithm. In: Proceedings of the Workshop on Multi-Relational Decision Tree Learning. Berlin: Springer-Verlag.
-
Codrington, C. W. and Brodley, C. E., On the Qualitative Behavior of Impurity-Based Splitting Rules: The Minima-Free Property. Tech. Rep. 97-05. Dept. of Computer Science. Cornell University.
-
Mansour, Y. and McAllester, D. (2000).
Generalization Bounds for Decision Trees. In: Proceedings of COLT-2000.
-
M. Kearns and Y. Mansour.
A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization. In: Proceedings of the 15th International Conference
on Machine Learning, 1998, Morgan Kaufmann.
-
Elkan, C. (2001). Foundations of Cost-Sensitive Learning. In: Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01). Seattle, Washington, USA.
Additional Information
-
Duda, R., Hart, R., And Stork, D. (2000). Pattern Recognition. Wiley.
-
Langley, P. (1995). Elements of Machine Learning. Palo Alto, CA: Morgan Kaufmann.
-
Quinlan, R. (1993). C4.5: Programs for Machine Learning. Palo Alto, CA: Morgan Kaufmann.
-
Sestito, S. and Dillon, T. (1994). Automated Knowledge Acquisition.
Sydney, Australia: Prentice-Hall.
-
Bishop, C. (1996). Neural Networks and Pattern Recognition. Oxford University Press.
-
Ripley, B. (1995). Pattern Recognition and Neural Networks. Cambridge University Press.
-
Honavar, V. (1998).
Elements of Neural Computation.
- Neural Network Resources
- Pattern Recognition
-
Machine Learning Resources by David Aha.
-
Additional AI Links by V. Honavar
- AI on the web by Russell and Norvig
Week 4 (Beginning February 3, 2003)
Review of elements of probability. Probability spaces. Ontological and epistemological commitments of
probabilistic representations of knowledge. Bayesian
(subjective view of probability) -- Probabilities as meaasures of belief conditioned on the agent's knowledge. Possible world interpretation of probability. Axioms of probability. Conditional
probability. Bayes theorem. Random Variables.
Discrete Random Variables as functions from event spaces to Value sets. Possible world interpretation of random variables. Joint Probability distributions. Conditional Probability Distributions.
Conditional Independence of Random variables. Pair-wise independence and independence. Entropy of random variables, Information, Mutual Information. Measures of distance between probability distributions -- Kullback-Liebler divergence.
Required readings
Recommended Readings
Additional Information
Week 5 (Beginning February 10, 2003)
Bayesian Learning. Learning Maximum a-posteriori (MAP) and Maximum Likelihood (ML) hypothesis from data. The relationship between MAP hypothesis learning, minimum description length principle (Occam's razor) and the role of priors.
Equivalence of ML hypothesis learner and consistent learner for classification tasks. Equivalence of ML hypothesis learning and minimization of mean squared error for function approximation problems.
ML estimation of probabilities. Bayes Optimal Classification (and how it differs from Maximum A posteriori Classifier (hypothesis). Gibbs Classifier. Minimal Error Bayes Classifier, Minimum Risk Bayes Classifier, Naive Bayes Classifier and its relation to Linear Discriminant Functions. Estimation of probabilities in practice. Example -- Naive Bayes text classification. Bayesian Networks. Conditional Independence Revisited, d-separation, and compact representation of joint probability distribution functions in Bayes Networks.
Required readings
Recommended Readings
-
A Brief Introduction to Bayesian Networks , Kevin Murphy.
-
Factor graphs and the sum-product algorithm.
Kschischang, B. Frey, and H. Loeliger. IEEE Transactions on Information Theory, 47(2):498-519, 2001.
-
Generalized Distributive Law. McEliece and Aji.
-
Introduction to Variational Methods for Graphical Models Jordan, M. and Ghahramani, Z. 1999.
-
An Introduction to MCMC for Machine Learning, Andrieu et al., Machine Learning, 2001.
-
Introduction to Monte Carlo Methods. D.J.M. Mackay.
Additional Material
-
Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1990.
-
Bishop, C. (1996). Neural Networks and Pattern Recognition. Oxford University Press.
-
Ripley, B. (1995). Pattern Recognition and Neural Networks. Cambridge University Press.
-
Bayesian Networks and Decision Graphs. Finn Jensen. Berlin: Springer-Verlag. 2001.
-
The Elements of Statistical Learning : Data Mining, Inference, and Prediction.
T. Hastie, R. Tibshirani, J. H. Friedman. (2001). Springer
-
Pattern Recognition. Duda, Hart, and Storck (2001). Wiley.
-
Statistical Inference, Casella and Berger (2001).
Week 6, Beginning Feb 17, 2003.
Bayesian Learning (continued). Reasoning with Bayes Networks, Some algorithms for Exact Inference, and Approximate Inference of relevant probabilities
from a Bayesian network using stochastic simulation (Sampling), Learning Bayesian Networks from Data. Learning of parameters (conditional probability tables)
from fully specified instances (when no attribute values are missing) in a network of known structure -- Maximum Likelihood and Bayesian estimation of
parameters from data, detailed treatment of estimation of parameters for multinomial distributions using conjugate Dirichlet priors; Learning Bayesian networks
with unknown structure -- searching the space of network topologies using suitable scoring functions to guide the search, structure learning in practice
-- learning low order Bayesian networks (where each random variable directly depends on a small number of others), Bayesian approach to structure discovery,
examples. Learning Bayesian network parameters in the presence of missing attribute values (using Expectation Maximization) when the structure is known;
Learning networks of unknown structure in the presence of missing attribute values.
Required readings
-
Chapter 6 from: Mitchell, T. 1997.
Machine Learning. New York: McGraw Hill.
-
Lecture Slides. Vasant Honavar
-
Bayesian Networks and Decision-Theoretic Reasoning for Artificial Intelligence, Daphne Koller and Jack Breese. Tutorial Given at AAAI-97.
-
Inference in Bayesian Networks -- A Procedural Guide Huang, C., A. Darwiche. Journal of Approximate Reasoning. Vol 15. pp. 225-263.
-
An Introduction to MCMC for Machine Learning, Andrieu et al., Machine Learning, 2001.
-
A Tutorial on Learning with Bayesian Networks, David Heckerman. Tech. Rep. MSR-TR-95-06. Microsoft Research.
-
Learning Bayesian belief networks: An approach based on the MDL principle., W. Lam and F. Bacchus, Computational Intelligence, 10(4), 1994.
-
Learning Bayesian Network Classifiers for Credit Scoring Using Markov Chain Monte Carlo Search, B. Baesens et al., 2001.
-
Bayesian Network Classifiers Friedman, N., Geiger, D., and Goldszmidt, M. Machine Learning 29: pp. 131-163. 1997.
-
Operations for Learning with Graphical Models Wray Buntine. Journal of Artificial Intelligence Research. Vol. 2. pp. 159-225. 1994.
-
Friedman, N 1998. The Bayesian Structural EM Algorithm Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence Morgan Kaufmann. 1998.
-
Being Bayesian about Network Structure - A Bayesian Approach To Structure Discovery in Bayesian Networks N. Friedman and D. Koller. Machine Learning. To appear.
Recommended Readings
-
Learning Bayesian Belief Networks Based on the Minimum Description Length Principle: Basic Properties.J. Suzuki, IEICE Transactions on Fundamentals, vol. E82, No. 10., pp. 2237 2245
-
Comparing Model Selection Criteria for Belief Networks Tim Van Allen, Russ Greiner, 2000.
-
Using bayesian networks to analyze expression data.
Friedman, M. Linial, I. Nachman, and D. Per. Proceedings of the 4th Annual International Conference on Computational Molecular Biology (RECOMB-00), pages 127-135, N.Y., April 8-11 2000. ACM Press.
-
George H. John, Pat Langley, Estimating Continuous Distributions in Bayesian Classifiers Proceedings of the 1995 Conference on Machine Learning.
Additional Material
-
Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1990.
-
Bishop, C. (1996). Neural Networks and Pattern Recognition. Oxford University Press.
-
Ripley, B. (1995). Pattern Recognition and Neural Networks. Cambridge University Press.
-
Bayesian Networks and Decision Graphs. Finn Jensen. Berlin: Springer-Verlag. 2001.
-
The Elements of Statistical Learning : Data Mining, Inference, and Prediction.
T. Hastie, R. Tibshirani, J. H. Friedman. (2001). Springer
-
Pattern Recognition. Duda, Hart, and Storck (2001). Wiley.
-
Statistical Inference, Casella and Berger (2001).
Week 7 (beginning February 23, 2003)
Introduction to neural networks as trainable function approximators. Function approximation from examples. Least Mean Squared (LMS) Error Criterion. Minimization of Error Functions. Review of Relevant Mathematics (Limits, Continuity and Differentiablity of Functions, Local Minima and Maxima, Derivatives, Partial Derivatives, Taylor Series Approximation, Multi-Variate Taylor Series Approximation)
Derivation of a Learning Rule for Minimizing Mean Squared Error Function for a Simple Linear Neuron.
Momentum modification for speeding up learning.
Introduction to neural networks for nonlinear function approximation.
Nonlinear function approximation using multi-layer neural networks.
Universal function approximation theorem. Derivation of the generalized delta rule (GDR) (the backpropagation learning algorithm).
Required readings
-
Mitchell, T. Chapter 4, Machine Learning.
-
Lecture Slides. Vasant Honavar
-
Honavar, V. Function Approximation from Examples.
-
Honavar, V. Multi-layer networks.
-
T. G. Dietterich, H. Hild, and G. Bakiri.
A comparative study of ID3 and backpropagation for English text-to-speech mapping. In Proceedings of 7th IMLW, Austin, 1990.
Morgan Kaufmann.
-
Y. Le Cun, B. Boser, J.S. Denker, D. Henderson, R. Howard, W. Hubbard,
and L. Jackel.Handwritten digit recognition with a backpropagation neural network. In D.
Touretzky, editor, Advances in Neural Information Processing Systems 2, pages 396--404. Morgan
Kaufmann, San Mateo, CA, 1990.
-
Chapter 4 from: Mitchell, T. 1997.
Machine Learning. New York: McGraw Hill.
-
Thrun and T.M. Mitchell. Learning one more thing. Technical Report CMU-CS-94-184, Carnegie Mellon University, Pittsburgh, PA 15213, 1994.
-
R. Setiono, W.K. Leow and J.M. Zurada. Extraction of rules from artificial neural networks for nonlinear regression, IEEE Transactions on Neural Networks, 2002.
Recommended Readings
-
Caruana, R.
Learning Many Related Tasks through backpropagation in NIPS, MIT Press, 1995.
-
R. Williams, and D. Zipser. Gradient-Based Learning Algorithms for
Recurrent Networks and
Their Computational Complexity. In Backpropagation: Theory,
Architectures, and Applications, Chauvin and Rumelhart, Eds., LEA, 1995, pp. 433-485.
-
Solomon, R. and J. L. van Hemmen (1996).
Accelerating backpropagation through dynamic self-adaptation.
Neural Networks 9 (4), 589--601.
-
Craven, M. and Shavlik, J.
Using Neural Networks for Data Mining.
Additional Material
-
Gallant, S. (1993). Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press.
-
Haykin, S. (1994). Neural Networks. New York: MacMillan.
-
Mehrotra, K., Mohan, C.K., and Ranka, S. Elements of Artificial Neural Networks. Cambridge, MA: MIT Press.
-
Ripley, B. (1995). Pattern Recognition and Neural Networks. Cambridge University Press.
-
Pattern Recognition. Duda, Hart, and Storck (2001). Wiley.
Week 8 (beginning March 3, 2003)
Derivation of the generalized delta rule (GDR) (the backpropagation learning algorithm) (review). Generalized delta rule (backpropagation algorithm) in practice - avoiding overfitting, choosing neuron activation
functions, choosing learning rate, choosing initial weights, speeding up learning, improving generalization, circumventing local minima, using domain-specific constraints (e.g., translation invariance in visual pattern recognition), exploiting hints, using neural networks for function approximation and pattern classification. Relationship between neural networks and Bayesian pattern classification. Variations -- Radial basis function networks. Learning non linear functions by searching the space of network topologies as well as weights.
Required readings
Recommended Readings
Additional Material
-
Gallant, S. (1993). Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press.
-
Haykin, S. (1994). Neural Networks. New York: MacMillan.
-
Mehrotra, K., Mohan, C.K., and Ranka, S. Elements of Artificial Neural Networks. Cambridge, MA: MIT Press.
-
Ripley, B. (1995). Pattern Recognition and Neural Networks. Cambridge University Press.
-
Pattern Recognition. Duda, Hart, and Storck (2001). Wiley.
Week 9 (beginning March 10, 2003)
Learning Conditional Probability Distributions Associated with Nodes in a Bayesian Network Using Gradient Ascent.
Lazy Learning Algorithms. Instance based Learning, K-nearest neighbor classifiers, distance functions, locally weighted regression,
sample application to document classification using TFIDF representation. Relative advantages and disadvantages of lazy learning and eager
learning.
Midterm
Required readings
Recommended Readings
-
P.N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Fourth ACM-SIAM Symposium on Discrete Algorithms, pages 311-- 321, January 1993.
-
Yang, J. and Honavar, V. (1999) DistAl: An Inter-Pattern Distance Based Constructive Neural Network Learning Algorithm.. Intelligent Data Analysis. Vol. 3. pp. 55-73.
-
Yang, J., Pai, P., Honavar, V., and Miller, L. (1998).
Mobile Intelligent Agents for Document Classification and Retrieval: A Machine Learning Approach.
In: Proceedings of the European Symposium on Cybernetics and Systems Research.
-
Yang, J. and Honavar, V. (1998).
Feature Subset
Selection Using a Genetic Algorithm. In: Feature Extraction, Construction, and Subset Selection: A Data Mining Perspective. Motoda, H. and Liu, H. (Ed.) New York: Kluwer. 1998. In press. A shorter version of this paper appeared
in IEEE Intelligent Systems Vol. 13, No. 2. pp. 44-49.
-
S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. JACM: Journal of the ACM, 45, 1998.
-
A. Hinneburg, C. Aggarwal, and D. Keim (2000). What is the nearest neighbor in high dimensional spaces?, International Conference on Very Large Data Bases, Cairo, Egypt, 2000, pages 506-515.
Additional Material
-
Gallant, S. (1993). Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press.
-
Haykin, S. (1994). Neural Networks. New York: MacMillan.
-
Mehrotra, K., Mohan, C.K., and Ranka, S. Elements of Artificial Neural Networks. Cambridge, MA: MIT Press.
-
Ripley, B. (1995). Pattern Recognition and Neural Networks. Cambridge University Press.
-
Pattern Recognition. Duda, Hart, and Storck (2001). Wiley.
Spring Break
Week 10 (Beginning March 23, 2003)
Mistake bound analysis of learning algorithms. Mistake bound analysis of online algorithms for learning Conjunctive Concepts. Optimal Mistake Bounds. Version Space Halving Algorithm. Randomized Halving Algorithm. Learning monotone disjunctions in the presence of irrelevant attributes -- the Winnow and Balanced Winnow Algorithms. Multiplicative Update Algorithms for concept learning and function approximation. Weighted majority algorithm. Applications.
Probably Approximately Correct (PAC) Learning Model. Efficient PAC learnability. Sample Complexity of PAC Learning in terms of cardinality of hypothesis space (for finite hypothesis classes). Some Concept Classes that are easy to learn within the PAC setting.
Required readings
-
Lecture Slides. V. Honavar
-
Chapter 7, Mitchell, T. 1997. Machine Learning. New York: McGraw Hill.
-
The weighted Majority Algorithm, Littlestone, N., and Warmuth, M. Information and Computation Vol. 108: 212-261. 1994.
-
Empirical Support for Winnow and Weighted Majority Algorithms, Blum, A. In: Proceedings of the
Twelfth International Conference on Machine Learning, pages 64--72. Morgan Kaufmann, 1995.
-
Applying Winnow to Context-Sensitive Spelling Correction Golding, A., and Roth, D.
Machine Learning, 34(1-3):107--130, 1999.
-
M.H. Yang, D. Roth, and N. Ahuja. A SNoW-based face detector. NIPS(12), pages 855-861, 2000.
-
-
A Tutorial on Computational Learning Theory. V. Honavar
-
Overview of the Probably Approximately Correct (PAC) Learning Framework. D. Haussler, 1995.
Recommended Readings
-
Computational Learning Theory Sally Goldman.
-
A.J. Grove, N. Littlestone, and D. Schuurmans. General convergence results for linear discriminant updates. In Proc. 10th Annu. Conf. on Comput. Learning Theory, pages 171--183, 1997.
-
Tong Zhang. Regularized winnow methods. In Advances in Neural Information Processing Systems 13, pages 703-709, 2001.
-
Helmbold, D.P., Schapire, R.E., Singer, Y., Warmuth, M.K. On-line portfolio selection using multiplicative updates. Mathematical Finance, vol. 8 (4), pp.325-347, 1998.
Additional Information
-
Computational Learning Theory. Kearns, M. and Vazirani, U. Cambridge, MA: MIT Press.
-
Machine Learning: A Theoretical Approach. B. Natarajan. New York: Kluwer.
Week 11 (Beginning March 31, 2003)
Efficiently PAC learnable concept classes. Sufficient conditions for efficient PAC learnability. Some concept classes that are not efficiently learnable in the PAC setting. Making hard-to-learn concept classes efficiently learnable -- transforming instance representation and hypothesis representation. Occam Learning Algorithms. PAC Learnability of infinite concept classes. Vapnik-Chervonenkis (VC) dimension. Properties of VC dimension, VC dimension and learnability, Learning from Noisy examples, Transforming weak learners into PAC learners through accuracy and confidence boosting, Learning under helpful distributions - Kolmogorov Complexity, Conditional Kolmogorov Complexity, Universal distributions, Learning Simple Concepts, Learning from Simple Examples
Required readings
-
Lecture Slides. V. Honavar
-
Chapter 7, Mitchell, T. 1997. Machine Learning. New York: McGraw Hill.
-
Overview of the Probably Approximately Correct (PAC) Learning Framework. D. Haussler, 1995.
-
Kearns, M. 1998. Efficient Noise Tolerant Learning from Statistical Queries. Journal of the ACM. Vol. 45, pp. 983-1006.
-
Parekh, R. and Honavar, V. (2000). On the Relationships between Models of Learning in Helpful Environments. In: Proceedings of the Fifth International Conference on Grammatical Inference. Lisbon, Portugal.
-
Parekh, R. and Honavar, V. (2001). DFA Learning from Simple Examples. Machine Learning. Vol. 44. pp. 9-35.
-
Polikar, R., Udpa, S., Udpa, L., and Honavar, V. Learn++: An Incremental Learning Algorithm for Multi-Layer Perceptron Networks. IEEE Transactions on Systems, Man, and Cybernetics. Vol. 31, No. 4. pp. 497-508.
Recommended Readings
-
Computational Learning Theory Sally Goldman.
-
Cesa-Bianchi, N., Dichterman, E., Fischer, P., Shamir, E., Simon, H. 1999. Sample-Efficient Strategies for Learning in the Presence of Noise. Journal of the ACM. Vol. 46. pp. 684-719.
-
Goldreich, O. and Goldwasser, S. 1998. Property testing and its connection to Learning and approximation. Journal of the ACM. Vol. 45. pp. 653-750.
-
Khardon, R. and Roth, D. 1997. Learning to Reason. Journal of the ACM. Vol, 44. pp. 697-725.
-
Valiant, L. 2000. A Neuroidal Architecture for Cognitive Computation. Journal of the ACM. Vol. 47. pp. 854-882.
-
Maass, W. 1994. Efficient Agnostic PAC Learning With Simple Hypotheses. . Proceedings of the Seventh Annual Conference on Computational Learning Theory. 1994. pp. 67-75.
-
Benedek. G. and Itai, A. Dominating Distributions and Learnability. In: Annual Workshop on Computational Learning Theory. 1992.
Additional Information
-
Computational Learning Theory. Kearns, M. and Vazirani, U. Cambridge, MA: MIT Press.
-
Machine Learning: A Theoretical Approach. B. Natarajan. New York: Kluwer.
Week 12 (April 6, 2003)
Support Vector Machines. Background: Dual representation of Perceptrons.
A learning algorithm using dual representation of perceptrons.
Margin and geometric margin. Maximal Margin Separating Hyperplanes -- Why?
Maximal Margin separating hyperplanes -- How?
Introduction to Lagrange/Karush-Kuhn-Tucker Optimization Theory. Optimization
problems. Linear, quadratic, and convex optimization problems. Primal and
dual representations of optimization problems.
Convex Quadratic programming formulation of the
maximal margin separating hyperplane finding problem. Characteristics of the
maximal margin separating hyperplane. Kernel functions for classification of
non linearly separable data. Soft margin SVM algorithms.
Properties of Kernel functions. Implementation
of SVM.
Required readings
-
Lecture Slides. V. Honavar
-
B. E. Boser, I. M. Guyon, and V. N. Vapnik.
A training algorithm for optimal margin classifiers. 5th Annual ACM Workshop on COLT, pp. 144-152, Pittsburgh, PA, 1992. ACM Press.
-
A Tutorial on Support Vector Machines. Nello Christianini, International Conference on Machine Learning (ICML 2001).
-
J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121--167, 1998.
-
M. A. Hearst, B. Schvlkopf, S. Dumais, E. Osuna, and J. Platt. Trends and controversies - support vector machines. IEEE Intelligent Systems, 13(4):18-28, 1998.
-
Platt, J. Fast training of support vector machines using sequential minimal optimization.
In B. Scholkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in
Kernel Methods --- Support Vector Learning, pages 185-208, Cambridge, MA, 1999. MIT Press.
-
S.S. Keerthi, S.K. Shevade, C. Bhattacharyya, and K.R.K. Murthy. Improvements to platt's SMO algorithm for SVM classifier design. Technical report, Dept of CSA, IISc, Bangalore, India, 1999.
-
S.S. Keerthi and E.G. Gilbert, Convergence of a generalized SMO algorithm for SVM classifier design, Technical Report CD-00-01, Dept. of Mechanical and Production Eng., National University of Singapore, 2000.
-
Edgar Osuna, Robert Freund, and Federico Girosi. Support vector machines: Training and applications. Technical Report AIM-1602, 1997.
-
Brown, M. P., Grundy, W. N., Lin, D., Cristianini, N., Sugnet, C. W., Furey, T. S., Ares, M. Jr, and Haussler, D. Knowledge-based analysis of microarray gene expression data by using support vector machines.
Proc. Natl. Acad. Sci. USA 97: 262-267: 2000.
-
T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning (ECML-98), 1998.
-
Caragea, D., Cook, D., and Honavar, V. (2001). Gaining Insights into Support
Vector Machine Classifiers Using Projection-Based Tour Methods.
In: Proceedings of the Conference on Knowledge Discovery and Data Mining.
San Francisco, CA.
Recommended readings
-
S.S. Keerthi, S. Shevade, C. Bhattacharya and K. Murthy. A fast iterative nearest point algorithm for support vector machine classifier design. (Technical Report TR-ISL-99-03). Dept. Comp. Sci. and Auto., Indian Inst. of Science, Bangalor, India, 1999.
-
Yi Li and Philip M. Long, The Relaxed Online Maximum Margin Algorithm, Machine Learning Vol. 46,
pp. 361-, 2002.
-
Graepel, T., Herbrich, R., & Williamson, R. C. (2001). From margin to sparsity. In Advances in Neural Information System Processing 13.
-
J. Platt, N. Cristianini, J. Shawe-Taylor, Large Margin DAGs for Multiclass Classification, in: Advances in Neural
Information Processing Systems 12, pp. 547-553, MIT Press, (2000).
Additional Information
-
Support-Vector.net
-
Vapnik, V. V. Vapnik (1998)
Statistical Learning Theory, John Wiley, 1998, NY.
-
Cristianini, N. and Shawe-Taylor, J. An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK, 2000.
-
Schölkopf, B. and Smola, A. Learning with Kernels. MIT Press, Cambridge, MA, 2002
Week 13 (April 13, 2003)
Learning Sequence Classifiers. Learning Sequence Classifiers Using Feature-Based Representations of Sequences -- e.g., bag of words representations for text classification. N-grams for text analysis and statistical natural language processing (NLP). Markov models and Hidden Markov Models (HMM). Some special classes of HMM (ergodic, left-to-right, etc.) Fundamental HMM problems -- computing the probability of a given observation sequence given a model; computing the most likely hidden state sequence given an observation sequence and a model; computing the most likely model given observation sequence(s). Forward, Backward, Forward-Backward and Viterbi Algorithms; Algorithms for Learning a HMM from data.
Unsupervised or self-supervised learning. Learning Mixture Models from Data; Identifiability of Mixture Models; Maximum Likelihood approach to Mixture Model Learning -- Expectation Maximization (EM) algorithms. Clustering. Distance measures, Clustering Criteria -- Intra-Cluster and Inter-Cluster distances. Partition-Based Clustering Algorithms. K-means Clustering Algorithm and Variants; Hierarchical Agglomerative Clustering Algorithm.
Required readings
-
Lecture Slides. V. Honavar
-
Lecture Slides. V. Honavar
-
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, L. R. Rabiner, Proc. IEEE, Vol. 77, No. 2, pp. 257-286, February 1989. Errata.
-
Pereira, Fernando, Naftali Tishby, and Lillian Lee. 1993. Distributional clustering of English words.. Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, pages 183--190.
-
N. Slonim and N. Tishby. Agglomerative information bottleneck.. In Proc. of NIPS-12, pages 617-623. MIT Press, 2000.
-
Slonim, N. and Tishby, N. (2000) Document clustering using word clusters via the information bottleneck method. In Proceedings of the 23rd International Conference on Research and Development in Information Retrieval (SIGIR), pp. 208--215.
-
T. Hofmann, Probabilistic latent semantic indexing, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999, pp. 50--57.
-
Pavel Berkhin. A Survey of Clustering Algorithms 2002.
-
Why so many Clustering Algorithms?
-
Pitt, L. and Reinke, R. 1988. Criteria for Polynomial Time (Conceptual) Clustering. Machine Learning. 2:371-396, 1988.
-
Barbara, D. 2002. Requirements for Clustering Data Streams. SIGKDD Explorations. Vol. 3. pp. 23-27.
Recommended Readings
-
A. Krogh, M. Brown, S. Mian, K. Sjolander, and D. Haussler. Hidden markov models in computational biology: Applications to protein modeling. Mol. Biology, 235:1501--1531, 1994.
-
Christina Warrender, Stephanie Forrest, and Barak A. Pearlmutter. Detecting intrusions using system calls: Alternative data models.. In IEEE Symposium on Security and Privacy, pages 133--145, 1999.
-
K. Seymore, A. McCallum, and R. Rosenfeld. Learning hidden markov model structure for information extraction.In: AAAI 99 Workshop on Machine Learning for Information Extraction, 1999.
-
D. J. C. MacKay and L. C. Peto. A hierarchical Dirichlet language model.. Natural Language Engineering, 1(3):1--19, 1995.
-
Escobar, M.D., and West, M. (1995), Bayesian Density Estimation and Inference Using Mixtures Journal of the American Statistical Association, 90, 577-588.
-
Mishra, N., Oblinger, D., and Pitt, L. Sublinear Time Approximate Clustering.. 12th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 439-447, January, 2001.
-
Ramgopal R. Mettu and C. Greg Plaxton. Optimal Time Bounds for Approximate Clustering. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, August 2002, pages 344-351.
-
P. Indyk, 1999. Sublinear Time Algorithms for Metric Space Problems, Symposium on Theory of Computing (STOC '99).
Additional Information
-
Charniak, Eugene (1993). Statistical Language Learning. MIT Press.
-
Clustering Algorithms. Hartigan.
Week 14
Reinforcement Learning. Agents that learn by exploration of
environments, using environmental reward and punishment. Examples of
reinforcement learning problems. Credit assignment problem and its implications. Exploration-exploitation dilemma.
Some approaches to exploration-exploitation tradeoffs. Markov decision processes. Learning optimal policies from
interaction with the environment -- determistic, stochastic, stationary and non-stationary
environments. Value functions and Action-Value functions. Bellman equations and dynamic programming
approach to learning optimal policies when the transition function and reward functions are known (i.e.,
the agent has a model of the environment). Q learning algorithm for learning optimal policies when an
accurate model of the environment is not known. Temporal difference methods. Scaling up reinforcement learning to large
state spaces -- function approximation methods for compact representation of actio-value functions; state abstraction methods
for hierarchical reinforcement learning. Multi-agent reinforcement learning. Applications.
Required readings
- Chapter 13 from the Mitchell Text.
-
Lecture Slides. V. Honavar
-
Harmon, M. and Harmon, S. (1997). Reinforcement
Learning: A Tutorial.
- Kaelbling, L., Littman, M., and Moore, A. (1996). Reinforcement
Learning: A Survey.
-
J. Rennie and A. McCallum. Using Reinforcement Learning to Spider the Web Efficiently. Proceedings of the Sixteenth International Conference on Machine Learning, 1999.
-
M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. . In Proc. of the 15th Int. Conf. on Machine Learning, pages 260--268. Morgan Kaufmann, 1998.
-
Barto, A., and Mahadevan, S. Hierarchical Reinforcement Learning. Draft.
-
Schultz, W, Dayan, P and Montague, PR (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599.
-
Even-Dar, E. and Mansour, Y. (2001). Convergence of Optimistic and Incremental Q- Learning. NIPS 2001.
-
D. Precup, R. S. Sutton, and S. Singh. Theoretical results on reinforcement learning with temporally abstract options.. In Proceedings of the 10th European Conference on Machine Learning, ECML-98, pages 382--393. Springer Verlag, 1998.
Recommended readings
- Sutton, R. and Barto, A. (1998). Reinforcement
Learning: An Introduction Cambridge, MA: MIT Press.
-
Mahadevan, S. Spatio-Temporal Abstraction of Stochastic Sequential Processes.
-
P. Marbach, O. Mihatsch, and J. N. Tsisiklis, Call admission control and routing in integrated services networks using neuro-dynamic programming,. IEEE Trans. Veh. Technol., vol. 18, no. 2, pp. 1972.
-
T. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. JAIR, 13:227-303, 2000.
-
J. Baxter and P. Bartlett. Direct gradient-based reinforcement learning. Technical report, Australian National University, Research School of Information Sciences and Engineering, July 1999.
-
J. Baxter and P. Bartlett. Direct gradient-based reinforcement learning. Technical report, Australian National University, Research School of Information Sciences and Engineering, July 1999.
-
Bartlett and J. Baxter. Estimation and approximation bounds for gradient-based reinforcement learning. Journal of Computer and Systems Sciences, 2002.
-
Mark Humphrys. Action Selection methods using Reinforcement Learning. PhD thesis, University of Cambridge, June 1997.
Week 15
Review, Summary of the Course, and Discussion of Some Current Research Problems
General Background Material of Interest
-
Computing Machinery and Intelligence, Alan Turing.
- AI Overview pa
ge from the American Association for Artificial Intelligence
- AI Applications page from the American Association for Artificial Intelligence
- AI in the News page from the American Association for Artificial Intelligence
- What is AI? (by J. McCarthy)
- History and Promise of AI (by D. Waltz)
- Report on 21 st Century Intelligent Systems (by B. Grosz and R. Davis)
- The Role of Intelligent Systems in the National Information Infrastructure (by D. Weld)
-
Tutorial on Intelligent Agents and Multiagent systems (by V. Honavar)
-
Nwana, H.
Software Agents: An Overview., Knowledge Engineering Review. vol. 11., no. 3., pp. 1-40. 1996.
-
Wooldridge, M. and Jennings, N.
Intelligent Agents: Theory and Practice Knowledge Engineering Review. vol. 10., no. 2., pp. 115-152. 199
5.
-
Jennings, N. R., Sycara, K., and Wooldridge, M.,
A Roadmap of Agent Research and Development Int Journal of Autonomous Agents and Multi-Agent Systems 1 (1) 7-38, 1998.
-
AgentWeb by Tim Finin
-
Additional AI Links by V. Honavar
- AI on the web by Russell and Norvig
If you would like to be alerted by email when this page is updated, please register with Mind it.
Copyright © 1999-2003, Vasant Honavar, Department of Computer Science, Iowa State University. All rights reserved.
Dr. Vasant Honavar
Professor
Department of Computer Science
Iowa State University
Atanasoff Hall, Ames, IA 50011-1040 USA
phone: +1-515-294-4377, fax: +1-515-294-0258