Principles of Artificial Intelligence: Study Guide
|
The material to be covered each week and the assigned readings (along with online lecture notes, if available)
are included on this page. The links to lecture notes will be updated each week. The assigned readings
are divided into required and recommended readings and notes from recitations (if available). You will be responsible
for the material covered in the lectures and the assigned required readings. You are strongly encouraged to explore the
recommended readings.
Week 1 (starting August 20, 2007)
Overview of the course; Overview of artificial intelligence: What
is intelligence? What is artificial intelligence (AI)? History of AI;
Working hypothesis of AI. Introduction to intelligent
agents. Intelligent agents defined. Taxonomy of agents. Simple reflex
agents (memoryless agents); agents with limited memory; rational
agents; agents with goals; utility-driven Agents.
Required Readings -- Artificial Intelligence
Required Readings -- Programming
You may skip most of these readings if you have prior programming experience in Java.
Recommended Materials
Recitation notes
Week 2 (beginning August 27, 2007)
Goal-Based Agents. Problem-solving as state space search. Formulation of state-space search
problems. Representing states and actions. Basic search algorithms and their properties: completeness,
optimality, space and time complexity. Breadth-first search,
depth-first search, backtracking search, depth-limited and interative
deepening search.
Heuristic search. Finding optimal solutions. Best
first search. A* Search: Adding Heuristics to Branch and Bound Search.
Completeness, Admissibility, and Optimality of the A*
algorithm. Design of admissible heuristic functions. Comparison of
heuristic functions ("informedness" of heuristics).
Required readings
Recommended readings
- Zhang, W. and Korf, R. (1995). Performance of Linear Space Search Algorithms Artificial Intelligence, Vol. 79, pp. 249-292.
-
Korf, R. (2000). Recent Progress in Heuristic Search., Proceedings of AAAI, 2000.
-
Jensen, R.M., Bryant, R.E., and Veloso, M. (2002) SetA*: An Efficient BDD-Based Heuristic Search Algorithm, In: Proceedings of AAAI, 2002.
-
Zhou, R. and Hansen, E.A. (2003). Sweep A*: space-efficient heuristic search in partially ordered graphs, In: Proceedings of the IEEE Conference on Tools with Artificial Intelligence
-
Edelkamp, S., Jabbar, S., and Lluch-Lafuente, A. (2005)
Cost-Algebraic Heuristic Search In: Proceedings of the AAAI, 2005.
-
Mandow L. and Perez de la Cruz J.L. (2003) Multi-Criteria Heuristic Search, European Journal of Operational Research, Volume 150, Number 2, 16 October 2003, pp. 253-280(28)
-
Furcy, D.A. (2004) Speeding Up the Convergence of Online Heuristic Search and Scaling Up Offline Heuristic Search, Ph.D. Thesis, College of Computing, Georgia Tech.
-
Korf, R., Zhang, W., Thayer, I. and Heath, H. (2005). Frontier Search, Journal of the ACM, Vol. 52, pp. 715-748.
- G. Polya, How to Solve It, Princeton University Press, 1957.
- Pearl, J. (1988). Heuristics: Intelligent Search Strategies for
Computer Problem Solving.
- Reid, M. and Korf, R. (1998). Complexity analysis of admissible heuristic search, Proceedings of AAAI, 1998,
pp. 305-310.
Additional Information
- AI
Topics page from the American Association for Artificial
Intelligence.
Week 3 (beginning September 3, 2007)
Problem Solving through Problem Reduction. Searching AND-OR graphs.
A*-like admissible algorithm for searching AND-OR graphs.
Greedy hill-climbing search. Metropolis algorithm and Simulated
annealing, genetic algorithm.
Required readings
- Section 4.3 and Chapter 5 from Artificial Intelligence: A Modern Approach, Russell and Norvig.
- Lecture Slides (problem reduction representation), Vasant Honavar
- Lecture Slides (hillclimbing, simulated annealing, genetic algorithms), Vasant Honavar
- Lecture Notes: Informed Search,
by Vasant Honavar.
-
Optimization by Simulated Annealing, Kirkpatrick, Gelatt, and Vecchi, Nature, 1983.
-
A Tutorial on Genetic Algorithms, Darrel Whitley.
Recommended readings
Additional Information
- AI
Topics page from the American Association for Artificial
Intelligence.
Week 4 (beginning September 10, 2007)
Problem solving as
Constraint Satisfaction. Properties of constraint satisfaction
problems. Examples of constraint satisfaction problems.
Iterative
instantiation method for solving CSPs. Scene interpretation as
constraint propagation (Waltz's line labeling algorithm).
Node consistency, arc consistency, and related algorithms.
Required readings
Recommended readings
-
Constraint Satisfaction Tutorial, by Edward Tsang.
-
Online Constraint Programming Tutorial, by Roman Bartak
-
Chapters 1 and 2 from Foundations of Constraint Satisfaction, by Edward Tsang. (chapters available for download).
-
A survey of Constraint Sastisfaction Algorithms, Vipin Kumar, 1992.
Theory
and Practice of Constraint Propagation, by Bartak,
R. (2001).
- Waltz, D. Understanding line drawings of scenes with shadows. In: Winston, P.H. (Ed). The Psychology of Computer
Vision. McGraw-Hill, NY. 1975.
Additional Information
- AI
Topics page from the American Association for Artificial
Intelligence.
Week 5 (beginning September 17, 2007)
Introduction to Knowledge Representation.
Logical Agents with explicit knowledge representation.
Knowledge representation using propositional logic; Review of
Propositional Logic: Propositional logic as a knowledge
representation language: Syntax and Semantics; Possible worlds
interpretation; Models and Logical notions of Truth and Falsehood;
Logical Entailment; Inference rules; Modus ponens; Soundness and
Completeness properties of inference. Modus Ponens is a sound
inference rule for Propositional logic, but is not complete. Extending
modus ponens - the resolution principle.
Logical Agents without explicit representation.
Comparison of logical agents with and without explicit representations.
FOPL (First-Order Predicate Logic). Ontological and epistemological commitments
and Syntax and semantics of FOPL.
Examples. Theorem-proving in FOL. Unification, instantiation, and entailment.
Required readings
Recommended readings
- Concepts
of Logical AI, by John McCarthy.
-
DPLL Algorithm
-
A New Method for Solving Hard Satisfiability Problems,
Bart Selman, Hector Levesque, David Mitchell Proceedings of AAAI, 1992.
-
Hard and Easy Distributions of SAT Problems
David Mitchell, Bart Selman, Hector Levesque,
Proceedings of AAAI, 1992.
-
Using CSP Look-Back Techniques to Solve Real-World SAT Instances< Bayardo, R. and Schrag, R. In: Proceedings of AAAI, 1997.
- Vehicles: Experiments in Synthetic Psychology, Braitenberg, V. MIT Press. 1986.
-
The Synthesis of Digital Machines With Provable Epistemic Properties.
S. Rosenschein and L. Kaelbling (1986). In Proceedings of the Conference on Theoretical Aspects of Knowledge Representation (TARK)
Additional Information
- AI
Topics page from the American Association for Artificial
Intelligence.
- Genesereth, M. R., & Nilsson, N. J., Logical Foundations
of Artificial Intelligence. Palo Alto, CA: Morgan Kaufmann
(1987).
Week 6 (beginning September 24, 2007)
Transformation of FOPL sentences in Clause Normal Form. Resolution by
refutation for First Order Predicate Logic. Examples. Automated
Theorem Proving. Search Control Strategies for
Theorem Proving. Unit Preference, Set of Support and related
approaches. Soundness and Completeness of Proof Procedures.
Semidecidability of FOPL and its implications. Brief discussion of Datalog (for deductive databases)
and Prolog (for logic programming).
Emerging Applications of Knowledge Representation.. Semantics-Driven Applications. Ontologies. Information Integration. Service Oriented Computing. Semantic Web. Brief overview of Ontology Languages: RDF, OWL. Description Logics - Syntax, Semantics, and Inference.
Required readings
Recommended Readings
Additional Information
- Knowledge Representation - Logical, Philosophical, and Computational Foundations, by John Sowa.
- Problems for Automated Theorem Provers, Geoff Sutcliffe
- Genesereth, M. R., & Nilsson, N. J., Logical Foundations
of Artificial Intelligence. Palo Alto, CA: Morgan Kaufmann
(1987).
- Davis, E. Representations of Commonsense
Knowledge. Palo Alto, CA: Morgan Kaufmann (1990).
- Glossary
of First Order Logic, P. Suber.
-
Duffy, D. Principles of Automated Theorem Proving, New York: John
Wiley and Sons. (1991).
-
Wos, L., Overbeek, R., Lusk, E., & Boyle, J., Automated Reasoning. New York, NY: McGraw-Hill (1992).
-
Logic-Based Knowledge Representation Systems and Theorem Provers:
Week 7 (beginning October 1, 2007)
Representing and Reasoning Under Uncertainty.
Review of elements of probability. Probability spaces. Bayesian
(subjective) view of probability. Probabilities as measures of belief
conditioned on the agent's knowledge. Axioms of
probability. Conditional probability. Bayes theorem. Random Variables.
Independence. Probability Theory as a generalization of propositional logic.
Syntax and Semantics of a Knowledge Representation based on probability theory.
Sound inference procedure for probabilistic reasoning.
Independence and Conditional Independence. Exploiting independence relations for compact representation of probability distributions.
Introduction to Bayesian
Networks. Semantics of Bayesian Networks. D-separation. D-separation examples. Answering Independence Queries
Using D-Separation tests.
Probabilistic Inference Using Bayesian Networks.
Exact Inference Algorithms - Variable Elimination Algorithm; Message Passing Algorithm;
Junction Tree Algorithm. Complexity of Exact Bayesian Network Inference.
Approximate inference using stochastic simulation (sampling, rejection sampling, and liklihood weighted sampling
Required readings
Recommended Readings
Additional Information
- Probabilistic Reasoning in Intelligent Systems: Networks of
Plausible Inference. Judea Pearl (1997).
-
Castillo, E., Gutierrez, J. M., Hadi, A. S. Expert Systems and Probabilistic Network Models. Berlin: Springer (1996).
-
Cowell, R. G. Lauritzen, S. L., and Spiegelhalter, D. J. Probabilistic Networks and Expert Systems Berlin: Springer (2005).
-
Korb, K.B., and Nicholson, A.E., Bayesian Artificial Intelligence, Chapman and Hall (2004).
Week 8 (beginning October 8, 2007)
Making Simple Decisions under uncertainty, Elements of utility theory, Constraints on rational preferences, Utility functions, Utility elicitation, Multi-attribute utility functions, utility independence, decision networks, value of information
Mid term examination
Required readings
- Chapter 16 from the Artificial Intelligence: A Modern
Approach textbook by Russell and Norvig.
- Lecture Slides
Recommended Readings
Additional Information
-
French, S. Decision Theory: An Introduction to the Mathematics of Rationality. Ellis Horwood, 1986.
-
Luce, R. D., and Raiffa, H. (1957). Games and decisions. New York: John Wiley and Sons. Dover republication 1989.
-
Korb, K.B., and Nicholson, A.E., Bayesian Artificial Intelligence, Chapman and Hall (2004).
-
Ralph L. Keeney and Howard Raiffa, Decisions with Multiple Objectives:
Preferences and Value Trade Offs, Wiley, New York, 1993.
Week 9 (beginning October 15, 2007)
Sequential Decision Problems. Markov Decision Processes. Value Iteration. Policy Iteration. Partially Observable MDPs.
Required Readings
- Chapter 17 from the Artificial Intelligence: A Modern Approach Textbook by Russell and Norvig.
Lecture Slides
Recommended Readings
Additional Information
-
Bertsekas and Tsitsiklis, Neurodynamic Programming. Athena Scientific; 1 edition (May 1996)
-
Puterman, M. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. New york: Wiley.
Week 10 (beginning October 22, 2007)
Markov Decision Processes and Sequential Decision Problem.
Reinforcement Learning. Agents that learn by exploring and interacting with environments. Examples of reinforcement learning scenario. Markov decision processes. Types of environments (e.g., deterministic versus stochastic state transition functions and reward functions, stationary versus non-stationary environments, etc.).
The credit assignment problem. The exploration vs. exploitation dilemma. Value Iteration algorithm. Policy Iteration algorithm. Q-learning Algorithm, Confergence of Q-learning. Temporal Difference Learning Algorithms.
Required readings
- Chapter 17 (Sections 17.1 and 17.2, 17.3, 17.4). Chapter 21 (Sections 21.1, 21.2, 21.3) from the Artificial Intelligence: A Modern Approach textbook by Russell and Norvig.
-
Lecture Slides
- Lecture Notes (partial), Vasant Honavar.
-
Harmon, M. and Harmon, S. (1997). Reinforcement
Learning: A Tutorial.
- Kaelbling, L., Littman, M., and Moore, A. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research.
-
J. Rennie and A. McCallum. Using Reinforcement Learning to Spider the Web Efficiently. Proceedings of the Sixteenth International Conference on Machine Learning, 1999.
Recommended readings
-
Singh, S., Littman, M., Jong, N.K., Pardo, D., and Stone, P. (2003). Learning predictive state representations. In: Proceedings of the International Conference on Machine Learning (ICML 2003).
-
Singh, S., Barto, A., and Chentanez, N. (2005). Intrinsically motivated reinforcement learning. In: Advances in Neural Information Processing Systems (NIPS 2005).
-
Barto, A., and Mahadevan, S. Hierarchical Reinforcement Learning. Draft.
-
Schultz, W, Dayan, P and Montague, PR (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599.
Additional Information
- Sutton, R. and Barto, A. (1998). Reinforcement
Learning: An Introduction Cambridge, MA: MIT Press.
-
Mahadevan, S. Spatio-Temporal Abstraction of Stochastic Sequential Processes.
-
T. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. JAIR, 13:227-303, 2000.
-
Mark Humphrys. Action Selection methods using Reinforcement Learning. PhD thesis, University of Cambridge, June 1997.
-
Even-Dar, E. and Mansour, Y. (2001). Convergence of Optimistic and Incremental Q- Learning. NIPS 2001.
Week 11 (beginning October 29, 2007
Overview of machine learning. Why should machines learn?
Operational definition of learning.
Bayesian Decision Theory. Optimal Bayes Classifier. Minimum Risk Bayes Classifier.
Introduction to Generative Models. Naive Bayes Classifier. Maximum Likelihood Probability Estimation. Properties of Maximum Likelihood Estimators. Limitations of Maximum Likelihood Estimators. Bayesian Estimation. Conjugate Priors. Detailed treatment of Bayesian estimation in the multinomial case using Dirichlet priors. Maximum A posteriori Estimation. Representative applications of Naive Bayes classifiers.
Required readings
-
Does Machine Learning Really Work?, Tom Mitchell.
- Chapter 18 (Sections 18.1 and 18.2), Chapter 20 (Sections 20.1 and 20.2 pages 712-719), Chapter 18 (Section 18.3) from the Artificial Intelligence: A Modern Approach textbook by Russell and Norvig.
-
Lecture Slides
Recommended Readings
-
McCallum, A. and Nigam, K. A Comparison of Event Models for Naive Bayes Text Classification.. In AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48. Technical Report WS-98-05. AAAI Press. 1998.
-
D. D. Lewis. Naive Bayes at forty: The independence assumption in information retrieval. In ECML-98: Proceedings of the Tenth European Conference on Machine Learning, pages 4--15, Chemnitz, Germany, April 1998. Springer.
-
Zhang, J. and Honavar, V. (2004). Extended version of AVT-NBL: An Algorithm for Learning Compact and Accurate Naive Bayes Classifiers from Attribute Value Taxonomies and Data In: Proceedings of the IEEE International Conference on Data Mining, Brighton, UK. To appear in Journal of Knowledge and Information Systems.
-
Kang, D-K., Silvescu, A. and Honavar, V. (2007). RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification. In: Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2007). Lecture Notes in Computer Science.. Berlin: Springer-Verlag.
-
A Mathematical Theory of Communication, C. Shannon.
Additional Information
-
Hastie, T., Tibshirani, R., & Friedman, J. The elements of Statistical Learning - Data Mining, Inference, and Prediction. Berlin: Springer-Verlag. (2001).
-
Mitchell, T. Machine Learning. New York: Mc Graw-Hill. (1997).
Week 12 (beginning November 5, 2007)
Modeling dependence between attributes. The decision tree classifier. Introduction to information theory. Information, entropy, mutual information, and related concepts (Kullback-Liebler divergence).
Algorithm for learning decision tree classifiers from data.
The relationship between MAP hypothesis learning, minimum description length principle (Occam's razor) and the role of priors.
Ovrfitting and methods to avoid overfitting -- dealing with small sample sizes; prepruning and post-pruning. Pitfalls of entropy as a splitting criterion for multi-valued splits. Alternative splitting strategies -- two-way versus multi-way splits; Alternative split criteria: Gini impurity, Entropy, etc. Cost-sensitive decision tree induction -- incorporating attribute measurement costs and misclassification costs into decision tree induction.
Dealing with categorical, numeric, and ordinal attributes. Dealing with missing attribute values during tree induction and instance classification.
Evaluation of classifiers. Accuracy, Precision, Recall, Correlation Coefficient, ROC curves.
Required Readings
- Chapter 18 (Sections 18.3), from the Artificial Intelligence: A Modern Approach textbook by Russell and Norvig.
- Decision Trees, Nils Nilsson.
- Friedman, J. H. (1977). A recursive partitioning decision rule for nonparametric classification.
- Decision Tree Tutorial, by H. Hamilton, E. Gurak, L. Findlater, and W. Olive
- Vasant Honavar, Lecture Slides.
-
- Vasant Honavar, Lecture Slides.
- Baldi, P., Brunak, S., Chauvin, Y. and Nielsen, H. (2000) Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics Vol. 16. pp. 412-424.
Recommended Readings
-
Caragea, D., Silvescu, A., and Honavar, V. (2004). A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees. International Journal of Hybrid Intelligent Systems.
Vol. 1. pp. 80-89.
-
Zhang, J. and Honavar, V. (2003). Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data. In: Proceedings of the International Conference on Machine Learning (ICML-03). Washington, DC. pp. 880-887.
-
Fayyad, U. and Irani, K.B. (1992). On the handling of continuous valued attributes in decision tree generation. Machine Learning vol. 8. pp. 87-102.
-
Wang, H. and Zaniolo, C. (2000).
CMP: A Fast Decision Tree Classifier Using Multi-variate Predictions In: Proceedings of the International Conference on Data Engineering.
-
Domingos, P. (1999). The Role of Occam's Razor in Knowledge Discovery.
Vol. 3, no. 4., pp. 409-425.
- A Mathematical Theory of Communication, C. Shannon.
-
Schmid, H. Probabilistic Part-of-Speech Tagging Using Decision Trees.In: Proceedings of the Conference on New Methods in Language Processing, 1994
-
Silvescu, A., and Honavar, V. (2001). Temporal Boolean Network Models of Genetic Networks and Their Inference from Gene Expression Time Series. Complex Systems.. Vol. 13. No. 1. pp. 54-.
-
Wang, X., Schroeder, D., Dobbs, D., and Honavar, V. (2003). Automated Data-Driven Discovery of Motif-Based Protein Function Classifiers. Information Sciences. Vol. 155. pp. 1-18. 2003.
-
Codrington, C. W. and Brodley, C. E., On the Qualitative Behavior of Impurity-Based Splitting Rules: The Minima-Free Property. Tech. Rep. 97-05. Dept. of Computer Science. Cornell University.
-
Atramentov, A., Leiva, H., and Honavar, V. (2003).
A Multi-Relational Decision Tree Learning Algorithm - Implementation and Experiments.. In: Proceedings of the Thirteenth International Conference on Inductive Logic Programming. Berlin: Springer-Verlag. Lecture Notes in Computer Science. Vol. 2835, pp. 38-56.
Week 13 (beginning November 12, 2007)
Introduction to Artificial Neural Networks and Linear Discriminant Functions. Threshold logic unit (perceptron) and the associated hypothesis space. Connection with Logic and Geometry. Weight space and pattern space representations of perceptrons. Linear separability and related concepts. Perceptron Learning algorithm and its variants. Convergence properties of perceptron algorithm. Winner-Take-All Networks.
Bayesian Recipe for function approximation and Least Mean Squared (LMS) Error Criterion. Introduction to neural networks as trainable function approximators. Function approximation from examples. Minimization of Error Functions.
Derivation of a Learning Rule for Minimizing Mean Squared Error Function for a Simple Linear Neuron.
Momentum modification for speeding up learning.
Introduction to neural networks for nonlinear function approximation.
Nonlinear function approximation using multi-layer neural networks.
Universal function approximation theorem. Derivation of the generalized delta rule (GDR) (the backpropagation learning algorithm).
Generalized delta rule (backpropagation algorithm) in practice - avoiding overfitting, choosing neuron activation functions, choosing learning rate, choosing initial weights, speeding up learning, improving generalization, circumventing local minima, using domain-specific constraints (e.g., translation invariance in visual pattern recognition), exploiting hints, using neural networks for function approximation and pattern classification. Relationship between neural networks and Bayesian pattern classification. Variations -- Radial basis function networks. Learning non linear functions by searching the space of network topologies as well as weights.
Lazy Learning Algorithms. Instance based Learning, K-nearest neighbor classifiers, distance functions, locally weighted regression. Relative advantages and disadvantages of lazy learning and eager
learning.
Required readings
- Chapter 20, sections 20.4, 20.5 from the Artificial Intelligence: A Modern Approach textbook by Russell and Norvig.
- Lecture Slides. (perceptrons). Vasant Honavar.
- Nils Nilsson. Neural Networks.
-
Honavar, V.
Threshold Logic Units.
-
Honavar, V.
Perceptron Learning Algorithm.
-
Lecture Slides.. Vasant Honavar
-
Honavar, V. Function Approximation from Examples.
-
Honavar, V. Multi-layer networks.
-
Honavar, V. Radial Basis Function Networks.
-
C. G. Atkeson, S. A. Schaal and Andrew W, Moore, Locally Weighted Learning, AI Review,Volume 11, Pages 11-73 (Kluwer
Publishers) 1997
Recommended readings
-
Adam J. Grove, Nick Littlestone, and Dale Schuurmans. General convergence results for linear discriminant updates. In COLT-97, pages 171--183, 1997.
-
The weighted Majority Algorithm, Littlestone, N., and Warmuth, M. Information and Computation Vol. 108: 212-261. 1994.
-
T. G. Dietterich, H. Hild, and G. Bakiri.
A comparative study of ID3 and backpropagation for English text-to-speech mapping. In Proceedings of 7th IMLW, Austin, 1990.
Morgan Kaufmann.
-
Y. Le Cun, B. Boser, J.S. Denker, D. Henderson, R. Howard, W. Hubbard,
and L. Jackel.Handwritten digit recognition with a backpropagation neural network. In D.
Touretzky, editor, Advances in Neural Information Processing Systems 2, pages 396--404. Morgan
Kaufmann, San Mateo, CA, 1990.
-
Thrun and T.M. Mitchell. Learning one more thing. Technical Report CMU-CS-94-184, Carnegie Mellon University, Pittsburgh, PA 15213, 1994.
-
J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121--167, 1998.
Additional Information
-
Bishop, C. 2006. Machine Learning and Pattern Recognition. Cambridge University Press.
-
Mitchell, T. 1998. Machine Learning.
-
Cristianini, N. and Shawe-Taylor, J. (2000). Support Vector Machines. London: Cambridge University Press.
Thanksgiving Break
|