Artificial Intelligence Research Seminar
Artificial Intelligence Research Laboratory
Department of Computer Science
Iowa State University


The Artificial Intelligence Research Seminar, Com S 610 (VH), will meet once a week during Fall 2000 and will be coordinated by Adrian Silvescu. Seminar topics for Fall 2000 will be drawn from the following:

MONDAY MEETINGS: 217 Atanasoff Hall, 3:30-5:00pm

OCT. 2: Xiaosi Zhang and Neeraj Koul.

Xiaosi will talk about mining yeast genome expression data, focusing on cluster analysis of gene expression patterns. Clustering gene expression data from spotted DNA microarrays groups together genes with similar expression patterns. The coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which no information is currently available.
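The clustering step can be sketched in a few lines of Python. This is a minimal illustration, loosely in the spirit of Eisen et al. (1998), who clustered genes hierarchically using a correlation-based similarity. It assumes plain Pearson correlation, the distance 1 - r, and average linkage; the gene names and expression values are made up for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (a constant profile has zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - r and average linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a frozenset of gene names.
    """
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = 1.0 - pearson(profiles[a], profiles[b])

    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean distance over all gene pairs across the two clusters.
            d = sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((c1, c2, d))
    return merges


# Hypothetical expression ratios for four genes over four time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.1, 1.8, 1.2],
}
merges = average_linkage(profiles)
for c1, c2, d in merges:
    print(sorted(c1), sorted(c2), round(d, 4))
```

Genes A and B, whose profiles rise together, merge first, then C and D; the two groups join last at a distance near 2 because their profiles are anti-correlated. The exhaustive pair search shown here is roughly cubic in the number of genes, so a genome-scale analysis would call for a more efficient implementation.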

The publicly available gene expression data, obtained using microarrays containing essentially every ORF (open reading frame) in the yeast genome, include measurements taken during the diauxic shift, the mitotic cell division cycle, sporulation, and temperature and reducing shocks. The data can be downloaded from:
http://cmgm.stanford.edu/pbrown/explore/index.html
Neeraj will present material from the following papers:

  1. Eisen et al. Cluster analysis and display of genome-wide expression patterns.
  2. Eisen, M. B., and Spellman, P. T. Mining the yeast genome expression and sequence data.

SEPT. 25: Neurobiology Talk on Gene Expression.

SEPT. 18: Research interests presentation

SEPT. 11: Vasant Honavar

Algorithmic Approaches to Gene Expression Analysis

Modern biology rests on the premise (often referred to as the central dogma) that the functional state of an organism is largely determined by its gene expression pattern. This premise implies that understanding complex biological processes such as development, cellular differentiation, and carcinogenesis requires determining the spatio-temporal expression patterns of thousands of genes and, more importantly, uncovering the organizing principles that allow biological processes to function coherently under different environmental conditions. The recent advent of DNA microarray technology gives biologists the ability to measure the expression levels of thousands of genes in a single experiment. Initial experiments by Eisen et al. (1998) using microarray technology suggest that sets of genes with related functions can be detected on the basis of similar gene expression patterns. With the increasing use of DNA microarrays and related technologies for gathering gene expression data from plants and animals, there is a growing need for sophisticated computational tools for extracting biologically significant information from gene expression data, assigning functions to genes, and identifying signalling pathways and control circuits (e.g., signal transduction pathways and genetic regulatory networks). In this talk, I will present an overview of algorithmic approaches that have been used for large-scale gene expression analysis and point out some of the limitations of currently used approaches. I will conclude with a proposal for a gene expression analysis toolkit consisting of a suite of algorithms that overcome the limitations of current techniques.

References

  1. http://linkage.rockefeller.edu/wli/microarray/
  2. Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., and Levine, A. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS 96:6745-6750.
  3. Ben-Dor, A., Shamir, R., and Yakhini, Z. (1999). Clustering Gene Expression Patterns. J. Computational Biology 6:281-297.
  4. DeRisi, J., Iyer, V., and Brown, P. (1997). Exploring metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686.
  5. Eisen, M., Spellman, P., Brown, P., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. PNAS 95:14863-14868.
  6. Wen, X., Fuhrman, S., Michaels, G., Carr, D., Smith, S., Barker, J., and Somogyi, R. (1998). Large-scale temporal gene expression mapping of central nervous system development. PNAS 95:334-339.

SEPT. 4: (Re)Organizational Meeting.

WEDNESDAY MEETINGS: 217 Atanasoff Hall, 3:30-5:00pm

OCT. 11: Doina Caragea

We will finish the discussion of how to learn a Boolean function using its Fourier representation, and then see how this theory can be applied to learning decision trees. We will also see how, in practice, to construct a decision tree from its Fourier representation (that is, how to go from the Fourier representation to information gain). The first part of the talk is based on the paper "Learning Boolean Functions via the Fourier Transform" by Yishay Mansour (1994) (http://www.math.tau.ac.il/~mansour/cv.htm); the second part is based on the paper "Collective Data Mining: A New Perspective Toward Distributed Data Mining" by Kargupta, H., Park, B., Hershberger, D., and Johnson, E. (1999) (http://www.eecs.wsu.edu/~hillol/).
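The connection between decision trees and the Fourier representation can be checked on a toy example. The minimal Python sketch below assumes the convention used in Mansour's survey: inputs in {0,1}^n, the parity basis chi_S(x) = (-1)^(sum of x_i for i in S), and coefficients f_hat(S) = 2^(-n) * (sum over all x of f(x) * chi_S(x)). The function `tree` is a hypothetical depth-2 tree, and the spectrum is computed by brute force, which is only feasible for small n.

```python
from itertools import combinations, product


def chi(S, x):
    """Parity basis function: chi_S(x) = (-1) ** (sum of x[i] for i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def fourier_spectrum(f, n):
    """Exact Fourier coefficients of f over {0,1}^n, by enumerating all 2^n inputs.

    Returns a dict mapping each subset S (a sorted tuple of indices) to
    f_hat(S) = 2^-n * sum over x of f(x) * chi_S(x).
    """
    points = list(product((0, 1), repeat=n))
    spectrum = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            spectrum[S] = sum(f(x) * chi(S, x) for x in points) / len(points)
    return spectrum


def tree(x):
    """Hypothetical depth-2 decision tree over 4 Boolean inputs (x[3] is unused)."""
    if x[0] == 0:
        return 1 if x[1] == 0 else -1
    return 1 if x[2] == 1 else -1


spectrum = fourier_spectrum(tree, 4)
nonzero = {S: c for S, c in spectrum.items() if abs(c) > 1e-12}
print(nonzero)  # {(1,): 0.5, (2,): -0.5, (0, 1): 0.5, (0, 2): 0.5}

# The tree is recovered from its spectrum: f(x) = sum over S of f_hat(S) * chi_S(x).
for x in product((0, 1), repeat=4):
    assert abs(sum(c * chi(S, x) for S, c in nonzero.items()) - tree(x)) < 1e-9
```

Every nonzero coefficient sits on a set of size at most 2, consistent with the standard fact that a decision tree of depth d has no Fourier weight on sets larger than d. This sparsity is what makes decision trees natural targets for Fourier-based learning.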

OCT. 4: Doina Caragea

In today's seminar I will talk about how to learn a Boolean function using the Fourier transform. The presentation will be structured as follows: a description of concept learning; an introduction to the Fourier transform (the Fourier basis); the connection between learning and the Fourier transform; algorithms for learning a Boolean function using Fourier theory; and how these algorithms can be applied to learning Boolean decision trees.

Reference:

  1. Mansour, Y. (1994). Learning Boolean Functions via the Fourier Transform.
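One of the algorithms covered in Mansour's survey is the low-degree algorithm of Linial, Mansour, and Nisan: draw uniformly random labelled examples, estimate every Fourier coefficient on sets of size at most d by an empirical average, and predict with the sign of the truncated expansion. The Python sketch below illustrates that idea; the target function, the degree bound d = 2, the sample size m = 4000, and the random seed are illustrative assumptions, not values from the talk.

```python
import random
from itertools import combinations, product


def chi(S, x):
    """Parity basis function: chi_S(x) = (-1) ** (sum of x[i] for i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def low_degree_learn(examples, n, d):
    """Low-degree algorithm: estimate every Fourier coefficient with |S| <= d.

    examples: list of (x, y) pairs, x drawn uniformly from {0,1}^n, y = f(x) in {-1, +1}.
    Returns the hypothesis sign(sum_S estimate(S) * chi_S(x)) and the estimates.
    """
    m = len(examples)
    estimates = {}
    for k in range(d + 1):
        for S in combinations(range(n), k):
            # Under the uniform distribution, the mean of f(x) * chi_S(x) estimates f_hat(S).
            estimates[S] = sum(y * chi(S, x) for x, y in examples) / m

    def hypothesis(x):
        value = sum(c * chi(S, x) for S, c in estimates.items())
        return 1 if value >= 0 else -1

    return hypothesis, estimates


def target(x):
    """Hypothetical unknown function: a depth-2 decision tree over 6 inputs."""
    if x[0] == 0:
        return 1 if x[1] == 0 else -1
    return 1 if x[2] == 1 else -1


n, d, m = 6, 2, 4000
rng = random.Random(0)
examples = []
for _ in range(m):
    x = tuple(rng.randint(0, 1) for _ in range(n))
    examples.append((x, target(x)))

h, estimates = low_degree_learn(examples, n, d)
errors = sum(h(x) != target(x) for x in product((0, 1), repeat=n))
print("misclassified inputs:", errors, "of", 2 ** n)
```

The number of estimated coefficients grows roughly as n^d, which is why the approach is practical only when the target's spectrum is concentrated on small sets and d can be kept small.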

SEPT. 27: Adrian Silvescu

This talk will present some recent developments in Statistical Learning Theory based on Vladimir Vapnik's book.

SEPT. 20: Carson Andorf.

Recent advances in data storage and data acquisition technologies have made it possible to produce large data sets. Many of these data sets are physically distributed, and because of their size it is very expensive, in terms of both network bandwidth and time, to assemble them at a central location. Other data sets are subject to security restrictions, so only summaries of them can be made available. Such data sets require algorithms that learn from distributed data without gathering the data at a central site. Many existing batch learning algorithms can be mapped into a distributed learning environment.

In this talk, I will discuss learning decision trees from distributed data sets. I will give a brief overview of different methods of learning from distributed data sets, different types of distributed data, and how decision trees are learned in a batch environment. Most of my talk will focus on Taru Sharma's work on mapping the decision tree algorithm ID3 (Quinlan, 1986) to an environment with both vertically and horizontally distributed data, and on my own work, in collaboration with Dr. Honavar, on mapping the rule-learning algorithms IREP (Furnkranz and Widmer, 1994) and IREP* (Cohen, 1995) to the same vertically and horizontally distributed setting.

References

  1. Cohen, W. W. (1995). Fast Effective Rule Induction. In: Proceedings of the Twelfth International Conference on Machine Learning.
  2. Furnkranz, J., and Widmer, G. (1994). Incremental Reduced Error Pruning. In: Proceedings of the Eleventh Annual Conference on Machine Learning.
  3. Quinlan, J. R. (1986). Induction of Decision Trees. Machine Learning 1:81-106.
  4. Sharma, Tarkeshwari. Agent Toolkit for Distributed Knowledge Networks. M.S. Thesis, Department of Computer Science, Iowa State University.
  5. Sharma, T., Silvescu, T., and Honavar, V. (2000). Learning Classification Trees from Distributed Horizontally and Vertically Fragmented Data Sets. Under review.
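For the horizontally fragmented case, in which each site holds different examples described by the same attributes, one way to see why ID3 can be mapped to distributed data is that information gain depends on the data only through counts of (attribute value, class) pairs, and those counts can be computed locally and summed. The Python sketch below assumes this simple count-aggregation scheme; it illustrates the idea and is not a reproduction of the algorithms of Sharma et al. The attribute `outlook`, the label key `class`, and the rows are hypothetical.

```python
import math
from collections import Counter


def local_counts(rows, attribute):
    """Counts of (attribute value, class label) pairs at one site.

    Only these counts, not the rows themselves, need to leave the site.
    """
    return Counter((row[attribute], row["class"]) for row in rows)


def entropy(class_counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(class_counts.values())
    return -sum((c / total) * math.log2(c / total) for c in class_counts.values() if c)


def information_gain(counts):
    """ID3-style information gain of an attribute, computed from (value, class) counts."""
    class_totals = Counter()
    by_value = {}
    for (value, label), c in counts.items():
        class_totals[label] += c
        by_value.setdefault(value, Counter())[label] += c
    total = sum(class_totals.values())
    remainder = sum(sum(cc.values()) / total * entropy(cc) for cc in by_value.values())
    return entropy(class_totals) - remainder


# Two sites hold different rows over the same attributes (horizontal fragmentation).
site_a = [
    {"outlook": "sunny", "class": "no"},
    {"outlook": "sunny", "class": "no"},
    {"outlook": "rain", "class": "yes"},
]
site_b = [
    {"outlook": "overcast", "class": "yes"},
    {"outlook": "rain", "class": "no"},
    {"outlook": "overcast", "class": "yes"},
]

# Each site ships only its counts; a coordinator adds them up.
distributed = information_gain(local_counts(site_a, "outlook") + local_counts(site_b, "outlook"))
centralized = information_gain(local_counts(site_a + site_b, "outlook"))
print(round(distributed, 4), round(centralized, 4))
```

The two gains agree because the summed counts are exactly the counts of the pooled data. Vertically fragmented data, where each site holds different attributes of the same examples, needs more than summed counts, because the statistics at a node below the root depend on which examples satisfy tests on attributes held at other sites.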

SEPT. 13: Statistical Learning Theory: Adrian Silvescu

This talk will introduce some recent results in statistical learning theory developed by Vladimir Vapnik in his books on the topic.

SEPT. 6: Vasant Honavar

Cumulative Multi-Task Learning from Distributed, Dynamic Data and Knowledge Sources

Abstract

A fundamental question in computational studies of learning is: How do living systems learn over a period of time, across multiple tasks, without losing the ability to perform the tasks that they have already mastered? A closely related problem involves the design and analysis of algorithms that enable autonomous agents to learn from multiple, distributed, dynamic data and knowledge sources as well as other agents in open-ended environments. While there has been a great deal of research on batch learning algorithms that learn from a given data set, there is relatively little work on algorithms for learning in open-ended environments.

In this talk, I will introduce a class of learning problems that arise in open-ended, dynamic environments consisting of multiple, distributed, possibly autonomous data and knowledge sources and agents, and I will review some of the work being done in our lab to address these problems.

Much of this talk is based on work done in collaboration with Doina Caragea, Adrian Silvescu, and Carson Andorf, all of whom are graduate students in the AI lab.

References

  1. Caragea, D., Silvescu, A., and Honavar, V. (2000). Agents That Learn from Distributed Dynamic Data Sources. In: Proceedings of the ECML 2000/Agents 2000 Workshop on Learning Agents. Barcelona, Spain.
  2. Caragea, D., Silvescu, A., and Honavar, V. (2000). Towards a Theoretical Framework for Analysis and Synthesis of Distributed and Incremental Learning Agents. In: Proceedings of the Workshop on Distributed and Parallel Knowledge Discovery. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Boston, U.S.A.
  3. Polikar, R., Udpa, L., Udpa, S., and Honavar, V. (2000). Learn++: An Incremental Learning Algorithm for Multilayer Perceptron Networks. In: Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2000. Istanbul, Turkey.
  4. Sharma, T., Silvescu, T., and Honavar, V. (2000). Learning Classification Trees from Distributed Horizontally and Vertically Fragmented Data Sets. Under review.
  5. Honavar, V., Miller, L. and Wong, J. (1998). Distributed Knowledge Networks. In: Proceedings of the IEEE Information Technology Conference. Syracuse, NY.
  6. Miller, L., Honavar, V. and Wong, J. (1998). Object-Oriented Data Warehouse for Information Fusion from Heterogeneous Data and Knowledge Sources. In: Proceedings of the IEEE Information Technology Conference. Syracuse, NY.

If you are interested in receiving seminar announcements, please send email to honavar@cs.iastate.edu to be added to our mailing list, or check this page periodically for the schedule of talks.



For additions and updates to this page, please contact: silvescu@cs.iastate.edu.



Artificial Intelligence Research Laboratory
Department of Computer Science
Iowa State University
Atanasoff Hall, Ames, IA 50011-1040 USA
phone: +1-515-294-4377, fax: +1-515-294-0258