Artificial Intelligence Research Laboratory
Department of Computer Science
Iowa State University


Incremental Learning From Distributed, Dynamic Data Sources

Personnel

Project Summary

Translating recent advances in high-throughput data acquisition, storage technologies, and networks into fundamental gains in understanding of the respective domains (e.g., biological sciences, organizational decision support) calls for the development of powerful new tools for knowledge acquisition. For example, by examining data gathered by sensors located at different network hosts (e.g., system logs that contain records of various system calls) together with known cases of coordinated attacks on the network, a knowledge acquisition agent can infer useful, previously unknown predictive relationships that can subsequently be used to predict, detect, and counteract intrusions. Similarly, bioinformatics knowledge discovery agents can learn regularities that characterize molecular structure-function relationships. Beyond its immediate value to users, the acquired knowledge can also be used by software agents to hypothesize likely events from the available information and then seek out additional data to support or refute those hypotheses (e.g., in data-driven scientific discovery).

Machine learning is currently perhaps the most practical approach to automated or semi-automated data-driven knowledge acquisition and theory refinement. However, most algorithms available today require that the entire dataset be available for processing at a single location before knowledge acquisition can begin.

In practice, many data sources of interest are large, dynamic, and distributed, so assembling all of the data at a single location is often impractical. It is therefore desirable to use mobile software agents that transport themselves to the data repositories, or stationary software agents that reside at the repositories, to perform as much of the analysis as possible where the data are located and to return only the results of that analysis, thereby conserving network bandwidth.
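A minimal sketch of the "analyze locally, return only the results" idea follows. It assumes that the result each repository sends back is a table of (attribute value, class) counts, a compact summary that is sufficient for count-based learners such as naive Bayes. The record format, the toy data, and the function names `local_counts` and `merge_counts` are hypothetical and chosen only for illustration; they are not the protocol used in this project.

```python
from collections import Counter

# Hypothetical records held at each repository: (attribute_value, class_label).
# Treating counts as the "result of analysis" is an illustrative assumption.

def local_counts(records):
    """Runs at the data repository: summarize raw records as counts.

    Only this Counter, not the raw records, travels back over the network.
    """
    return Counter(records)

def merge_counts(summaries):
    """Runs at the central site: add up the per-repository summaries."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts rather than replacing them
    return total

site_a = [("tcp", "attack"), ("udp", "normal"), ("tcp", "attack")]
site_b = [("tcp", "normal"), ("udp", "normal")]

merged = merge_counts([local_counts(site_a), local_counts(site_b)])

# The merged counts equal what a centralized learner would compute after
# shipping every record to one place.
print(merged == Counter(site_a + site_b))  # True
print(merged[("tcp", "attack")])           # 2
```

The design choice here is that the summary is additive: counts from different repositories can be combined in any order, so the central site obtains the same statistics as it would from the pooled data while receiving far less than the raw records.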

Efficient and scalable approaches to data-driven knowledge acquisition from distributed, dynamic data sources call for algorithms that can modify knowledge structures (e.g., pattern classifiers) in an incremental fashion without having to revisit previously processed data (examples). We have recently developed several formulations of the problem of learning from distributed and dynamic data sources under different assumptions regarding the data sources and the properties of the learning algorithms.
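As one illustration of a classifier that can be modified incrementally without revisiting earlier examples, the sketch below uses a naive Bayes model over discrete attributes. This is a swapped-in, textbook technique, not necessarily one of the formulations developed in this project; the class name `IncrementalNaiveBayes` and the toy data are hypothetical. Because the model stores only counts, folding in a new batch yields exactly the same model as retraining on all the data seen so far, a property many learning algorithms do not have.

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    """Naive Bayes over discrete attributes, updated one batch at a time.

    Illustrative sketch: the model keeps only counts, so each new batch is
    folded in without re-reading previously processed examples.
    """

    def __init__(self):
        self.class_counts = defaultdict(int)   # N(c)
        self.value_counts = defaultdict(int)   # N(attribute i = v, class c)
        self.values_seen = defaultdict(set)    # distinct values observed for attribute i

    def update(self, batch):
        """Fold a batch of (features_tuple, label) examples into the counts."""
        for features, label in batch:
            self.class_counts[label] += 1
            for i, v in enumerate(features):
                self.value_counts[(i, v, label)] += 1
                self.values_seen[i].add(v)

    def predict(self, features):
        """Return the most probable class using add-one (Laplace) smoothing."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            score = math.log(n_c / total)
            for i, v in enumerate(features):
                # One extra slot reserves probability mass for unseen values.
                k = len(self.values_seen.get(i, ())) + 1
                n_ivc = self.value_counts.get((i, v, label), 0)  # .get avoids mutating counts
                score += math.log((n_ivc + 1) / (n_c + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical data arriving in two batches.
day1 = [(("tcp", "high"), "attack"), (("udp", "low"), "normal")]
day2 = [(("tcp", "high"), "attack"), (("tcp", "low"), "normal")]

incremental = IncrementalNaiveBayes()
incremental.update(day1)
incremental.update(day2)  # day1 is not re-read

batch = IncrementalNaiveBayes()
batch.update(day1 + day2)

print(incremental.value_counts == batch.value_counts)  # True
print(incremental.predict(("tcp", "high")))             # attack
```

Learners whose internal state is not a simple additive summary (for example, many decision-tree or neural-network learners) generally lack this exact equivalence, which is one reason incremental learning from dynamic sources calls for dedicated algorithms.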

The proposed research builds on this work to investigate several approaches to incremental learning from dynamic, distributed data sources as well as cumulative, multi-task learning in open-ended environments. These include:

Anticipated products of this research include new algorithmic and systems solutions for large-scale automated knowledge acquisition from dynamic, distributed data sources. The resulting software tools are likely to find use in a variety of applications, including intrusion detection in computer systems and data-driven knowledge discovery in bioinformatics.

This research is closely integrated with the education and training of graduate and undergraduate students in Computer Science and Bioinformatics at Iowa State University.

Funding

Publications

Additional Information

To appear.


Dr. Vasant Honavar
Artificial Intelligence Research Laboratory
Department of Computer Science
Iowa State University
Atanasoff Hall, Ames, IA 50011-1040 USA
phone: +1-515-294-1098, +1-515-294-4377; fax: +1-515-294-0258

© Vasant Honavar, 1999.