Artificial Intelligence Research Laboratory
Department of Computer Science
Iowa State University
Incremental Learning From Distributed, Dynamic Data Sources
Personnel
Project Summary
Translating recent advances in high-throughput data acquisition and storage technologies and networks into
fundamental gains in understanding of the respective domains (e.g., biological
sciences, organizational decision support) calls for the development of
powerful new tools for knowledge acquisition.
For example, by examining data gathered by sensors located at different
network hosts (e.g., system logs that contain records of various system
calls) and known cases of coordinated attacks on the network,
a knowledge acquisition agent can infer useful, a priori
unknown predictive relationships that can be subsequently employed for
predicting, detecting, and counteracting intrusions. Similarly,
bioinformatics knowledge discovery agents can learn regularities that
characterize molecular structure-function relationships.
The acquired knowledge,
in addition to being of immediate value to the users, can also be used
by software agents to hypothesize likely
events based on the available information and then seek out additional data
to support or refute the hypothesis (e.g., in the context of data-driven
scientific discovery).
Machine learning is currently perhaps the most practical approach to
automated or semi-automated data-driven knowledge acquisition and theory
refinement. However, most algorithms available today require that the
entire dataset be available for processing at a single location before
knowledge acquisition can begin.
Many data sources of interest, however, are large and dynamic, which makes gathering them at a single location impractical.
It is therefore desirable to use mobile software agents that transport themselves to the data repositories, or stationary software agents that reside at the repositories, to perform as much of the analysis as possible where the data are located and to return only
the results of the analysis, thereby conserving network bandwidth.
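As a minimal illustration of analyzing data where they reside, the sketch below has each site reduce its records to count tables, and a coordinator add those tables together to obtain a naive Bayes classifier. It is not the project's own algorithm. The choice of naive Bayes, the function names (`local_counts`, `merge_counts`, `predict`), the toy records, and the assumption that every feature takes one of `n_values` discrete values are all assumptions made for the example.

```python
from collections import Counter

def local_counts(records):
    """Run at a data site: summarize (features, label) records as counts."""
    class_counts = Counter()
    feature_counts = Counter()  # keyed by (label, feature_index, value)
    for features, label in records:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(label, i, value)] += 1
    return class_counts, feature_counts

def merge_counts(summaries):
    """Run at the coordinator: add up the per-site count tables."""
    total_class, total_feature = Counter(), Counter()
    for class_counts, feature_counts in summaries:
        total_class.update(class_counts)
        total_feature.update(feature_counts)
    return total_class, total_feature

def predict(features, class_counts, feature_counts, n_values=2):
    """Naive Bayes prediction with add-one smoothing from merged counts.

    n_values is the assumed number of distinct values per feature.
    """
    n = sum(class_counts.values())
    best_label, best_score = None, 0.0
    for label, c in class_counts.items():
        score = c / n  # class prior
        for i, value in enumerate(features):
            score *= (feature_counts[(label, i, value)] + 1) / (c + n_values)
        if best_label is None or score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical binary event records held at two different hosts.
site_a = [((1, 1), "attack"), ((1, 0), "attack"), ((0, 0), "normal")]
site_b = [((0, 0), "normal"), ((0, 1), "normal"), ((1, 1), "attack")]

# Only the count tables cross the network, never the raw records.
merged = merge_counts([local_counts(site_a), local_counts(site_b)])
print(predict((1, 1), *merged))
```

Because counts are additive, the merged tables are identical to the tables that would be computed if all records were pooled at one site, so nothing is lost by leaving the raw data in place.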
Efficient and scalable approaches to data-driven knowledge acquisition from
distributed, dynamic data sources call for algorithms that
can modify knowledge structures (e.g., pattern classifiers) in an incremental
fashion without having to revisit previously processed data (examples). We have recently developed several formulations of the problem of learning from distributed and dynamic data sources under different assumptions regarding the data
sources and the properties of the learning algorithms.
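The idea of updating a knowledge structure without revisiting earlier examples can be sketched with a deliberately simple learner. The nearest-centroid classifier below keeps only one running mean and one count per class, so each example can be discarded once it has been folded in. The class name `IncrementalCentroidClassifier` and the toy data stream are hypothetical. The sketch illustrates the incremental-update property only and does not represent any of the formulations developed in this project.

```python
class IncrementalCentroidClassifier:
    """Nearest-centroid classifier whose state is one running mean per class."""

    def __init__(self):
        self.counts = {}     # label -> number of examples seen so far
        self.centroids = {}  # label -> running mean of the feature vectors

    def update(self, x, label):
        """Fold one example into the model; the example can then be discarded."""
        n = self.counts.get(label, 0) + 1
        mean = self.centroids.get(label, [0.0] * len(x))
        # Running-mean update: mean_new = mean_old + (x - mean_old) / n
        self.centroids[label] = [m + (xi - m) / n for m, xi in zip(mean, x)]
        self.counts[label] = n

    def predict(self, x):
        """Return the label whose centroid is nearest (squared Euclidean distance)."""
        def dist(label):
            return sum((xi - m) ** 2 for xi, m in zip(x, self.centroids[label]))
        return min(self.centroids, key=dist)

# Hypothetical stream of (feature vector, label) examples arriving over time.
stream = [((0.0, 0.0), "a"), ((1.0, 1.0), "a"), ((4.0, 4.0), "b"), ((6.0, 4.0), "b")]

model = IncrementalCentroidClassifier()
for x, label in stream:
    model.update(x, label)  # constant work per example, no stored history

print(model.predict((1.0, 0.0)))
```

After the whole stream has been processed, each centroid equals the batch mean of its class, which is the sense in which the incremental learner matches its batch counterpart here.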
The proposed
research builds on this work to investigate several approaches to
incremental learning from dynamic, distributed data sources as well as
cumulative, multi-task learning in open-ended environments. These include:
-
Design of knowledge representations that lend themselves to incremental update
of knowledge structures using only the new data, and the design of efficient
update algorithms. The algorithm that we have designed for incremental induction of support vector machines provides an example of this approach.
-
Design of online learning algorithms based on stochastic approximation. Our algorithm for incremental acquisition of spatial maps using Kalman filters provides an example of this approach.
-
Design of serial or parallel aggregation schemes for effectively combining multiple hypotheses, each learned from a small subset of the data, using weighting and voting schemes akin to boosting and bagging. We have conducted some preliminary experiments along these lines
with encouraging results.
-
Design of hybrid algorithms which synergistically
combine statistical summaries of data, identification and memorization
of informative instances, and incremental knowledge structure update.
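The aggregation approach in the list above can be sketched as follows: one simple hypothesis is learned from each small chunk of data, and the hypotheses are then combined by a weighted vote. This is an assumption-laden illustration rather than the scheme used in the preliminary experiments. The one-dimensional threshold learner `train_stump`, the use of validation accuracy as the vote weight, and the toy data are all hypothetical, and each chunk is assumed to contain examples of both classes (labels 0 and 1).

```python
from collections import defaultdict

def train_stump(chunk):
    """Learn a 1-D threshold rule from (x, label) pairs with labels 0 and 1.

    The threshold is the midpoint between the two class means. Assumes the
    chunk contains at least one example of each class.
    """
    ones = [x for x, y in chunk if y == 1]
    zeros = [x for x, y in chunk if y == 0]
    mean1, mean0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
    threshold = (mean1 + mean0) / 2
    high_label = 1 if mean1 >= mean0 else 0  # label predicted above the threshold

    def hypothesis(x):
        return high_label if x >= threshold else 1 - high_label
    return hypothesis

def accuracy(hypothesis, data):
    """Fraction of (x, label) pairs that the hypothesis classifies correctly."""
    return sum(hypothesis(x) == y for x, y in data) / len(data)

def weighted_vote(hypotheses, weights, x):
    """Return the label with the largest total weight among the votes."""
    votes = defaultdict(float)
    for h, w in zip(hypotheses, weights):
        votes[h(x)] += w
    return max(votes, key=votes.get)

# Hypothetical small subsets of data, e.g. one per site or per time window.
chunks = [
    [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)],
    [(0.5, 0), (3.0, 0), (7.0, 1), (9.5, 1)],
    [(2.0, 0), (4.0, 0), (6.0, 1), (7.0, 1)],
]
validation = [(1.5, 0), (4.9, 0), (5.5, 1), (8.0, 1)]

hypotheses = [train_stump(chunk) for chunk in chunks]
weights = [accuracy(h, validation) for h in hypotheses]
print(weighted_vote(hypotheses, weights, 4.9))
```

Weighting by held-out accuracy is only one simple choice. Boosting-style schemes derive the weights from the training error on reweighted data, and bagging uses an unweighted vote.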
Anticipated products of this research include new
algorithmic and systems solutions for large-scale automated knowledge
acquisition from large, dynamic, distributed data sources. The resulting
software tools are likely to find use in a variety of applications, including
intrusion detection in computer systems and
data-driven knowledge discovery in
bioinformatics.
This research is closely integrated with the education and training of graduate
and undergraduate students in Computer Science and Bioinformatics at Iowa State University.
Funding
-
Distributed Knowledge Networks, John Deere Foundation, 1999-2000. Vasant Honavar. $30,000.
-
An Agent-Based Environment for Integrating and Analysing Plant Genome Databases. Pioneer Hi-Bred International, Inc. 2000-2001. Vasant Honavar and Drena Dobbs. $40,000.
-
IGERT: Computational Biology Training Program. National Science Foundation (1999-2004). Daniel Voytas, Susan Carpenter, Vasant Honavar, Patrick Schnable, Jonathan Wendel. $2,374,597 (plus $1,161,010 in matching funds).
-
Graduate Research Fellowships, Iowa State University Graduate College.
-
Constructive Neural Network Learning Algorithms for Pattern Classification.
National Science Foundation (1994-1999). Vasant Honavar. $111,537 (plus $10,000 in matching funds).
Publications
-
Balakrishnan, K., Bousquet, O. and Honavar, V. (2000).
Spatial Learning and Localization in Animals: A Computational Model and Its
Implications for Mobile Robots. Adaptive Behavior. In press.
-
Caragea, D., Silvescu, A., and Honavar, V. (2000). Agents That Learn from Distributed Dynamic Data Sources. In: Proceedings of the ECML 2000/Agents 2000 Workshop on Learning Agents. Barcelona, Spain.
-
Caragea, D., Silvescu, A., and Honavar, V. (2000). Towards a Theoretical Framework for Analysis and Synthesis of Distributed and Incremental Learning Agents. In: Proceedings of the Workshop on Distributed and Parallel Knowledge Discovery. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Boston, U.S.A.
-
Polikar, R., Udpa, L., Udpa, S., and Honavar, V. (2000).
Learn++: An Incremental Learning Algorithm for Multilayer Perceptron Networks.
In: Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2000. Istanbul, Turkey.
-
Sharma, T., Silvescu, A., Andorf, C., Caragea, D., and Honavar, V. (2000). Agents that Learn Classification Trees from Distributed, Horizontally and Vertically Fragmented Data Sources. Technical Report 00-10. Department of Computer Science. Iowa State University.
-
Honavar, V. (1999). Distributed Knowledge Networks. Invited Talk.
Artificial Intelligence for Distributed Information Networks
(AiDIN '99) Workshop held during the 1999 National Conference
on Artificial Intelligence (AAAI 99), Orlando, Florida. July 1999.
-
Honavar, V., Miller, L. and Wong, J. (1998).
Distributed Knowledge Networks. In:
Proceedings of the IEEE Information Technology Conference. Syracuse, NY.
Additional Information
To appear.
Dr. Vasant Honavar
Artificial Intelligence Research Laboratory
Department of Computer Science
Iowa State University
Atanasoff Hall, Ames, IA 50011-1040 USA
phone: +1-515-294-1098, +1-515-294-4377; fax: +1-515-294-0258
© Vasant Honavar, 1999.