Projects in Computational Biology and Bioinformatics


Tools for Data-Driven Biological Knowledge Discovery

Biological data (e.g., genome data, protein data, ecological data, botanical data, zoological data) is being accumulated and stored in digital form at astronomical rates. There is a growing need for intelligent data analysis tools for automated knowledge acquisition and discovery from such data sources. Our current research is aimed at the design, implementation, adaptation, and application of a broad range of machine learning tools for data analysis and knowledge discovery in the biological and agricultural sciences. It builds on recent work by Honavar and his students in the Artificial Intelligence Research Laboratory at Iowa State University on a broad range of data mining and knowledge discovery algorithms (including those drawn from artificial intelligence, statistical pattern recognition, neural networks, and evolutionary computing). Of particular interest are incremental algorithms for data-driven theory refinement (that is, extending and refining domain-specific knowledge bases) from heterogeneous data sources. Some of this research is being conducted in collaboration with molecular biologists, biochemists, and plant biologists who are experts in various domains of interest. A product of this research is BioKADLab (Biological Knowledge Analysis and Discovery Lab), a platform-independent software toolkit of machine learning and knowledge discovery algorithms for a variety of applications.

This work builds on ongoing research in the Artificial Intelligence Research Laboratory in the Department of Computer Science on machine learning and knowledge discovery from heterogeneous, distributed knowledge and data sources.


Design of Biological Data Storage, Retrieval, and Visualization Systems

Biological data is being gathered and stored in digital form at astronomical rates. Conventional relational databases were not designed to support efficient storage and retrieval of many types of biological data. For example, the use of gene sequence data requires efficient algorithms for organizing and storing sequences as well as for retrieving items of interest based on specific types of queries (e.g., return all sequences that contain a particular subsequence). Similarly, databases of molecular structures call for efficient algorithms for storing and retrieving descriptions of molecular structures. Precision farming applications need tools for the storage and manipulation of two- and three-dimensional maps that represent spatial and spatio-temporal variations of various quantities of interest (e.g., the moisture level of soil). Our research in this area explores the principles, design, implementation, and applications of information systems for data storage, retrieval, and visualization. Some of this research is being conducted in collaboration with experts in the relevant application domains.

This work builds on Dr. Miller's research on database and information systems and Dr. Honavar's research in information retrieval, pattern matching, and pattern recognition.
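As an illustration of the kind of query mentioned above (return all sequences that contain a particular subsequence), the following is a minimal sketch of a k-mer inverted index in Python. It is not the storage system described in this project; the class name `KmerIndex`, its methods, and the example sequences are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class KmerIndex:
    """Toy inverted index mapping each k-mer to the ids of sequences containing it.

    Hypothetical illustration only, not the system described in the text.
    """

    def __init__(self, k=3):
        self.k = k
        self.sequences = {}                 # sequence id -> sequence string
        self.postings = defaultdict(set)    # k-mer -> set of sequence ids

    def add(self, seq_id, sequence):
        """Store a sequence and index every k-mer it contains."""
        self.sequences[seq_id] = sequence
        for i in range(len(sequence) - self.k + 1):
            self.postings[sequence[i:i + self.k]].add(seq_id)

    def find(self, query):
        """Return the sorted ids of all sequences that contain `query`."""
        if len(query) < self.k:
            # Query is shorter than one k-mer: fall back to scanning everything.
            candidates = set(self.sequences)
        else:
            # A matching sequence must contain every k-mer of the query,
            # so intersect the posting sets to narrow the candidates.
            candidates = set(self.sequences)
            for i in range(len(query) - self.k + 1):
                candidates &= self.postings.get(query[i:i + self.k], set())
                if not candidates:
                    break
        # Sharing all k-mers is necessary but not sufficient, so verify.
        return sorted(s for s in candidates if query in self.sequences[s])


index = KmerIndex(k=3)
index.add("s1", "ACGTACGT")
index.add("s2", "TTGACA")
index.add("s3", "GGACGTT")
print(index.find("ACGT"))  # ['s1', 's3']
```

The index only filters candidates; the final substring check is still needed because a sequence can contain all of a query's k-mers without containing the query itself. Production systems typically rely on more compact structures such as suffix arrays.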


Intelligent Mobile Agents for Biological Information Retrieval from Heterogeneous, Distributed Data and Knowledge Sources

Bioinformatics research is critically dependent on the ability to retrieve and analyze large amounts of relevant data in particular application domains (e.g., precision farming, or construction of evolutionary trees from DNA sequences). This presents several challenges in information retrieval. The relevant data may take different forms (e.g., image data, sequence data) and may be stored in different types of databases on heterogeneous hardware and software platforms. The data is also constantly being updated as new experimental results become available. Thus, there is a need for customizable information retrieval agents that autonomously gather data of interest to particular scientists or bioinformatics projects. Our research in this area is focused on the design of a system of trainable intelligent mobile software agents for information retrieval from data sources of interest in specific bioinformatics, biotechnology, and agriculture-related applications. We are also developing knowledge discovery algorithms for incremental learning from multiple distributed data sources. This research builds on and extends our current design of a platform-independent multi-agent system for customized information retrieval and knowledge discovery from heterogeneous data sources, as well as an object-oriented data warehouse for organizing the retrieved data for further analysis using machine learning.

This work builds on ongoing research in the Artificial Intelligence Research Laboratory in the Department of Computer Science on intelligent mobile agents and multi-agent systems for information retrieval and knowledge discovery from heterogeneous, distributed knowledge and data sources.
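One way to picture incremental learning from multiple distributed data sources is to have each source send summary counts rather than raw records, and to let a central learner fold those counts into its model as they arrive. The sketch below does this for a simple naive Bayes classifier. It is an assumed illustration, not the algorithms developed in this project; the names `local_counts` and `IncrementalNaiveBayes`, the `motif` feature, and the example labels are all hypothetical.

```python
import math
from collections import Counter


def local_counts(records):
    """Run at a data source: summarize (features, label) records as counts."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in records:
        class_counts[label] += 1
        for name, value in features.items():
            feature_counts[(label, name, value)] += 1
    return class_counts, feature_counts


class IncrementalNaiveBayes:
    """Naive Bayes model that is updated by merging counts from each source."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = Counter()
        self.values = {}  # feature name -> set of values seen so far

    def update(self, class_counts, feature_counts):
        """Fold in the counts reported by one source (order does not matter)."""
        self.class_counts.update(class_counts)
        self.feature_counts.update(feature_counts)
        for (_, name, value) in feature_counts:
            self.values.setdefault(name, set()).add(value)

    def predict(self, features):
        """Return the most probable label, using add-one smoothing."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for name, value in features.items():
                count = self.feature_counts[(label, name, value)]
                n_values = len(self.values.get(name, ())) or 1
                score += math.log((count + 1) / (n + n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical records held at two separate sources.
source_a = [({"motif": "yes"}, "binding"), ({"motif": "yes"}, "binding"),
            ({"motif": "no"}, "non-binding")]
source_b = [({"motif": "no"}, "non-binding"), ({"motif": "yes"}, "binding")]

model = IncrementalNaiveBayes()
for source in (source_a, source_b):
    model.update(*local_counts(source))  # only counts leave each source
print(model.predict({"motif": "yes"}))   # binding
```

Because the counts are additive, the merged model is the same as one trained on the pooled records, and a new source or a new batch of results can be incorporated without revisiting earlier data.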
