Projects in Computational Biology and Bioinformatics
Tools for Data-Driven Biological Knowledge Discovery
Biological data (e.g., genome data, protein data, ecological data, botanical
data, zoological data) is being accumulated and stored in digital form at
astronomical rates. There is a growing need for intelligent data analysis
tools for automated knowledge acquisition and discovery from such data
sources. Our current research is aimed at the design, implementation,
adaptation, and application of a broad range of machine learning tools for
data analysis and knowledge discovery in biological and agricultural sciences.
This effort builds on recent work on data mining and knowledge
discovery algorithms (including those drawn from artificial intelligence,
statistical pattern recognition, neural networks, and evolutionary
computing) by Honavar and his students in the Artificial Intelligence Research Laboratory at Iowa State University.
Of particular interest are incremental algorithms for data-driven
theory refinement (that is, extending and refining domain-specific knowledge
bases) from heterogeneous data sources. Some of this
research is being conducted in collaboration with molecular biologists,
biochemists, and plant biologists who are experts in various domains of
interest.
A product of this research is BioKADLab (Biological Knowledge Analysis and Discovery Lab),
a platform-independent software toolkit of machine learning and knowledge discovery
algorithms for a variety of applications including:
- identification of structure-function relationships in proteins;
- screening DNA nucleotide sequences to identify segments of interest
(e.g., segments that contain a ribosome binding site or a promoter);
- identification of protein coding regions of DNA;
- diagnosis of plant diseases;
- analysis of plant genome data (especially soybean and maize (corn) genome data);
- multi-sequence primer design.
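As an illustration of the sequence-screening task listed above, the following is a minimal sketch, not BioKADLab code. It scans a DNA string for windows that match a consensus motif within a small number of mismatches. The function name `find_motif_sites`, the default motif `AGGAGG` (a commonly cited Shine-Dalgarno consensus for bacterial ribosome binding sites), and the mismatch threshold are assumptions chosen for the example; a learned model would normally replace the fixed consensus.

```python
def find_motif_sites(sequence, motif="AGGAGG", max_mismatches=1):
    """Return (position, window) pairs whose window is within
    max_mismatches substitutions of the motif.

    The motif and threshold are illustrative assumptions, not
    parameters taken from the toolkit described in the text.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    hits = []
    # Slide a window of the motif's length along the sequence.
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        # Count positions where the window differs from the motif.
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, window))
    return hits


if __name__ == "__main__":
    for position, window in find_motif_sites("TTAGGAGGTTCCAGGTGGCC"):
        print(position, window)
```

A fixed-consensus scan like this is only a baseline; it makes the input and output of the screening task concrete before any machine learning is applied.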
This work builds on ongoing research in the Artificial Intelligence Research Laboratory in the Department of Computer Science on machine learning and knowledge discovery from heterogeneous, distributed knowledge and data sources.
Researchers:
Design of Biological Data Storage, Retrieval, and Visualization Systems
Biological data is being gathered and stored in digital form at astronomical
rates. Conventional relational databases have not been designed to support
efficient storage and retrieval of many types of biological data. For example,
use of gene sequence data requires efficient algorithms for organizing and
storing sequences as well as retrieving items of interest based on specific
types of queries (e.g., return all sequences that contain a particular
subsequence). Similarly, databases of molecular structures call for efficient
algorithms for storing and retrieving descriptions of molecular structures.
Precision farming applications need tools for storage and manipulation of
two- and three-dimensional maps that represent spatial and spatio-temporal variations
of various quantities of interest (e.g., moisture level of soil). Our research
in this area explores the principles, design, implementation, and applications
of information systems for data storage, retrieval, and visualization.
Some of this research is being conducted in collaboration with experts in the
relevant application domains.
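The subsequence query mentioned above ("return all sequences that contain a particular subsequence") can be sketched with a k-mer inverted index. This is a minimal illustration under stated assumptions, not the system described here: the class name `KmerIndex`, its methods, and the choice of `k` are hypothetical. The index narrows the candidate set, and a final substring check confirms each hit.

```python
from collections import defaultdict


class KmerIndex:
    """Toy in-memory index mapping each k-mer to the sequences containing it.

    Hypothetical sketch; a production system would persist the index
    and handle much larger collections.
    """

    def __init__(self, k=4):
        self.k = k
        self.sequences = {}            # sequence id -> sequence string
        self.index = defaultdict(set)  # k-mer -> set of sequence ids

    def add(self, seq_id, sequence):
        sequence = sequence.upper()
        self.sequences[seq_id] = sequence
        for i in range(len(sequence) - self.k + 1):
            self.index[sequence[i:i + self.k]].add(seq_id)

    def containing(self, query):
        """Return sorted ids of stored sequences that contain the query."""
        query = query.upper()
        if len(query) < self.k:
            # Query is shorter than the indexed k-mers: scan everything.
            candidates = set(self.sequences)
        else:
            candidates = None
            # A match must contain every k-mer of the query.
            for i in range(len(query) - self.k + 1):
                ids = self.index.get(query[i:i + self.k], set())
                candidates = ids if candidates is None else candidates & ids
                if not candidates:
                    return []
        # Sharing all k-mers is necessary but not sufficient, so verify.
        return sorted(s for s in candidates if query in self.sequences[s])


if __name__ == "__main__":
    idx = KmerIndex(k=3)
    idx.add("s1", "ACGTACGT")
    idx.add("s2", "TTTTACGA")
    idx.add("s3", "GGGGCCCC")
    print(idx.containing("TACG"))
```

The design trades memory for query speed: only sequences sharing every k-mer with the query are checked directly, which is the kind of access path a conventional relational table does not provide out of the box.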
Researchers:
This work builds on Dr. Miller's research on database and information systems and Dr. Honavar's research in information retrieval, pattern matching, and pattern recognition.
Intelligent Mobile Agents for Biological Information Retrieval
from Heterogeneous, Distributed Data and Knowledge Sources
Bioinformatics research is critically dependent on the ability to retrieve
and analyze large amounts of relevant data in particular application domains
(e.g., precision farming, or construction of evolutionary trees from DNA
sequences). This presents several challenges in information retrieval. The
relevant data may take different forms (e.g., image data, sequence data),
may be stored in different types of databases, and may reside on heterogeneous hardware
and software platforms. The data is also constantly updated as new experimental results become available. Thus, there is a need for customizable information
retrieval agents that autonomously gather data of interest to particular
scientists or bioinformatics projects. Our research in this area is
focused on the design of a system of trainable intelligent mobile software
agents for information retrieval from data sources of interest in specific
bioinformatics, biotechnology, and agriculture-related applications. We are also developing knowledge discovery algorithms for incremental learning from multiple distributed data sources.
This research builds on and extends our current design of a platform-independent multi-agent system for customized information retrieval and knowledge discovery from heterogeneous
data sources as well as an object-oriented data warehouse for organizing the
retrieved data for further analysis using machine learning.
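One way to picture incremental learning from multiple distributed data sources is a classifier whose training state consists only of counts, since counts gathered at separate sites can be added together without moving the raw data. The sketch below is an assumption made for illustration (a naive Bayes model with add-one smoothing) and is not presented as the algorithms under development; the class `CountModel` and its methods are hypothetical names.

```python
import math
from collections import Counter


class CountModel:
    """Naive Bayes over categorical features, stored purely as counts.

    Illustrative sketch: because the state is additive, a model can be
    updated incrementally or merged with models built at other sites.
    """

    def __init__(self):
        self.class_counts = Counter()    # label -> number of records
        self.feature_counts = Counter()  # (label, index, value) -> count
        self.values = {}                 # index -> set of observed values

    def update(self, records):
        """Fold new (features, label) records into the counts."""
        for features, label in records:
            self.class_counts[label] += 1
            for i, value in enumerate(features):
                self.feature_counts[(label, i, value)] += 1
                self.values.setdefault(i, set()).add(value)

    def merge(self, other):
        """Add the counts of a model built on another data source."""
        self.class_counts.update(other.class_counts)
        self.feature_counts.update(other.feature_counts)
        for i, vals in other.values.items():
            self.values.setdefault(i, set()).update(vals)

    def predict(self, features):
        """Return the label with the highest smoothed log-probability."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, value in enumerate(features):
                numerator = self.feature_counts[(label, i, value)] + 1
                denominator = n + len(self.values.get(i, ()))
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    site_a, site_b = CountModel(), CountModel()
    site_a.update([(("A", "T"), "site"), (("A", "G"), "site")])
    site_b.update([(("C", "G"), "background"), (("C", "C"), "background")])
    site_a.merge(site_b)  # only counts travel between sites
    print(site_a.predict(("A", "T")))
```

In this sketch an agent would ship only the count tables between sites, and the merged model matches the one obtained by training on the pooled data.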
Researchers:
This work builds on ongoing research in the Artificial Intelligence Research Laboratory in the Department of Computer Science on intelligent mobile agents and multi-agent systems for information retrieval and knowledge discovery from heterogeneous, distributed knowledge and data sources.
