Course project proposal for Computer Science 587
Principles of Network and
Distributed Programming
as a part of project INDUS
(Preliminary draft, subject to change)
Jie Bao
2003-10-07
Background of the problem
With the rapid development of bioinformatics projects such as the Human Genome Project, a huge amount of data has been generated. Various bioinformatics databases have been set up and made public on the Internet during the past decade. For example, the PDB database provides protein structure information; Prosite provides protein sequence information and motifs; Swiss-Prot is a curated protein sequence database that strives to provide a high level of annotation; and Gene Ontology (GO) produces a controlled vocabulary for proteins that can be applied to all organisms.
A bioinformatics query usually involves multiple bioinformatics databases. However, those databases are heterogeneous and distributed, which makes information integration a difficult task.
First, the databases use different ontologies, which results in a low level of interoperability. Users typically understand and analyze the data only through their own ontology, so semantic integration is required.
Second, the databases are autonomous and heterogeneous in structure. Some information is hidden behind query interfaces, which makes a transparent combined query impossible. Network communication and iterators are therefore needed.
This project is an attempt to provide ontology-based semantic integration over a chosen set of online bioinformatics databases.
Proposed approach
The project will focus on the following problems, which relate to the content of this course:
1. A set of XML-based iterators that provide common interfaces to the corresponding protein databases.
2. The iterators can communicate directly with the remote databases.
3. A combined query can be decomposed into sub-queries and executed against the source databases.
4. Results from the source databases are composed on the client side.
The proposed system is not aimed at processing large volumes of data or gathering statistics from remote databases.
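The four points above can be illustrated with a minimal Java sketch. It is not the project's actual design: the names SourceIterator, StubSource, and CombinedQuery are hypothetical, the sources are in-memory stand-ins rather than real network wrappers, and the "decomposition" simply sends the same ID lookup to every source instead of parsing an XQuery expression.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical common interface that every database wrapper would implement. */
interface SourceIterator {
    String sourceName();

    /** Returns an iterator over XML fragments answering a lookup for one ID. */
    Iterator<String> query(String id);
}

/**
 * In-memory stand-in for a remote database wrapper (assumption: a real
 * wrapper would talk to the source's query interface over the network).
 */
class StubSource implements SourceIterator {
    private final String name;
    private final Map<String, List<String>> records;

    StubSource(String name, Map<String, List<String>> records) {
        this.name = name;
        this.records = records;
    }

    public String sourceName() {
        return name;
    }

    public Iterator<String> query(String id) {
        // Unknown IDs yield an empty iterator rather than an error.
        return records.getOrDefault(id, Collections.<String>emptyList()).iterator();
    }
}

/** Sends one sub-query per source and composes the answers on the client. */
class CombinedQuery {
    private final List<SourceIterator> sources = new ArrayList<>();

    CombinedQuery add(SourceIterator source) {
        sources.add(source);
        return this;
    }

    /** Builds one XML document; attribute escaping is omitted for brevity. */
    String run(String id) {
        StringBuilder xml = new StringBuilder();
        xml.append("<result id=\"").append(id).append("\">");
        for (SourceIterator source : sources) {
            xml.append("<source name=\"").append(source.sourceName()).append("\">");
            Iterator<String> fragments = source.query(id);
            while (fragments.hasNext()) {
                xml.append(fragments.next());
            }
            xml.append("</source>");
        }
        xml.append("</result>");
        return xml.toString();
    }
}
```

Because every wrapper exposes the same iterator interface, adding another source (for example an Enzyme or GO wrapper) would only require a new SourceIterator implementation; the composition step stays unchanged.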
Implementation plan
What has been done:
Work schedule
| Week | Focus | Status |
|---|---|---|
| Week 1 (Oct 5-11) | XML editing and parsing, with a Swing-based interface | Done |
| Week 2 (Oct 12-18) | PDB interface | In progress |
| Week 3 (Oct 19-25) | Swiss-Prot interface | |
| Week 4 (Oct 26-Nov 1) | Enzyme interface | |
| Week 5 (Nov 2-8) | GO interface | |
| Week 6 (Nov 9-15) | Query decomposition based on XQuery (not a complete implementation) | |
| Week 7 (Nov 16-22) | DAML+OIL description | |
| Week 8 (Nov 23-29) | Query | |
| Week 9 (Nov 30-Dec 6) | GUI integration and testing | |
| Week 10 (Dec 7-10) | Documentation | |
Reference: see "Developing resources to INDUS project"