Towards a Computational Understanding of Protein Function
Date/Time: September 3, 3:40 pm
Location: B29 Atanasoff Hall
Proteins facilitate almost all of life's functions. A comprehensive knowledge of the function of any organism's proteins has profound implications for our understanding of life and, especially in humans, of wellness and disease. With the introduction of fast and cheap DNA sequencing, we have amassed a large amount of data about genes and the proteins they encode. At the same time, for most of these proteins we do not know what they do. While biological experiments are the only reliable way to reveal a protein's function, many of these experiments do not scale to the hundreds of thousands of proteins that have been sequenced. The computational annotation of protein function has therefore emerged as a problem at the forefront of computational and molecular biology.
Many methods have been developed by research groups worldwide. Some are based on comparing unsolved sequences with databases of proteins whose functions are known. Others mine the scientific literature associated with some of these proteins, and yet others combine sophisticated machine-learning algorithms with an understanding of biological processes to decipher what these proteins do. However, there is a need to assess how well these methods perform. The Critical Assessment of Function Annotations (CAFA) is a new community-wide experiment that assesses the many methods developed by research groups worldwide to channel the flood of data from genome research into deducing the function of proteins.
In my talk, I will discuss the two CAFA experiments conducted in 2011 and 2014. I will show how a time-based computational challenge is helping improve protein function prediction methods. I will pay special attention to the generation of benchmark data and to the choice of assessment metrics.