Developing Informatics Tools for Genome Analysis
Date/Time: March 31st, 3:40pm
Location: Lee Liu Hall, Howe Hall
Coffee & Hors d'oeuvres at 3:10pm in the Atrium
The genomics community has a persistent demand for efficient algorithms and tools for genome data analysis. The sheer quantity of data to be processed and analyzed poses extremely difficult challenges in many large and complex genome projects. This seminar first addresses sequence alignment using a hash index method; the performance of the algorithm is discussed and compared with other methods with respect to search speed, memory usage, and sensitivity. A brief description follows of structural variation detection using a pattern growth method, which precisely pinpoints the breakpoints of insertions and deletions from paired-end reads. The focus of the talk is genome assembly: it introduces a key element of the algorithm, which clusters billions of NGS reads into size-controllable groups based on sequence similarity and thereby permits a simple approach to parallelizing the assembly process. Finally, we discuss the application of third-generation sequencing data from Oxford Nanopore to genome scaffolding and the detection of structural rearrangements.
Dr Zemin Ning is a Senior Scientific Manager and heads the "Sequence Assembly and Data Analysis" group at the Wellcome Trust Sanger Institute, UK. Trained in Engineering/Physics, he has been active in genome informatics, specializing in sequence alignment and genome assembly. Over the past years, he and his colleagues in the group have developed a number of bioinformatics tools, including SSAHA/SSAHA2, Phusion/Phusion2, Smalt, and Pindel. The group has also produced more than 30 de novo assemblies of large animal and plant genomes, including Gorilla, Zebrafish, Tasmanian Devil, Bamboo, and Miscanthus.
*This seminar is made possible as an initial part of the BCB hiring initiative, in which the Department of Computer Science at Iowa State University seeks outstanding applicants for a Full Professor position at the intersection of big data analytics in bioinformatics and computational biology.*