Lin Yan: Advancing Tropical Cyclone Tracking and Creating Award-Winning Research

Dr. Lin Yan

Dr. Lin Yan, an Assistant Professor in the Department of Computer Science, works in the field of data visualization – creating tools that help with the graphical or visual representation of information and data. Specifically, her research focuses on topological data analysis, visualization, and computational topology.

“Data visualization provides intuitive and practical tools for scientific discovery with the clarity and aesthetic appeal of the information displayed. It allows a person to understand a large amount of data and interact with it,” Yan explained. “Such research is really interesting and exciting.”

Evolution of Research

Dr. Yan’s research began with a focus on the more theoretical side of data analysis. She defined a structural average of trees and designed an interactive visualization system that resembled a numerical calculator. The calculator took a set of trees and produced a 1-center tree as their structural average.
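
To illustrate only the notion of a 1-center: a 1-center of a set is a point that minimizes the largest distance to any member of the set. The minimal sketch below assumes each tree has already been encoded as a fixed-length numeric vector and that two encodings are compared by their largest coordinate-wise gap. The encoding and the function names are hypothetical and are not taken from Dr. Yan's system.

```python
from typing import List, Sequence

Vector = Sequence[float]


def linf_distance(a: Vector, b: Vector) -> float:
    """Largest coordinate-wise gap between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def one_center(vectors: Sequence[Vector]) -> List[float]:
    """Return a point minimizing the maximum linf_distance to the inputs.

    Under this distance, the midpoint of the per-coordinate minimum and
    maximum is one such point (the center of the bounding box).
    """
    if not vectors:
        raise ValueError("need at least one vector")
    if any(len(v) != len(vectors[0]) for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [(min(col) + max(col)) / 2 for col in zip(*vectors)]


def radius(center: Vector, vectors: Sequence[Vector]) -> float:
    """Largest distance from the center to any input vector."""
    return max(linf_distance(center, v) for v in vectors)


# Hypothetical encodings of three trees (assumption: two numbers per tree).
encoded_trees = [[0.0, 4.0], [2.0, 0.0], [6.0, 2.0]]
center = one_center(encoded_trees)
print(center, radius(center, encoded_trees))
```

Unlike a mean, which minimizes the sum of squared distances, a 1-center is driven by the most extreme inputs, so it bounds how far the summary can be from any single tree.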

However, in recent years, her research has focused on solving domain-specific problems with data visualization techniques. Her research combines topological, geometric, statistical, data mining, and machine learning techniques with visualization to study large and complex data for information exploration and scientific discovery. Through this research, she has developed several topology-based visualization frameworks for feature understanding, data reduction techniques for scientific simulation and data analysis, and methodologies for statistical analysis of features that support uncertainty visualization. Below are two examples of topology-based visualization from Dr. Yan's research.

Example of topology-based visualization. Topology-based feature tracking
The left image shows wind vector fields identified as a cyclone; the right image is not a cyclone.

When asked how she chose her current research area, she explained that her past internships and postdoctoral experience played a significant role. During her work at Argonne National Laboratory, her advisor, a climate scientist, had access to data of enormous size, such as the E3SM2 dataset.

The E3SM2 dataset is a 20-member ensemble simulating hourly global climate data from 1850 to 2100. That’s a lot of data. Given its size, Yan saw how difficult it is for researchers to gain scientific insights directly or to understand the sensitivities and uncertainties of such simulations. To this end, she worked on visualization techniques for large and complex scientific data that provide fundamental tools for visual data analysis through abstraction and summarization.
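
To give a rough sense of that scale, the back-of-the-envelope sketch below counts hourly time steps. It assumes the whole period is stored hourly, that both end years are included, and that the model uses a 365-day calendar. None of these details are stated above, and the function name is illustrative.

```python
def hourly_steps(first_year: int, last_year: int, members: int,
                 days_per_year: int = 365) -> tuple[int, int]:
    """Count hourly time steps per ensemble member and across the ensemble.

    Assumptions (not confirmed by the article): inclusive end years and a
    fixed-length calendar year.
    """
    years = last_year - first_year + 1
    per_member = years * days_per_year * 24
    return per_member, per_member * members


per_member, total = hourly_steps(1850, 2100, members=20)
print(f"{per_member:,} hourly steps per member, {total:,} across the ensemble")
```

Under these assumptions, that is about 2.2 million time steps per ensemble member and about 44 million across all 20 members, before counting the variables and grid points stored at each step.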

Research Challenges

However, no research is without its challenges.

“With the increased availability of computing resources and sensing devices, data’s ever-increasing size and complexity pose fundamental challenges to existing visualization techniques, especially in topology-based ones,” Yan explained. 

Going into further detail, she listed three fundamental challenges: understanding the data, data growth that outpaces the development of data transmission and storage systems, and a lack of visualization tools and methodologies for understanding the uncertainties of scientific simulations.

To address these challenges, Yan aims to advance feature understanding by combining topological data analysis with data visualization. This will support key elements of a scientific workflow, including feature extraction, feature tracking, and the detection of transitions, clusters, and periodicity. She will also develop advanced data reduction techniques and software that preserve topological features in data for in situ and post hoc analysis and visualization at extreme scales. Furthermore, she envisions new methodologies that will quantify, visualize, and mitigate data uncertainties by combining topological, geometric, statistical, and visualization techniques.

Impact of Research

For her work, Dr. Yan recently won the 2024 VGTC Visualization Dissertation Award, a prestigious recognition that demonstrates the impact of her work in the field. Her dissertation addresses challenges in big data analysis by enriching the methodologies and tools of topology-based visualization for scientific data exploration. These methodologies and tools have applications in structural biology, climate science, combustion studies, and neuroscience.

Through her research, Yan wanted to identify and track features that can represent real-world phenomena and to find interesting properties of those features, such as symmetry, transitions, clusters, and periodicity. As a result, she has designed a variety of tools.

She designed an uncertainty visualization tool that biologists can use in neuron morphology analysis to study variations among different reconstructions of the same neuron. Another uncertainty visualization framework that she developed captures boundary and interior uncertainties in the appearance of atmospheric rivers.

Tropical cyclones detected in 2004, shown as an example, by observations (left), a well-validated tracking algorithm called TempestExtremes (middle), and TROPHY (right). Tracks are colored by maximum wind speed.

Additionally, Dr. Yan led the design and development of TROPHY, a new tropical cyclone tracking framework that can be used to understand and improve future projections of tropical cyclones. TROPHY improves the visual interpretability of vector fields in terms of feature tracking, selection, and comparison for large-scale simulations. The framework she and her team created produces results as good as or better than the existing tropical cyclone tracking algorithm while requiring far less input data.