Benjamin Steenhoek on Using Deep Learning to Improve Software Development

Graduate Student Spotlight: Benjamin Steenhoek

With a background in software development, it is natural that Benjamin Steenhoek would be interested in helping software developers write better code. Through his professional experience before beginning his graduate studies here at Iowa State, he came to see software not as a set of instructions for computers to follow but as a form of human-to-human communication. As software becomes more integral to the job market and economy, businesses are embracing technology in new and exciting ways, such as automation, data analysis, and artificial intelligence. But are the tools used to write that code the best they can be?

Steenhoek originally focused his research on traditional program analysis. There, he encountered the limitations of static and dynamic analysis and soon became interested in applying deep learning methods to source code. While reproducing prior work in the field, Steenhoek asked more and more questions about model evaluations. He often encountered new models that outperformed their baselines, yet found that the reasons for the improvement, and the specific scenarios in which it occurred, were poorly understood. This gap, he found, was a serious obstacle to advancing model robustness, debugging, and deployment for vulnerability detection.

As a first step toward understanding this field, Steenhoek and his fellow Iowa State students Md Mahbubur Rahman and Richard Jiles conducted an empirical study under the guidance of his advisor, Dr. Wei Le, an Associate Professor of Computer Science at Iowa State University. In their research, published at the International Conference on Software Engineering 2023, the team studied recently published deep learning models to understand how training data size and composition relate to model performance, which features the models use to make predictions, and which types of programs are easy or difficult for the models. The team has already used the study's results to motivate new model design decisions and hopes its findings will help researchers interpret model results, prepare training data, and improve the robustness of the models.

Steenhoek hopes that his research will improve quality assurance, developer experience, and software security. Better tools mean better code. In the end, users will be less vulnerable to security threats, and we can all be better protected as we continue to use and innovate with technology.