M.S. Final Oral Exam: Benjamin Yen Kit Lee

Tuesday, November 21, 2023 - 2:00pm
ISU Library 0101g

Automated Neuron Explanation for Code-trained Language Models

As language models advance rapidly, our understanding of their internal workings remains limited, making it difficult to detect bias or deception in their outputs. Interpretability research aims to address this by uncovering how these models compute. Manually inspecting individual components of large neural networks is impractical, so this thesis proposes an automated approach that uses GPT-4 to generate and assess natural language explanations of neurons in code-trained language models, improving our understanding of how such models work.
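As a rough illustration of the generate-and-assess idea, the sketch below shows one common automated-interpretability loop: an LLM is shown (token, activation) pairs for a neuron and asked for a text explanation, a simulator then predicts activations from that explanation alone, and the explanation is scored by how well the predicted activations correlate with the real ones. All names here (`ask_llm`, `simulate`) are hypothetical stand-ins, not the thesis's actual pipeline, and the demo uses stubbed LLM calls rather than a real GPT-4 API.

```python
# Sketch of an explain-then-score loop for automated neuron explanation.
# Assumptions: `ask_llm` and `simulate` are hypothetical stand-ins for
# LLM API calls; the activations below are toy values.

def pearson(xs, ys):
    # Pearson correlation between two equal-length lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def explain_neuron(token_activations, ask_llm):
    # Step 1: show the explainer LLM (token, activation) pairs and
    # ask for a short natural language explanation of the neuron.
    prompt = "Explain what this neuron fires on:\n" + "\n".join(
        f"{tok}\t{act:.2f}" for tok, act in token_activations
    )
    return ask_llm(prompt)

def score_explanation(explanation, token_activations, simulate):
    # Step 2: a simulator predicts activations from the explanation
    # alone; the score is the correlation with the real activations.
    tokens = [tok for tok, _ in token_activations]
    predicted = simulate(explanation, tokens)
    actual = [act for _, act in token_activations]
    return pearson(predicted, actual)

# Toy demo with stubbed models (no API access assumed).
acts = [("def", 0.9), ("return", 0.8), ("cat", 0.1), ("lambda", 0.85)]
fake_llm = lambda prompt: "fires on Python keywords"
fake_sim = lambda expl, toks: [
    0.9 if t in {"def", "return", "lambda"} else 0.0 for t in toks
]
expl = explain_neuron(acts, fake_llm)
score = score_explanation(expl, acts, fake_sim)
print(expl, round(score, 3))
```

A high correlation score suggests the explanation captures what the neuron responds to; low scores flag explanations that sound plausible but fail to predict the neuron's behavior.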