Ph.D. Final Oral Exam: Ge Luo
Speaker: Ge Luo
Enhancing Automatic Summarization Evaluation: Metric Design, Consistency, and Explanation
Automatic summarization plays a pivotal role in condensing vast amounts of information into concise representations, thereby facilitating efficient information retrieval and comprehension. Nonetheless, evaluating the effectiveness of automatic summarization systems remains a significant challenge. This work presents advancements in automatic summarization evaluation across three main areas.
In the first part, we introduce a comprehensive framework for summarization metric design that accounts for linguistic and structural aspects of summaries. We propose evaluating the overall quality of a summary with a metric trained on synthesized summaries: by learning a preference ranking between gold summaries and inferior summaries synthesized by corrupting them, our method eliminates the need for a reference summary at inference time.
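The synthesize-and-rank idea above can be sketched minimally: one hypothetical corruption (dropping a sentence) produces an inferior summary, and a pairwise margin loss pushes the metric to score the gold summary higher. Both the corruption choice and the function names here are illustrative assumptions, not the exact operations used in the work.

```python
import random

def corrupt(summary_sentences, seed=0):
    # Synthesize an inferior summary by deleting one sentence at random.
    # (One illustrative corruption; entity swaps or negations are alternatives.)
    rng = random.Random(seed)
    keep = list(summary_sentences)
    keep.pop(rng.randrange(len(keep)))
    return keep

def margin_ranking_loss(score_gold, score_corrupted, margin=1.0):
    # Hinge-style pairwise loss: zero once the metric ranks the gold summary
    # above the corrupted one by at least `margin`, positive otherwise.
    return max(0.0, margin - (score_gold - score_corrupted))
```

Training on such pairs needs only gold summaries and their corruptions, which is what lets the learned metric score a new summary without any reference at inference time.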
In the second part, we delve into a specific aspect of summarization evaluation: assessing the factual consistency of machine-generated summaries. We first systematically analyze the shortcomings of current methods for synthesizing inconsistent summaries. Then, using parameter-efficient fine-tuning (PEFT), we show that a competitive factual consistency detector can be trained on thousands of real model-generated summaries with human annotations.
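The abstract does not name the specific PEFT technique; one common choice is LoRA, which freezes the base model's weights and trains only a small low-rank update. A minimal sketch of the adapted forward pass, written in plain Python for clarity (real implementations use tensor libraries):

```python
def matmul(X, Y):
    # Naive matrix multiply over nested lists, for illustration only.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(x, W, A, B, alpha=1.0):
    # Frozen base weight W (d_in x d_out); trainable low-rank factors
    # B (d_in x r) and A (r x d_out). Only A and B are updated during PEFT,
    # so the number of trained parameters is tiny relative to W.
    delta = matmul(B, A)  # rank-r update to W
    W_adapted = [[w + alpha * d for w, d in zip(w_row, d_row)]
                 for w_row, d_row in zip(W, delta)]
    return matmul(x, W_adapted)
```

Because only the low-rank factors are trained, a few thousand annotated summary-article pairs can be enough to adapt a large pretrained model, which is the economy the paragraph above points to.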
In the third part, we extend our investigation to incorporate explanations into the consistency metric, providing insight into the reasoning behind the evaluation outcomes. We collect human-written natural language explanations for inconsistent summary-article pairs. With this data, we train a text-generation model to jointly output a consistency judgment and its explanation. We demonstrate that the trained metric improves the interpretability of the consistency assessment, helping users understand the evaluation results.
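One simple way to train a generator to emit a judgment and its explanation together is to serialize both into a single target string and parse them back out at evaluation time. The field names below are hypothetical; the source does not specify the serialization format.

```python
def format_target(label, explanation):
    # Serialize a consistency label and its explanation into one
    # training target for a text-generation model.
    return f"judgment: {label}\nexplanation: {explanation}"

def parse_output(text):
    # Recover (label, explanation) from the model's generated text.
    head, _, tail = text.partition("\n")
    label = head.split(":", 1)[1].strip()
    explanation = tail.split(":", 1)[1].strip() if ":" in tail else ""
    return label, explanation
```

Keeping the judgment and explanation in one generated sequence lets a single model produce both, so every consistency verdict arrives with a human-readable rationale attached.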