Shenyang Liu: MS Final Oral

Shenyang Liu
Tuesday, November 7, 2017 - 10:00am
216 Atanasoff Hall

Major Professor:  Tian, Jin
Committee Member 1:  Tian, Jin
Committee Member 2:  Jia, Yan-Bin
Committee Member 3:  Tavanapong, Wallapak

Status:  MS Final Oral

Title: Clickbait Detection using Text Summarization Techniques
Abstract: The information era brings more information into our daily lives,
but some of it has very little value. In this context, low value means that
readers do not get what they need from an article. Today, many online editors
try to attract clicks on their links because more clicks mean more
advertising revenue. To catch the reader's eye, they use flashy, horrifying,
or preposterous titles, which are known as clickbait. Clickbait degrades the
online experience because readers waste time on content they did not want to
read. Our research aims to detect clickbait so that this time can be spent on
something more useful. At the
same time, a title may look like “clickbait”, but as long as the content of
the article is highly relevant to its title, the editor cannot be blamed,
because it is the reader who decides how to spend their time. In our
research, a title is considered clickbait only if it reflects secondary
content of the article or is irrelevant to the article. Checking the whole
article for clickbait may take a long time, so the article is summarized
first. The TextRank
algorithm is used to summarize the article because it is an unsupervised
method that does not need to be trained on a corpus. TextRank is a variant of
Google’s PageRank algorithm that builds a graph model with the sentences or
words of an article as nodes and the similarity between sentences or words as
weighted edges. After the weights of all nodes are initialized, PageRank is
run on this graph until all the weights converge; the nodes are then sorted
by weight and only the nodes with the highest weights are kept. Our
summary is composed of these nodes. We then use a sentence similarity measure
based on a semantic net and corpus statistics to compare each sentence in the
summary with the title. If the number of summary sentences whose similarity
to the title exceeds a threshold s is greater than a percentage p of the
total number of sentences in the summary, the article is considered
appropriate; otherwise, it is considered clickbait. We evaluate this method
empirically and show that it can detect clickbait to some extent. We created
our own corpus for testing because no such corpus exists.
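
As a rough illustration of the pipeline described in the abstract, the
following Python sketch ranks sentences with a weighted PageRank iteration
and then applies the s / p decision rule to a title. It is a minimal sketch
under stated assumptions, not the implementation evaluated in the thesis. All
function names are hypothetical; the edge weights use a word-overlap
similarity in the style of the TextRank paper; Jaccard word overlap stands in
for the semantic-net and corpus-statistics similarity measure; and the values
of s, p, the damping factor, and the summary length are illustrative only.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def overlap_similarity(a, b):
    """Word overlap normalized by log lengths (assumed edge weight)."""
    wa, wb = tokenize(a), tokenize(b)
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # guards against a zero denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank_summary(sentences, top_k=3, damping=0.85, tol=1e-6, max_iter=100):
    """Rank sentences with weighted PageRank; return the top_k in text order."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weights: pairwise sentence similarity, no self-loops.
    w = [[0.0 if i == j else overlap_similarity(sentences[i], sentences[j])
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n  # initial node weights
    for _ in range(max_iter):
        new = []
        for i in range(n):
            rank = sum(w[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new.append((1 - damping) + damping * rank)
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]


def jaccard(a, b):
    """Stand-in similarity; the thesis uses a semantic-net based measure."""
    wa, wb = tokenize(a), tokenize(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def is_clickbait(title, summary, s=0.2, p=0.5, similarity=jaccard):
    """Appropriate if more than a fraction p of the summary sentences have
    similarity to the title greater than s; otherwise clickbait."""
    if not summary:
        return True  # assumption: an empty summary cannot support the title
    matches = sum(1 for sent in summary if similarity(title, sent) > s)
    return not (matches > p * len(summary))


article = [
    "The city council approved a new budget for road repairs on Monday.",
    "The budget sets aside two million dollars for road repairs downtown.",
    "Council members said the road repairs will begin in the spring.",
    "A local bakery also won an award for its sourdough bread last week.",
]
summary = textrank_summary(article, top_k=2)
print(is_clickbait("City council approves budget for road repairs", summary))
print(is_clickbait("You will not believe this sourdough bread secret", summary))
```

In this toy example the first title shares most of its words with the
summary and is treated as appropriate, while the second title refers only to
a secondary detail that does not reach the summary and is flagged as
clickbait. A purely lexical measure such as Jaccard overlap misses synonyms
and paraphrases, which is one reason a semantic similarity measure is the
more natural choice for the comparison step.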