
Combining Text Classification and Fact Checking to Detect Fake News

AHMED, SAJJAD
2022-08-26

Abstract

Due to the widespread circulation of fake news in social and news media, its detection is an emerging research topic attracting attention worldwide. In news and social media, information spreads at high speed but without guarantees of accuracy, so detection mechanisms must be able to assess news quickly enough to combat its spread. Fake news has the potential to harm individuals and society, which makes its detection both important and technically challenging. The first challenge addressed here is to use text classification to combat fake news: determining appropriate text classification methods and evaluating how well they distinguish fake from non-fake news. Machine learning helps build artificial intelligence systems from tacit knowledge because it can solve complex problems based on real-world data. For this reason, I propose that integrating text classification with fact checking of check-worthy statements can help detect fake news. I used text processing and three classifiers, Passive Aggressive, Naïve Bayes, and Support Vector Machine, to classify the news data. Text classification mainly focuses on extracting various features from texts and incorporating these features into the classification. A major obstacle in this area is the lack of an efficient method to distinguish fake from non-fake news, owing to the scarcity of suitable corpora. I applied the three machine learning classifiers to two publicly available datasets, and experimental analysis shows encouraging and improved performance. Classification alone, however, is not accurate enough to detect fake news, because the classification methods are not specialized for it; I therefore added a system that checks the news in depth, sentence by sentence. Fact checking is a multi-step process that begins with the extraction of check-worthy statements.
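The classification step can be illustrated with a minimal sketch. The thesis applies Passive Aggressive, Naïve Bayes, and Support Vector Machine classifiers to real news datasets; the toy bag-of-words Naïve Bayes below, trained on hypothetical example headlines, only illustrates the underlying idea of learning per-class word statistics and is not the thesis's actual setup.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    # docs: list of token lists; labels: parallel list of class labels.
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    vocab = set()
    for tokens, c in zip(docs, labels):
        word_counts[c].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    # Multinomial Naive Bayes with add-one smoothing in log space.
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c, n in class_counts.items():
        lp = math.log(n / total)
        denom = sum(word_counts[c].values()) + len(vocab)
        for t in tokens:
            lp += math.log((word_counts[c][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Hypothetical toy corpus of headlines (illustrative only).
docs = [
    "shocking miracle cure doctors hate".split(),
    "celebrity secretly an alien insiders say".split(),
    "parliament passes budget after long debate".split(),
    "central bank holds interest rates steady".split(),
]
labels = ["fake", "fake", "non-fake", "non-fake"]
model = train_nb(docs, labels)
print(predict_nb(model, "miracle cure shocking".split()))  # prints "fake"
```

In practice the thesis works with TF-IDF-style feature vectors rather than raw counts, but the principle of scoring a document against each class and choosing the most likely label is the same.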
Identification of check-worthy statements is a subtask of the fact checking process whose automation would reduce the time and effort required to fact check a statement. In this thesis I propose an approach that classifies statements as check-worthy or not check-worthy while also taking into account the context around a statement. This work shows that including context makes a significant contribution to classification, while at the same time using general features to capture information from sentences. The aim of this challenge is to propose an approach that automatically identifies check-worthy statements for fact checking, including the context around a statement. The results are analyzed by examining which features contribute most to classification, but also how well the approach performs overall. For this work, a dataset was created by consulting different fact checking organizations; it contains debates and speeches in the domain of politics, and the capability of the approach is evaluated in this domain. The approach starts by extracting sentence and context features from the sentences and then classifies the sentences based on these features. The feature set and context features were selected after several experiments, based on how well they differentiate check-worthy statements. Fact checking has received increasing attention since the 2016 United States presidential election, and many efforts have been made to develop a viable automated fact checking system. I introduce a web-based approach for fact checking that compares the full news text and headline with known facts such as names, locations, and places. The challenge is to develop an automated application that takes claims directly from mainstream news media websites and fact checks the news after applying the classification and fact checking components.
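The idea of combining a sentence's own features with features of its neighbours can be sketched as follows. The cues and feature names here (numbers, comparison words) are illustrative assumptions, not the thesis's actual feature set; the point is only how context features are attached alongside sentence features.

```python
import re

def sentence_features(sent):
    # Per-sentence cues often associated with check-worthiness
    # (illustrative choices, not the thesis's feature set).
    words = sent.lower().replace(".", "").split()
    return {
        "has_number": bool(re.search(r"\d", sent)),
        "has_comparison": any(w in words for w in ("more", "less", "than", "most")),
        "length": len(words),
    }

def contextual_features(sentences, i, window=1):
    # Combine sentence i's features with its neighbours' features,
    # prefixing neighbour keys so a classifier can tell them apart.
    feats = dict(sentence_features(sentences[i]))
    for off in range(-window, window + 1):
        if off == 0 or not (0 <= i + off < len(sentences)):
            continue
        for k, v in sentence_features(sentences[i + off]).items():
            feats[f"ctx{off}_{k}"] = v
    return feats

# Hypothetical debate fragment (illustrative only).
debate = [
    "Thank you for having me tonight.",
    "Unemployment fell to 3.5 percent last year.",
    "That is lower than at any point in fifty years.",
]
feats = contextual_features(debate, 1)
print(feats["has_number"], feats["ctx1_has_comparison"])  # prints "True True"
```

A feature dictionary of this shape can then be fed to any standard classifier; the thesis selects its actual feature set experimentally by how well the features separate check-worthy from non-check-worthy statements.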
For fact checking, a dataset was constructed containing 2146 news articles labelled fake, non-fake, or unverified. I include forty mainstream news media sources for comparing results, as well as Wikipedia for double verification. This work shows that combining text classification and fact checking makes a considerable contribution to the detection of fake news, while also using general features to capture information from sentences.
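The cross-source verification step can be sketched in miniature. The entity extraction and the labelling policy below are hypothetical simplifications: real entity recognition is far more robust, and the thesis's actual decision rule over its forty sources and Wikipedia is not reproduced here.

```python
def extract_mentions(text):
    # Crude stand-in for entity extraction: capitalised words.
    # (The thesis compares names, locations, and places; this is illustrative.)
    return {w.strip(".,!?") for w in text.split() if w[:1].isupper()}

def verify(claim, source_texts, threshold=2):
    # Count how many trusted sources mention all of the claim's entities,
    # then map the count to the three labels used in the thesis dataset.
    # The thresholding policy here is a hypothetical illustration.
    mentions = extract_mentions(claim)
    agreeing = sum(1 for s in source_texts if mentions <= extract_mentions(s))
    if agreeing >= threshold:
        return "non-fake"
    if agreeing == 0:
        return "fake"
    return "unverified"

# Hypothetical trusted-source snippets (illustrative only).
sources = [
    "Mayor Smith attended a ceremony in Rome on Friday.",
    "Rome ceremony led by Mayor Smith draws large crowds.",
]
print(verify("Mayor Smith opened the bridge in Rome", sources))  # prints "non-fake"
```

The design choice mirrors the thesis's double-verification idea: a claim is only accepted when independent sources agree, and claims corroborated by no source at all are flagged rather than silently passed through.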
Science and Technology
Settore INF/01 - Informatica
Settore INFO-01/A - Informatica
URN:NBN:IT:UNICAM-157229
HINKELMANN, KARL KNUT
File attached to this record:

26_08_22 Ahmed Sajjad-Thesis-Final_revised (1).pdf
Open access
Description: Doctoral thesis of SAJJAD AHMED
Type: Other attached material
Licence: DRM not defined
Size: 5.31 MB, Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11581/482819