Fake News Detection System is a fully functional data mining project that emphasizes the basic stages and techniques followed in a data mining project:
- Data preprocessing.
- Data cleaning.
- Data transformation.
- Application of data mining techniques and machine learning algorithms.
- Drawing conclusions and extracting statistical data.
- Tuning, adjustments and improvements to increase the project's performance.
This project was developed to classify data extracted from news and articles and thereby detect fake news.
The project is separated into four (4) main parts:
- Data preprocessing/cleaning/transformation.
- Studying the data and extracting statistics, in order to fully understand the provided datasets.
- Implementation and evaluation of the system. In this section, various classification methods (Logistic Regression, Naive Bayes, SVM and Random Forests, applied to document-words and document-vectors representations) are tested and compared to each other, in order to find the algorithm most suitable for the project. Every method's performance is also evaluated (accuracy, F1 score).
- Hyperparameter tuning and improvement of the overall best-performing classification method.
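The comparison step described in the list above could look roughly like the following. This is a minimal sketch, not the project's actual code: it assumes scikit-learn is installed, uses TF-IDF as one possible document-words representation (the document-vectors variant is not covered), and runs on a handful of made-up sentences. The function name `compare_classifiers` and the label convention (1 = fake, 0 = true) are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_classifiers(texts, labels, seed=0):
    """Fit four classifiers on TF-IDF features and report accuracy and F1."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": MultinomialNB(),
        "SVM": LinearSVC(),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, clf in models.items():
        # Each pipeline learns its vocabulary from the training split only.
        pipe = make_pipeline(TfidfVectorizer(), clf)
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "f1": f1_score(y_test, pred, zero_division=0),
        }
    return results


# Made-up placeholder sentences; assumed convention: 1 = fake, 0 = true.
toy_texts = [
    "council approves budget for road repairs",
    "central bank keeps interest rates unchanged",
    "university publishes annual enrollment figures",
    "city opens new public library branch",
    "miracle pill cures every disease overnight",
    "secret moon base hidden from the public",
    "celebrity replaced by clone insiders claim",
    "drinking seawater makes you immortal",
]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

results = compare_classifiers(toy_texts, toy_labels)
for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.2f}, f1={scores['f1']:.2f}")
```

With so few samples the scores carry no meaning; the point is only the structure of the comparison. On the real datasets, the best-scoring pipeline would be the natural candidate for the hyperparameter tuning step.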
The project's data consist of two (2) .csv files (True.csv, Fake.csv) that can be found and extracted under the Fake-News-Detection/data/ directory in the project's repository.
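As an illustration of how the two files might be combined into one labeled dataset, here is a small sketch using only the Python standard library. The function name `load_labeled`, the added `label` field and the convention 1 = fake, 0 = true are assumptions made for this example; the sketch makes no assumption about which columns the files contain.

```python
import csv
from pathlib import Path


def load_labeled(data_dir):
    """Read True.csv and Fake.csv from data_dir and tag each row with a label."""
    rows = []
    # Assumed convention for this sketch: 0 = true news, 1 = fake news.
    for filename, label in (("True.csv", 0), ("Fake.csv", 1)):
        path = Path(data_dir) / filename
        with open(path, newline="", encoding="utf-8") as f:
            # DictReader keeps whatever column names the file's header defines.
            for row in csv.DictReader(f):
                row["label"] = label
                rows.append(row)
    return rows
```

Calling `load_labeled("Fake-News-Detection/data/")` from the directory that contains the repository would then return a single list of rows, each carrying its original columns plus the added label.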