
This repository is the coding part of the Master's thesis "Financial Sentiment Analysis".


JINGEWU/Financial-Sentiment-Analysis

Financial Sentiment Analysis

Introduction

This work builds a text-based sentiment analysis model using recent Natural Language Processing (NLP) tools, and predicts stock returns by feeding sentiment indices from different information sources into sequential neural networks.

We experiment with 12 different models:

  • Keras CNN

  • Keras GRU

  • Keras LSTM

  • Keras BiGRU

  • Keras BiLSTM

  • Word2Vec CNN

  • Word2Vec GRU

  • Word2Vec LSTM

  • Word2Vec BiGRU

  • Word2Vec BiLSTM

  • BERT

  • CH_BERT
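
The list above reads as a grid of two embedding choices crossed with five network architectures, plus the two BERT variants. The sketch below only enumerates those names, which can be handy when looping over experiments. It assumes that "Keras" refers to an embedding layer trained from scratch in Keras and "Word2Vec" to pretrained Word2Vec vectors; this README does not state that explicitly, and `model_names` is a hypothetical helper, not something from the notebooks.

```python
from itertools import product

# Assumption: "Keras" = trainable Keras embedding, "Word2Vec" = pretrained
# Word2Vec vectors. The README only gives the combined names.
EMBEDDINGS = ["Keras", "Word2Vec"]
ARCHITECTURES = ["CNN", "GRU", "LSTM", "BiGRU", "BiLSTM"]
TRANSFORMERS = ["BERT", "CH_BERT"]


def model_names():
    """Return the 12 model names in the order they are listed above."""
    grid = [f"{emb} {arch}" for emb, arch in product(EMBEDDINGS, ARCHITECTURES)]
    return grid + TRANSFORMERS


if __name__ == "__main__":
    for name in model_names():
        print(name)
```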

Explanation of Files

There are three folders in this repository

  1. Code

    This folder contains the code for data preprocessing and model training.

    The data used for model training is already preprocessed, so there is no need to run the code in preprocess_data. The following notebooks contain model training and evaluation:

    • CH_BERT.ipynb
    • CH_SVC_NB.ipynb
    • EN_sentiment_analysis.ipynb

    To run these notebooks, first change the file paths in them to match your environment. All of the code was run on Google Colab.

  2. Data

    This repository contains several versions of the dataset. The two main datasets used in model training are:

    • Chinese dataset: data/CH.csv

    • English dataset: data/EN.csv

    We also provide the raw data and the full data before cleaning and filtering, for future research:

    • Chinese dataset:

      • data/raw/CH/choice_news_ustime.csv

        This is the Chinese data with times given in the US time zone.

      • data/raw/CH/choice_news.csv

        This is the Chinese data with times given in the China time zone.

    • English dataset:

      The raw English data comes from several sources:

      • data/raw/EN/nytimes_title.rar

        Headlines from the New York Times

      • Dataset from Kaggle
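
Since the notebooks need their file paths adjusted before running, one option is to keep every path relative to a single root and change only that root. The following is a minimal sketch under stated assumptions: `REPO_ROOT`, `dataset_path` and `preview` are hypothetical names, the UTF-8 default encoding is a guess (this README does not state the encoding), and the standard-library `csv` module is used only to keep the example dependency-free. The relative paths are the ones listed above.

```python
import csv
from pathlib import Path

# Hypothetical location of the cloned repository. On Colab this would
# typically be a folder under the mounted Google Drive.
REPO_ROOT = Path("Financial-Sentiment-Analysis")

# Relative paths exactly as listed in this README.
DATASETS = {
    "CH": Path("data/CH.csv"),
    "EN": Path("data/EN.csv"),
    "CH_raw_ustime": Path("data/raw/CH/choice_news_ustime.csv"),
    "CH_raw": Path("data/raw/CH/choice_news.csv"),
}


def dataset_path(name, root=REPO_ROOT):
    """Resolve a dataset key to a path under the repository root."""
    return Path(root) / DATASETS[name]


def preview(name, root=REPO_ROOT, n=5, encoding="utf-8"):
    """Return the header row and the first n data rows as lists of strings.

    The encoding default is an assumption, not something the README states.
    """
    with open(dataset_path(name, root), newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n), reader)]
    return header, rows
```

With this layout, moving between a local checkout and Colab means changing `REPO_ROOT` only. Inside the notebooks, `pandas.read_csv(dataset_path("EN"))` would be the more natural way to load a file.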

Set up

This project uses Google Colab Pro for training and testing. We highly recommend running it on Google Colab Pro as well.

Python Version

  • 3.7

Modules

  • numpy

  • pandas

  • json

  • os

  • tensorflow (2.x)

  • sklearn

  • seaborn

  • matplotlib.pyplot

  • gensim.downloader

  • tensorflow_hub

  • tokenization

  • sentencepiece

  • torch

  • transformers

  • tqdm.notebook
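
Before opening the notebooks, it can help to check which of these modules are available in the current environment. The sketch below is one possible way to do that with the standard library only. `REQUIRED` and `missing_modules` are hypothetical names, the import names are taken from the list above (with the plotting library spelled `seaborn`), and only the top-level package of each dotted name is checked.

```python
import importlib.util
import sys

# Import names from the Modules list in this README.
REQUIRED = [
    "numpy", "pandas", "json", "os", "tensorflow", "sklearn", "seaborn",
    "matplotlib.pyplot", "gensim.downloader", "tensorflow_hub",
    "tokenization", "sentencepiece", "torch", "transformers", "tqdm.notebook",
]


def missing_modules(names=REQUIRED):
    """Return the names whose top-level package cannot be found."""
    missing = []
    for name in names:
        top_level = name.split(".")[0]
        try:
            found = importlib.util.find_spec(top_level) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    version = ".".join(map(str, sys.version_info[:3]))
    print(f"Running Python {version} (this README lists 3.7)")
    for name in missing_modules():
        print("missing:", name)
```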
