Team Name : AI Avengers
Problem Statement 2
Video Link : https://drive.google.com/file/d/1BhweMMM39sdqVBBkoyz3xXFYUDQ-xU8f/view?usp=sharing
G2 has more than 2.5 million reviews for various products and services. These reviews help both buyers and software vendors in decision-making. One interesting aspect of the review data that we want to solve is to list the exact feature sets the customers are looking for. A few examples include application performance, the overall user experience, missing functionality, bugs, etc. As an aspiring Computer Science graduate, we would like you to develop a system that analyses the review data for a particular product from G2 using the API provided below and provides a list of feature sets that the customers are looking for. Here's the API endpoint that you can use:
https://data.g2.com/api/docs#reviews-list - You can use this batch API to fetch reviews of G2 Marketing solutions in a batch of 100 using the page[size] param. Once you have accumulated all reviews, use an algorithm to find the customer asks. The results can be printed on the console.
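The batch fetching described above can be sketched roughly as follows. This is not the repository's fetch.py. The endpoint path, the `Authorization` header format and the `page[number]` parameter are assumptions to verify against the G2 API docs, and `http_get_json` and `fetch_all_reviews` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm the exact path in the G2 API docs.
BASE_URL = "https://data.g2.com/api/v1/survey-responses"


def http_get_json(url, token):
    """Fetch one page over HTTP and decode the JSON body."""
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Token token={token}",  # assumed header format
            "Content-Type": "application/vnd.api+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_reviews(get_page, page_size=100, max_pages=1000):
    """Collect reviews page by page until a page comes back short or empty.

    get_page(url) must return a dict with a "data" list (JSON:API style).
    """
    reviews = []
    for page_number in range(1, max_pages + 1):
        query = urllib.parse.urlencode(
            {"page[size]": page_size, "page[number]": page_number}
        )
        batch = get_page(f"{BASE_URL}?{query}").get("data", [])
        reviews.extend(batch)
        if len(batch) < page_size:
            break  # a short page means there is nothing left to fetch
    return reviews
```

With a real key, a call might look like `fetch_all_reviews(lambda url: http_get_json(url, secret_token))`. Passing the page getter in as a function keeps the pagination loop testable without network access.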
- Data Extraction: Data is extracted in JSON format from the API endpoint using the provided API key.
- Conversion to CSV: The JSON file is converted into a CSV file for easier processing and analysis.
- Sentiment Analysis: Sentiment analysis is applied to the data to identify top positive and negative comments. This helps in understanding customer preferences and areas for improvement.
- Machine Learning Model Training: The sentiment analysis results and the original data are sent to a Large Language Model (LLM) for training. This step improves the accuracy of predictions by leveraging machine learning.
- Interactive Web Application: Gradio is utilized to create an interactive web chat application interface. This interface enables users to perform analysis on the provided data swiftly and accurately.
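As an illustration of the conversion step in the list above, here is a minimal sketch that flattens fetched reviews into CSV rows. The JSON field names (`attributes`, `comment_answers`, `love`, `hate`, `value`) are assumptions about the API response shape, and `review_to_row` and `json_to_csv` are hypothetical names. Adjust them to match the JSON you actually receive.

```python
import csv
import json

FIELDS = ["id", "title", "love", "hate"]


def review_to_row(review):
    """Flatten one JSON:API-style review record into a flat dict."""
    attributes = review.get("attributes") or {}
    answers = attributes.get("comment_answers") or {}  # assumed field name

    def answer(key):
        # Missing or null answers become empty strings.
        return (answers.get(key) or {}).get("value") or ""

    return {
        "id": review.get("id", ""),
        "title": attributes.get("title") or "",
        "love": answer("love"),
        "hate": answer("hate"),
    }


def json_to_csv(json_path, csv_path):
    """Read the fetched reviews and write one CSV row per review."""
    with open(json_path, encoding="utf-8") as source:
        payload = json.load(source)
    # Accept either a bare list of reviews or a {"data": [...]} wrapper.
    reviews = payload["data"] if isinstance(payload, dict) else payload
    rows = [review_to_row(review) for review in reviews]
    with open(csv_path, "w", newline="", encoding="utf-8") as target:
        writer = csv.DictWriter(target, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The function returns the number of rows written, which gives a quick sanity check against the number of reviews fetched.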
Overall, the proposed system performs two processes:
- Sentiment Analysis (SA) using NLP (Both VADER and RoBERTa) on the data and reviews provided.
- Sending the SA data to an LLM as input.
Note: For reviews with no comments, the comment was replaced with the "title" value, and the final dataset after SA contains the merged comments (both love and hate). This was done to reduce bias, since this dataset is given as input to the LLM.
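The preprocessing in the note and the top-comment ranking could look something like the sketch below. It is not the code from G2NLP.ipynb. The helper names are hypothetical, and the scorer is passed in as a function, so VADER (shown here through NLTK, which needs the `vader_lexicon` data downloaded) or a RoBERTa classifier could be plugged in.

```python
from functools import lru_cache


def build_comment(row):
    """Merge the love and hate answers; fall back to the title if both are empty."""
    parts = [(row.get(key) or "").strip() for key in ("love", "hate")]
    merged = " ".join(part for part in parts if part)
    return merged or (row.get("title") or "").strip()


def top_comments(comments, score_fn, n=10):
    """Return the n most positive and the n most negative comments."""
    scored = sorted(((score_fn(text), text) for text in comments), reverse=True)
    positives = [text for score, text in scored[:n] if score > 0]
    # The tail holds the lowest scores; reverse it so the most negative is first.
    negatives = [text for score, text in reversed(scored[-n:]) if score < 0]
    return positives, negatives


@lru_cache(maxsize=1)
def _vader():
    # Third-party dependency, imported lazily: pip install nltk, then
    # nltk.download("vader_lexicon") once.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer()


def vader_score(text):
    """Compound VADER score in [-1, 1]."""
    return _vader().polarity_scores(text)["compound"]
```

For example, `top_comments([build_comment(r) for r in rows], vader_score)` would rank the merged comments using VADER's compound score.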
- Clone the repo:
git clone https://github.com/YashaswiniIppili/AI_Avengers-G2_PS2.git
- Obtain API access and a key from G2, then replace API-KEY in fetch.py with your API key:
secret_token = "API-KEY"
After replacing it, run:
python3 fetch.py
- Convert the JSON file you obtained into a CSV file.
- Open the two notebooks, G2NLP.ipynb and G2LLM.ipynb, in Colab in two separate tabs.
- Run all the cells in G2NLP.ipynb. This notebook performs the sentiment analysis and the preprocessing that the LLM part of the project requires. Do not forget to change the dataset path to point to the extracted dataset.
- You will be able to see the top 10 likes and dislikes, followed by the creation of the final.csv dataset.
- Now run all the cells of G2LLM.ipynb, with the dataset path set to the location where final.csv is stored on your local computer.
- Open the link that Gradio provides to go to the interactive AI-powered chatbot.
- We use the Meta Llama-2-7b model as our LLM, with sentence transformers to compute the embeddings (dense vector representations) of sentences.
- Now let the model train on the provided data (the speed of this process depends on the computational power of the local computer).
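To give a feel for how sentence embeddings and the Gradio chat fit together, here is a simplified sketch. It is not the code in G2LLM.ipynb: it replaces the Llama-2 answer generation with plain retrieval of the closest reviews. The helper names and the `all-MiniLM-L6-v2` model name are assumptions, and both third-party libraries are imported lazily.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_relevant(question, comments, embed, k=3):
    """Return the k comments whose embeddings are closest to the question's."""
    query_vector = embed([question])[0]
    comment_vectors = embed(comments)
    ranked = sorted(
        zip(comments, comment_vectors),
        key=lambda pair: cosine_similarity(query_vector, pair[1]),
        reverse=True,
    )
    return [comment for comment, _ in ranked[:k]]


def sentence_transformer_embed(model_name="all-MiniLM-L6-v2"):
    """Build an embed(texts) function; the model name is an assumption."""
    from sentence_transformers import SentenceTransformer  # third-party

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts).tolist()


def launch_chat(comments, embed):
    """Serve a Gradio chat UI that replies with the best-matching reviews."""
    import gradio as gr  # third-party

    def respond(message, history):
        return "\n".join(most_relevant(message, comments, embed))

    gr.ChatInterface(respond).launch()
```

Calling `launch_chat(comments, sentence_transformer_embed())` would start the interface. In the full pipeline, the retrieved reviews would be passed to the LLM as context rather than returned directly.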
If any doubts arise or persist, feel free to contact us at [email protected], [email protected]