India AI CyberGuard Hackathon Code Submission

Team Members

Name	Program & Batch	Role
Shashwati	B.Tech ECE, Batch-26, IIIT NR	Team Head
Darshan Kochar	B.Tech CSE, Batch-27, IIIT NR	Developer & Researcher
Tejas Keshwani	B.Tech CSE, Batch-27, IIIT NR	Developer & Analyst

Cybercrime Multi-Class Classification

This project presents a deep learning-based approach to classify cybercrime descriptions into multiple categories and subcategories, providing an efficient tool for law enforcement and cybersecurity analysts.

Highlights

Multi-Class Classification: Efficient categorization of cybercrime reports into primary categories and subcategories.
Streamlit App Integration: A user-friendly interface for easy interaction and prediction.
BERT-Based Fine-Tuning: Employs bert-base-uncased for classification tasks.
Addressing Imbalanced Data: Implements upsampling to improve performance on minority classes.

Project Overview

Cybercrime datasets often exhibit class imbalances, challenging language use, and unique categorizations. This project leverages BERT (Bidirectional Encoder Representations from Transformers) to build robust models for multi-class and multi-label classification.

Hosted Models

The fine-tuned models are hosted on Hugging Face for public access:

Darshan Kochar's Hugging Face Models

Model List

Model Name	Task	Hugging Face Link
Category Classifier	Predict primary cybercrime category	Category Classifier
Financial Fraud Classifier	Specialized in financial fraud classification	Financial Fraud Classifier
Women and Child Classifier	Crimes affecting women and children	Women and Child Classifier
Other Cyber Crime Classifier	Handles all other crime categories	Other Cyber Crime Classifier
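
The model list suggests a two-stage design. The Category Classifier predicts the primary category, and one of the three specialised classifiers then predicts the subcategory. Below is a minimal sketch of that routing. It assumes each model is wrapped as a plain function from text to label. The category labels, route keys (`financial`, `women_child`, `other`) and subcategory stubs are hypothetical placeholders, not the project's actual labels or the code in App.py.

```python
from typing import Callable, Dict

# A "classifier" is any function mapping report text to a label, e.g. a thin
# wrapper around one of the hosted Hugging Face models (assumed, not shown).
Classifier = Callable[[str], str]


def classify_report(
    text: str,
    category_model: Classifier,
    routes: Dict[str, str],
    subcategory_models: Dict[str, Classifier],
    default_route: str = "other",
) -> Dict[str, str]:
    """Predict the primary category, then the subcategory via a specialised model.

    `routes` maps a category label to a key in `subcategory_models`; categories
    without a dedicated entry go to `default_route` (the catch-all model).
    """
    category = category_model(text)
    route = routes.get(category, default_route)
    sub_category = subcategory_models[route](text)
    return {"category": category, "sub_category": sub_category}


# Usage with keyword stubs; every label below is a hypothetical placeholder.
def category_stub(text: str) -> str:
    return "Financial Fraud" if "upi" in text.lower() else "Hacking"


routes = {"Financial Fraud": "financial", "Women and Child": "women_child"}
subcategory_models = {
    "financial": lambda text: "UPI Fraud",
    "women_child": lambda text: "Cyber Stalking",
    "other": lambda text: "Unauthorised Access",
}
result = classify_report("Money taken via a UPI link", category_stub, routes, subcategory_models)
print(result)  # {'category': 'Financial Fraud', 'sub_category': 'UPI Fraud'}
```

Keeping the routing table separate from the models means a new specialised classifier can be added by registering one more route key. Anything the router does not recognise falls through to the catch-all model.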

Video

A video demonstration of the project is available at the link below:
Watch here

Getting Started

Prerequisites

Python 3.8+ (Anaconda recommended)
Libraries: transformers, torch, pandas, scikit-learn, numpy, streamlit

Setup Instructions

Step 1: Virtual Environment Setup

conda create -n cyberguard python=3.8
conda activate cyberguard

Step 2: Install Dependencies

After activating the environment, install the required packages:

pip install -r Requirements.txt

Dataset

The dataset includes categories and subcategories of cybercrimes (e.g., Phishing, Identity Theft, Malware Attack). The original dataset is available on the official hackathon website. The CSV files used for training were created from it during preprocessing, but they are too large to upload to this repository:

train.csv: Training dataset
financial.csv: Subset of the original dataset
women_child.csv: Subset of the original dataset
other.csv: Subset of the original dataset
test.csv: Testing dataset

Note: Due to confidentiality, the actual dataset is not provided here. Ensure your dataset follows the necessary format before training.
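
If you regenerate the subset files yourself, one approach is to filter train.csv by primary category. This sketch makes two assumptions. First, it assumes a `category` column; this README only names `sub_category`. Second, the category groupings are hypothetical, so replace them with the labels from the official dataset.

```python
import pandas as pd

# Hypothetical mapping of output file -> primary categories it contains.
# Replace with the real category labels from the official dataset.
SUBSETS = {
    "financial.csv": ["Financial Fraud"],
    "women_child.csv": ["Women and Child"],
}


def split_by_category(df: pd.DataFrame, category_col: str = "category") -> dict:
    """Return {filename: DataFrame}; rows matching no subset go to other.csv."""
    subsets = {}
    matched = pd.Series(False, index=df.index)
    for filename, categories in SUBSETS.items():
        mask = df[category_col].isin(categories)
        subsets[filename] = df[mask].reset_index(drop=True)
        matched |= mask
    subsets["other.csv"] = df[~matched].reset_index(drop=True)
    return subsets


# Usage (assumes train.csv is in the working directory):
# for name, part in split_by_category(pd.read_csv("train.csv")).items():
#     part.to_csv(name, index=False)
```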

Running the Project

To Train the Model

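Explore the data and train the models by running the following notebooks:

EDA.ipynb: Exploratory data analysis
Category.ipynb: Category Classifier
ffc.ipynb: Financial Fraud Classifier
wcc.ipynb: Women and Child Classifier
occ.ipynb: Other Cyber Crime Classifier

Then start the Streamlit app:

streamlit run App.py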

Training hyperparameters:
- --epochs: Number of training epochs.
- --batch_size: Batch size for training.
- --lr: Learning rate.
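
Notebooks do not accept command-line flags, so these argument names apply only if the training code is wrapped in a script. Assuming such a script exists (it is hypothetical), the three hyperparameters could be parsed with argparse as shown below. The default values are illustrative and are not the settings behind the reported results.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse --epochs, --batch_size and --lr for a hypothetical training script.

    Defaults are illustrative assumptions only.
    """
    parser = argparse.ArgumentParser(description="Fine-tune a cybercrime classifier")
    parser.add_argument("--epochs", type=int, default=3, help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=16, help="batch size for training")
    parser.add_argument("--lr", type=float, default=2e-5, help="learning rate")
    args = parser.parse_args(argv)
    # Reject values that would make training meaningless (exits with an error message).
    if args.epochs < 1 or args.batch_size < 1 or args.lr <= 0:
        parser.error("epochs and batch_size must be >= 1 and lr must be > 0")
    return args
```

For example, `parse_args(["--epochs", "4", "--lr", "3e-5"])` returns a namespace with `epochs=4`, the default `batch_size=16` and `lr=3e-05`.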

For Evaluators

Running the Streamlit Interface

To run the code through the Streamlit interface, clone the repository and run:

pip install -r Requirements.txt
streamlit run App.py

Handling Data Imbalance

This project includes techniques to handle data imbalance, particularly in the sub_category labels. We implement upsampling to create a balanced dataset, improving model performance on minority classes.
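
Here, upsampling can be read as random oversampling: rows from minority classes are duplicated by sampling with replacement until every class matches the size of the largest one. Below is a minimal pandas sketch, assuming a DataFrame with a `sub_category` column. The exact procedure used in the notebooks may differ.

```python
import pandas as pd


def upsample(df: pd.DataFrame, label_col: str = "sub_category", random_state: int = 42) -> pd.DataFrame:
    """Random oversampling: pad every class with rows drawn (with replacement)
    from that class until it matches the largest class, then shuffle."""
    target = df[label_col].value_counts().max()
    parts = []
    for _, group in df.groupby(label_col):
        extra = target - len(group)
        if extra > 0:
            # Keep all original rows and add `extra` randomly chosen duplicates.
            group = pd.concat([group, group.sample(n=extra, replace=True, random_state=random_state)])
        parts.append(group)
    # Shuffle so duplicated minority rows are not clustered together.
    return pd.concat(parts).sample(frac=1, random_state=random_state).reset_index(drop=True)
```

Apply this to the training split only. Upsampling before the train/test split copies training rows into the test set and inflates the evaluation scores.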

Model Performance

The table below summarizes the performance of different models:

Model	Precision	Recall	F1 Score	Accuracy
Category Classifier	0.9342	0.9337	0.9342	0.9337
Financial Fraud Classifier	0.9296	0.9283	0.9296	0.9280
Other Cyber Crime Classifier + ('all-mp-net' from sentence_transformers)	0.8880	0.8850	0.8880	0.8851
Women/Child Classifier	0.9704	0.9806	0.9804	0.9892
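
The README does not state how the per-class scores were averaged. The sketch below assumes support-weighted averaging, a common choice for imbalanced multi-class data, and is intended to match scikit-learn's `average="weighted"` behaviour without any dependencies.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Support-weighted precision, recall and F1, plus overall accuracy."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    n = len(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label, count in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n  # each class is weighted by its share of true labels
        totals["precision"] += weight * p
        totals["recall"] += weight * r
        totals["f1"] += weight * f1
    totals["accuracy"] = sum(correct.values()) / n
    return totals
```

With this averaging, weighted recall always equals accuracy. Each class's recall TP_c / n_c is weighted by n_c / N, so the sum reduces to the total number of correct predictions divided by N.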

Although the Other Cyber Crime Classifier scores lower than the other models, its performance improved when it was combined with similarity search using ChromaDB (chroma_db). We encourage everyone to try the user interface and test the models.
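
The README does not describe how the similarity search is combined with the classifier. One possible scheme is shown here as an assumption. Embeddings of labelled training reports are stored. When the classifier's confidence falls below a threshold, the prediction is replaced by a majority vote over the k most similar stored reports. The sketch uses brute-force cosine similarity in pure Python as a stand-in for a ChromaDB query. The embeddings would come from a sentence-transformers model, and `threshold` and `k` are hypothetical values.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_label(query: Sequence[float], index: List[Tuple[Sequence[float], str]], k: int = 3) -> str:
    """Majority label among the k stored (embedding, label) pairs closest to `query`."""
    nearest = sorted(index, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def combine(model_label: str, model_confidence: float, query: Sequence[float],
            index: List[Tuple[Sequence[float], str]], threshold: float = 0.6, k: int = 3) -> str:
    """Keep the classifier's label when it is confident; otherwise use the kNN vote."""
    if model_confidence >= threshold:
        return model_label
    return knn_label(query, index, k)
```

In ChromaDB, the sorting step in `knn_label` would roughly correspond to a `collection.query(query_embeddings=[...], n_results=k)` call. That call returns the nearest stored items together with their metadata, from which the labels can be read.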

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Hugging Face for their open-source transformer models.
PyTorch for the deep learning framework.

Repository Files

.gitattributes
App.py
Category.ipynb
Data_Preprocessing.ipynb
EDA.ipynb
README.md
Requirements.txt
classes.py
ffc.ipynb
functions.py
methods.py
occ.ipynb
wcc.ipynb

Repository: shashbha14/X
