Rephrase and Contrast: Fine-Tuning Language Models for Enhanced Understanding of Communication and Computer Networks

Introduction

This repository includes the files related to finetuning Llama-2-7B for RaC.

For finetuning, please refer to the llama-recipes README for more details.

Environment Setup

git clone https://github.com/1155157110/RaC.git
cd RaC
pip install llama-recipes transformers peft openpyxl

The Llama-2-7b model weights and the converted Hugging Face format weights need to be placed in the 7B folder before loading the model.
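As a quick check that the weights are in place, the converted model can be loaded directly from the 7B folder with transformers. This is a minimal sketch; the dtype and device settings are illustrative assumptions, not project requirements.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the converted Hugging Face weights from the 7B folder
tokenizer = AutoTokenizer.from_pretrained("7B")
model = AutoModelForCausalLM.from_pretrained(
    "7B",
    torch_dtype=torch.float16,   # half precision to keep memory use modest
    device_map="auto",           # requires the accelerate package
)
print(model.config.model_type)   # expected: "llama"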

Dataset

The datasets are built upon 10 textbooks on computer networking, covering both foundational theories and the latest technical advancements in the networking field. We first employed optical character recognition (OCR) software to convert the selected textbooks into textual material. We then leveraged the GPT-4 API to implement RaC QA pair generation. Finally, the QA pairs were reviewed manually. For data augmentation, we designed ChoiceBoost. Please refer to our paper for more details on the implementation.

  • The released test benchmarks (easy dataset, hard dataset, comprehensive dataset) are in the respective sheets of Data_OpenSource.xlsx
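The benchmark sheets can be read directly with openpyxl (already included in the pip install above). A minimal sketch follows; the sheet name "EasyDataset" mirrors the evaluation example later in this README and is an assumption, so check wb.sheetnames for the exact names of the easy, hard, and comprehensive sheets.

from openpyxl import load_workbook

wb = load_workbook("Data_OpenSource.xlsx", read_only=True)
print(wb.sheetnames)                    # names of the released benchmark sheets

sheet = wb["EasyDataset"]               # assumed sheet name; adjust to the actual one
for row in sheet.iter_rows(min_row=2, values_only=True):  # skip the header row
    print(row)                          # one benchmark QA entry per row
    break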

Training Scripts

Three training scripts are provided: the training script, the k-fold training script, and the ablation training script.

Example usage:

cd llama-recipes
sh train_eval_chatbot.sh

For ablation training, an argument specifying the ablation contents is required, for example:

cd llama-recipes
sh train_eval_chatbot_ablation.sh qa

Evaluating Scripts

The evaluating scripts load the finetuned model and execute model inference to generate outputs.

The evaluating script for k-fold is get_prompts.py. Usage:

python get_prompts.py --lora_path [lora_path] --dataset [dataset]

For example, python get_prompts.py --lora_path RaC_lora --dataset EasyDataset
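Roughly, such an evaluating script loads the base model, attaches the finetuned LoRA adapter with peft, and generates an answer for each benchmark question. The sketch below assumes the paths from the examples above ("7B", "RaC_lora"); the prompt and generation settings are illustrative, not the script's exact ones.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("7B")
base = AutoModelForCausalLM.from_pretrained("7B", torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, "RaC_lora")   # attach the finetuned LoRA weights
model.eval()

# Illustrative multiple-choice question in the style of the benchmarks
prompt = "Question: Which protocol provides reliable, in-order delivery? A) UDP B) TCP C) IP D) ICMP\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))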

After the answer prompts are generated, evaluate the accuracy with get_accuracy.py:

python get_accuracy.py --dataset [dataset]

Reference

To cite this work, please use the following BibTeX entry:

@article{wang2024rephrase,
  title={Rephrase and Contrast: Fine-Tuning Language Models for Enhanced Understanding of Communication and Computer Networks},
  author={Wang, Liujianfu and Du, Yuyang and Lin, Jingqi and Chen, Kexin and Liew, Soung Chang},
  journal={arXiv preprint arXiv:2409.19007},
  year={2024}
}

Questions

For enquiries about the paper or the code, please feel free to open an issue.
