- Official code repository for the paper *Can LLMs Solve Longer Math Word Problems Better?*
- The Extended Grade-School Math (E-GSM) benchmark is an arithmetic reasoning dataset built upon GSM8K by extending the problem descriptions into longer ones.
- E-GSM is constructed to evaluate the Context Length Generalizability (CoLeG) of LLMs, i.e., their ability to solve long math word problems.
- For proprietary LLMs, we introduce Condition-Retrieving Instruction (CoRe), an instructional prompt.
- For open-source LLMs, we suggest incorporating extension as an auxiliary fine-tuning task, and we release our SFT data.
Clone CoLeG-Math and install the required packages:
git clone https://github.com/XinXU-USTC/CoLeG-Math.git
cd CoLeG-Math
pip install -r requirements.txt
For vLLM installation problems, please refer to the vLLM documentation.
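As a quick sanity check that vLLM installed correctly (our suggestion, not part of the original setup), you can try importing it:

```python
# Sanity check: vLLM imports and reports its version
import vllm
print(vllm.__version__)
```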
For now, E-GSM and our SFT data are under the ./data folder. Hugging Face link: coming soon...
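E-GSM is distributed as JSONL files (e.g., data/E-GSM/Q1.jsonl, the path used in the evaluation command further below). Here is a minimal sketch for inspecting a split; the exact fields of each record are not documented here, so the snippet only prints the keys:

```python
import json

# Load one E-GSM split from the ./data folder (path taken from the eval command below)
with open("data/E-GSM/Q1.jsonl", "r", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

print(f"{len(examples)} problems loaded")
print(examples[0].keys())  # inspect the record schema before writing any custom code
```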
For proprietary LLMs, you need to put your API key in proprietary-llms/api_keys.py
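The variable names that main.py expects in api_keys.py are not documented here; the following is only an assumed sketch of a minimal key file, so check how the module is imported before copying it:

```python
# proprietary-llms/api_keys.py (sketch; the variable name below is an assumption,
# verify how main.py actually reads keys from this module)
OPENAI_API_KEY = "your-openai-key"
```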
cd proprietary-llms
python3 main.py config.yaml
or
cd proprietary-llms
bash ../scripts/eval_proprietary.sh
or
python3 main.py \
--llm gpt-3.5-turbo-0125 \
--n 1 \
--top_p 0.7 \
--temperature 0.0 \
--max_tokens 1024 \
--prompt_name zero-shot-cot \
--generate_log_file \
--use_core_instruction \
--dataset_filepath /path/to/datafile \
--output_filepath /path/to/save
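The config.yaml passed to main.py in the first variant presumably mirrors these command-line arguments; the sketch below assumes the keys match the flag names one-to-one, which may not be exactly how the repository's config is structured:

```yaml
# Assumed config.yaml layout; keys are taken from the CLI flags above
llm: gpt-3.5-turbo-0125
n: 1
top_p: 0.7
temperature: 0.0
max_tokens: 1024
prompt_name: zero-shot-cot
generate_log_file: true
use_core_instruction: true
dataset_filepath: /path/to/datafile
output_filepath: /path/to/save
```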
For open-source LLMs:
cd opensource-llms
bash ../scripts/eval_opensource.sh
or
python opensource-llms/eval_gsm8k.py --model "path/to/save" --dataset_filepath data/E-GSM/Q1.jsonl --output_filepath Q1_results.jsonl
You need to prepare the LLM to be fine-tuned, then run:
bash scripts/train.sh
Thanks to the authors of MetaMath, WizardMath, and RFT for open-sourcing their code. Parts of our code are based on theirs.
Please cite our paper if you use our dataset or extend our work:
@article{xu2024coleg-math,
title={Can LLMs Solve Longer Math Word Problems Better?},
author={Xu, Xin and Xiao, Tong and Chao, Zitong and Huang, Zhenya and Yang, Can and Wang, Yang},
journal={arXiv preprint arXiv:2405.14804},
year={2024}
}