
CoLeG-Math

Code License · E-GSM License · Python 3.9+

  • Official code repository for the paper Can LLMs Solve Longer Math Word Problems Better?
  • Extended Grade-School Math (E-GSM) is an arithmetic reasoning benchmark built upon GSM8K by extending the problem descriptions into longer narratives.
  • E-GSM is constructed to evaluate the Context Length Generalizability (CoLeG) of LLMs: their ability to solve long math word problems.
  • For proprietary LLMs, we introduce the Condition-Retrieving Instruction (CoRe), an instructional prompt (see the sketch after this list).
  • For open-source LLMs, we suggest incorporating extension as an auxiliary fine-tuning task, and we release our SFT data.
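
For illustration, a CoRe-style instruction can be prepended to the usual zero-shot-CoT trigger. The wording below is a hypothetical paraphrase of the idea (retrieve the relevant conditions and the question before solving), not the exact prompt from the paper:

# Hypothetical paraphrase of a CoRe-style prompt; see the paper for the exact wording.
CORE_INSTRUCTION = (
    "First, identify the conditions in the problem that are needed to answer "
    "the question, and restate the question itself. Then solve the problem."
)

def build_prompt(problem: str) -> str:
    # Combine the CoRe-style instruction with the zero-shot-CoT trigger.
    return f"{problem}\n{CORE_INSTRUCTION}\nLet's think step by step."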

Quick Start

Clone CoLeG-Math and install the required packages:

git clone https://github.com/XinXU-USTC/CoLeG-Math.git
cd CoLeG-Math
pip install -r requirements.txt

If you run into vLLM installation problems, please refer to the vLLM documentation.

Dataset Usage

For now, E-GSM and our SFT data are under the ./data folder. Hugging Face link: coming soon...
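
The data is stored as JSONL files (e.g. data/E-GSM/Q1.jsonl). A minimal loading sketch, assuming each line is a JSON object (the "question"/"answer" field names are assumptions; check the released files):

import json

# Load one E-GSM split; one JSON object per line.
with open("data/E-GSM/Q1.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

print(len(examples))
print(examples[0])  # e.g. {"question": ..., "answer": ...} (field names assumed)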

Evaluation on E-GSM

For proprietary LLMs, first put your API key in proprietary-llms/api_keys.py.
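
A plausible shape for that file (the variable name here is an assumption; check the actual file in the repository):

# proprietary-llms/api_keys.py -- variable name is an assumption, not confirmed by the repo.
OPENAI_API_KEY = "sk-..."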

cd proprietary-llms
python3 main.py config.yaml

or

cd proprietary-llms
bash ../scripts/eval_proprietary.sh

or

python3 main.py \
    --llm gpt-3.5-turbo-0125 \
    --n 1 \
    --top_p 0.7 \
    --temperature 0.0 \
    --max_tokens 1024 \
    --prompt_name zero-shot-cot \
    --generate_log_file \
    --use_core_instruction \
    --dataset_filepath /path/to/datafile \
    --output_filepath /path/to/save
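
The sampling flags above map onto OpenAI chat-completions parameters of the same names. A minimal sketch of the underlying call (the repository's actual request code may differ):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Mirrors the CLI flags above: one greedy sample with a 1024-token budget.
response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[{"role": "user", "content": "<E-GSM problem + prompt>"}],
    n=1,
    top_p=0.7,
    temperature=0.0,
    max_tokens=1024,
)
print(response.choices[0].message.content)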

For open-source LLMs:

cd opensource-llms
bash ../scripts/eval_opensource.sh

or

python opensource-llms/eval_gsm8k.py --model "path/to/save" --dataset_filepath data/E-GSM/Q1.jsonl --output_filepath Q1_results.jsonl
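
The open-source evaluation relies on vLLM for batched generation. A minimal sketch of such a loop, under greedy decoding (not the repository's exact script):

from vllm import LLM, SamplingParams

# Load the fine-tuned checkpoint and decode greedily, mirroring the settings above.
llm = LLM(model="path/to/save")
params = SamplingParams(temperature=0.0, max_tokens=1024)

prompts = ["<E-GSM problem + prompt>"]  # one string per test example
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)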

Training

First prepare the LLM to be fine-tuned, then run:

bash scripts/train.sh
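
Fine-tuning mixes the original problem-solving task with extension as an auxiliary task. A sketch of assembling such a mixed SFT corpus (file paths and record schema are assumptions; see the files under ./data):

import json

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

# Paths and field layout are assumptions, not confirmed by the repo.
solving = load_jsonl("data/sft/solving.jsonl")      # problem -> solution records
extension = load_jsonl("data/sft/extension.jsonl")  # problem -> extended-problem records

# Concatenate the two tasks into one SFT corpus for the auxiliary-task recipe.
with open("data/sft/mixed.jsonl", "w", encoding="utf-8") as f:
    for ex in solving + extension:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")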

Thanks to the open-source code of MetaMath, WizardMath, and RFT; parts of our code are based on them.

Citation

Please cite our paper if you use our dataset or extend our work:

@article{xu2024coleg-math,
  title={Can LLMs Solve Longer Math Word Problems Better?},
  author={Xu, Xin and Xiao, Tong and Chao, Zitong and Huang, Zhenya and Yang, Can and Wang, Yang},
  journal={arXiv preprint arXiv:2405.14804},
  year={2024}
}
