Inference on GSM8K with PAL in InternLM-Chat


This tool runs PAL (Program-Aided Language model) inference on the GSM8K dataset: the model writes Python code for each math problem, and the Python interpreter executes that code to produce the answer. The usage is as follows:

python pal_inference.py \
    <model> \
    <out_dir> \
    [--dataset <dataset>] \
    [--max_length <length>] \
    [--top_p <threshold>] \
    [--eoh <end token>] \
    [--eoa <end token>] \
    [--eos <end token>] \
    [--temperature, -t <temp>] \
    [--time_out <time>] \
    [--verbose, -v] \
    [--append, -a]

Parameter explanation:

Parameter	Description
<model>	Path to the model used for inference
<out_dir>	Generated code will be saved in the specified output folder
--dataset	Name of the dataset used for code generation (defaults to gsm8k)
--max_length	Maximum input token length for the model (defaults to 2048)
--top_p	Probability threshold for the sum of candidate tokens (defaults to 0.8)
--eoh	User input end identifier (defaults to "")
--eoa	Model input end identifier (defaults to "")
--eos	System input end identifier (defaults to "")
--temperature, -t	Sampling temperature during generation (defaults to 1.0)
--time_out	Maximum time (in seconds) for executing generated code (defaults to 100)
--verbose, -v	Print code error messages (optional)
--append, -a	Append output to historical results (optional)
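
The options above can be pictured as an argparse parser. The following is a minimal sketch that only mirrors the table; it is not the actual code of pal_inference.py. The function name `build_parser` and the argument types (`int`, `float`) are assumptions made for illustration, while the option names and defaults are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real script)."""
    parser = argparse.ArgumentParser(description="PAL inference (sketch)")
    # Positional arguments
    parser.add_argument("model", help="Path to the model used for inference")
    parser.add_argument("out_dir", help="Folder where generated code is saved")
    # Optional arguments; defaults come from the table, types are assumed
    parser.add_argument("--dataset", default="gsm8k")
    parser.add_argument("--max_length", type=int, default=2048)
    parser.add_argument("--top_p", type=float, default=0.8)
    parser.add_argument("--eoh", default="")
    parser.add_argument("--eoa", default="")
    parser.add_argument("--eos", default="")
    parser.add_argument("--temperature", "-t", type=float, default=1.0)
    parser.add_argument("--time_out", type=float, default=100)
    parser.add_argument("--verbose", "-v", action="store_true")
    parser.add_argument("--append", "-a", action="store_true")
    return parser


# Parse the same arguments as the simple usage example in this document.
args = build_parser().parse_args(["internlm/internlm-chat-7b", "./output", "-v"])
print(args.dataset, args.max_length, args.top_p, args.verbose, args.append)
```

Flags that are not passed fall back to the defaults listed in the table, so only the model path and the output folder are required.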

A simple usage example is as follows:

python tools/pal_inference.py internlm/internlm-chat-7b ./output -v

Each line in the output file includes the input question (question), the correct answer (target), the answer obtained by executing the generated code (answer), the score, and the Python code block generated by the model (generation):

{
    "question": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
    "target": 18.0,
    "answer": 18.0,
    "score": 1,
    "generation": ["```python\ndef solution():\n    eggs_per_day = 16\n    eggs_per_breakfast = 3\n    eggs_per_muffin = 4\n    eggs_used = eggs_per_day - eggs_per_breakfast - eggs_per_muffin\n    eggs_sold = eggs_used\n    price_per_egg = 2\n    eggs_made = eggs_sold * price_per_egg\n    result = eggs_made\n    return result\n```"]
}
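
To illustrate how a record like this can come about, the sketch below extracts the code from a generation string, executes it, and compares the result with the target. It is a simplified illustration, not the logic of pal_inference.py: the helper names `extract_code`, `run_solution`, and `score`, as well as the comparison tolerance, are assumptions, and the sketch omits the execution time limit that --time_out provides.

```python
import re

FENCE = "`" * 3  # three backticks, built here so the example nests cleanly

# A generation string with the same logic as the record above, slightly shortened.
generation = (
    FENCE + "python\n"
    "def solution():\n"
    "    eggs_per_day = 16\n"
    "    eggs_per_breakfast = 3\n"
    "    eggs_per_muffin = 4\n"
    "    eggs_sold = eggs_per_day - eggs_per_breakfast - eggs_per_muffin\n"
    "    price_per_egg = 2\n"
    "    return eggs_sold * price_per_egg\n"
    + FENCE
)


def extract_code(text: str) -> str:
    """Return the body of the first fenced python block, or the text itself."""
    match = re.search(FENCE + r"python\n(.*?)" + FENCE, text, re.DOTALL)
    return match.group(1) if match else text


def run_solution(code: str) -> float:
    """Execute the code in a fresh namespace and call its solution() function."""
    namespace: dict = {}
    exec(code, namespace)  # only appropriate for trusted or sandboxed code
    return float(namespace["solution"]())


def score(answer: float, target: float, tol: float = 1e-4) -> int:
    """Return 1 if the executed answer matches the target (assumed tolerance)."""
    return int(abs(answer - target) < tol)


answer = run_solution(extract_code(generation))
print(answer, score(answer, 18.0))
```

Because model-generated code is executed directly, a real run should bound execution time and isolate the interpreter; the sketch leaves both out for brevity.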

The performance of InternLM on the GSM8K dataset with and without tools is shown in the table below.

Method	InternLM-Chat-7B
w/o tool	34.5
w tool	39.2
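
A figure of this kind can be derived from the per-line records in the output file. The sketch below assumes the reported numbers are the percentage of questions with a score of 1, which this document does not state explicitly; the function name `accuracy` and the two sample records are made up for illustration.

```python
import json

# Two made-up records in the per-line JSON layout described in this document.
lines = [
    '{"question": "q1", "target": 18.0, "answer": 18.0, "score": 1, "generation": []}',
    '{"question": "q2", "target": 3.0, "answer": 5.0, "score": 0, "generation": []}',
]


def accuracy(json_lines) -> float:
    """Percentage of records with score 1 (assumed metric); 0.0 if there are none."""
    scores = [json.loads(line)["score"] for line in json_lines if line.strip()]
    return 100.0 * sum(scores) / len(scores) if scores else 0.0


print(accuracy(lines))
```

To apply it to a real run, pass the lines of the result file in the output folder instead of the sample list.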
