# Inference on GSM8K with PAL in InternLM-Chat

English | 简体中文

Use the PAL (Program-aided Language Models) paradigm for inference on the GSM8K dataset, enabling the model to write code and execute it through a Python interpreter to solve mathematical problems. The usage is as follows:

```bash
python pal_inference.py \
    <model> \
    <out_dir> \
    [--dataset <dataset>] \
    [--max_length <length>] \
    [--top_p <threshold>] \
    [--eoh <end token>] \
    [--eoa <end token>] \
    [--eos <end token>] \
    [--temperature <temp>] \
    [--time_out <time>] \
    [--verbose, -v] \
    [--append, -a]
```

Parameter explanation:

| Parameter | Description |
| :--- | :--- |
| `<model>` | Path to the model used for inference |
| `<out_dir>` | Output folder where the generated code is saved |
| `--dataset` | Name of the dataset used for code generation (default: `gsm8k`) |
| `--max_length` | Maximum input token length for the model (default: 2048) |
| `--top_p` | Cumulative probability threshold for nucleus (top-p) sampling (default: 0.8) |
| `--eoh` | User input end identifier (default: "") |
| `--eoa` | Model input end identifier (default: "") |
| `--eos` | System input end identifier (default: "") |
| `--temperature`, `-t` | Sampling temperature during generation (default: 1.0) |
| `--time_out` | Maximum time in seconds for executing the generated code (default: 100); see the sketch below |
| `--verbose`, `-v` | Print code error messages (optional) |
| `--append`, `-a` | Append the output to historical results (optional) |
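Of these parameters, `--time_out` bounds how long the generated code is allowed to run, and `--verbose` surfaces errors raised by that code. The snippet below is a minimal sketch of this idea only, not the actual implementation in `tools/pal_inference.py`; the `run_generated_code` helper name and the use of `multiprocessing` are assumptions made for illustration.

```python
# Illustrative sketch only; tools/pal_inference.py may implement this differently.
import multiprocessing
import re


def _worker(code: str, queue) -> None:
    # Execute the generated code and call the `solution()` function it defines.
    namespace = {}
    try:
        exec(code, namespace)
        queue.put(namespace["solution"]())
    except Exception as exc:  # errors like this are what a --verbose-style flag would print
        queue.put(exc)


def run_generated_code(generation: str, time_out: float = 100.0):
    """Extract the fenced Python block from a generation and run it with a time limit."""
    match = re.search(r"```python\n(.*?)```", generation, re.DOTALL)
    if match is None:
        return None
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=_worker, args=(match.group(1), queue))
    proc.start()
    proc.join(time_out)      # analogous to the --time_out limit
    if proc.is_alive():      # the code ran too long: stop it and report no answer
        proc.terminate()
        proc.join()
        return None
    result = queue.get() if not queue.empty() else None
    return None if isinstance(result, Exception) else result
```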

A simple usage example is as follows:

```bash
python tools/pal_inference.py internlm/internlm-chat-7b ./output -v
```

Each line in the output file includes the input question, correct answer, executed answer, score, and the Python code block generated by the model:

```json
{
    "question": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
    "target": 18.0,
    "answer": 18.0,
    "score": 1,
    "generation": ["```python\ndef solution():\n    eggs_per_day = 16\n    eggs_per_breakfast = 3\n    eggs_per_muffin = 4\n    eggs_used = eggs_per_day - eggs_per_breakfast - eggs_per_muffin\n    eggs_sold = eggs_used\n    price_per_egg = 2\n    eggs_made = eggs_sold * price_per_egg\n    result = eggs_made\n    return result\n```"]
}
```
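The `score` field records whether the executed answer matches the reference answer. As a rough sketch of how such a comparison could be made (the exact rule used by the script may differ; the `score_record` name and the numeric tolerance are assumptions):

```python
import math


def score_record(answer, target, tol: float = 1e-4) -> int:
    """Return 1 if the executed answer matches the reference answer, else 0."""
    if answer is None:
        return 0
    try:
        return int(math.isclose(float(answer), float(target), rel_tol=tol, abs_tol=tol))
    except (TypeError, ValueError):
        return 0


# For the record above: score_record(18.0, 18.0) == 1
```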

Performance of InternLM on the GSM8K dataset with and without tools is shown in the table below.

| Method | InternLM-Chat-7B |
| :--- | :--- |
| w/o tool | 34.5 |
| w/ tool | 39.2 |