Run inference on the GSM8K dataset using the PAL (Program-aided Language models) paradigm: the model writes Python code, which is then executed by the Python interpreter to solve math problems. Usage:
```bash
python pal_inference.py \
    <model> \
    <out_dir> \
    [--dataset <dataset>] \
    [--max_length <length>] \
    [--top_p <threshold>] \
    [--eoh <end token>] \
    [--eoa <end token>] \
    [--eos <end token>] \
    [--temperature <temp>] \
    [--time_out <time>] \
    [--verbose, -v] \
    [--append, -a]
```
Parameter descriptions:
Parameter | Description |
---|---|
`<model>` | Path to the model used for inference |
`<out_dir>` | Output folder where the generated code is saved |
`--dataset` | Name of the dataset used for code generation (defaults to gsm8k) |
`--max_length` | Maximum input token length for the model (defaults to 2048) |
`--top_p` | Nucleus sampling threshold: only the most likely tokens whose cumulative probability reaches this value are sampled from (defaults to 0.8) |
`--eoh` | End-of-user-input token (defaults to "") |
`--eoa` | End-of-model-output token (defaults to "") |
`--eos` | End-of-system-input token (defaults to "") |
`--temperature`, `-t` | Sampling temperature during generation (defaults to 1.0) |
`--time_out` | Maximum time (in seconds) allowed for executing the generated code (defaults to 100) |
`--verbose`, `-v` | Print error messages raised while executing the generated code (optional) |
`--append`, `-a` | Append output to previous results (optional) |
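The `--top_p` and `--temperature` options control how tokens are sampled during generation. As a rough illustration of the standard technique (temperature scaling followed by nucleus, or top-p, sampling), here is a minimal sketch over a plain list of logits. `sample_top_p` is a hypothetical helper for explanation only; it is not part of `pal_inference.py`, and the actual decoding is done by the model's own generation code.

```python
import math
import random


def sample_top_p(logits, top_p=0.8, temperature=1.0, rng=random):
    """Sample a token index using temperature scaling plus nucleus (top-p) filtering.

    Hypothetical illustration; not the tool's actual decoding code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    # Temperature rescales logits: < 1.0 sharpens, > 1.0 flattens the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank candidate tokens from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # random.choices renormalizes the kept weights before drawing one token.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With a dominant logit, for example `sample_top_p([10.0, 0.0, 0.0], top_p=0.8)`, only the first token survives the filter, so it is always chosen; raising `top_p` toward 1.0 lets lower-probability tokens back into the candidate set.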
A simple usage example is as follows:
```bash
python tools/pal_inference.py internlm/internlm-chat-7b ./output -v
```
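Each line of the output file is a record containing the input question (`question`), the correct answer (`target`), the answer obtained by executing the generated code (`answer`), the score (`score`), and the Python code block generated by the model (`generation`):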
```json
{
    "question": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
    "target": 18.0,
    "answer": 18.0,
    "score": 1,
    "generation": ["```python\ndef solution():\n eggs_per_day = 16\n eggs_per_breakfast = 3\n eggs_per_muffin = 4\n eggs_used = eggs_per_day - eggs_per_breakfast - eggs_per_muffin\n eggs_sold = eggs_used\n price_per_egg = 2\n eggs_made = eggs_sold * price_per_egg\n result = eggs_made\n return result\n```"]
}
```
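To make the record fields concrete, here is a minimal sketch of how one record could be scored: pull the code out of the fenced block in `generation`, run it in a fresh Python process with a time limit (mirroring `--time_out`), and compare the value returned by `solution()` with `target`. The helper names (`extract_code`, `run_solution`, `score_record`), the subprocess-based execution, and the `1e-6` tolerance are assumptions for illustration; `pal_inference.py` may execute and compare answers differently.

```python
import re
import subprocess
import sys

# Matches the body of the first triple-backtick python fence in a generation string.
CODE_BLOCK = re.compile(r"`{3}python\n(.*?)`{3}", re.DOTALL)


def extract_code(generation):
    """Return the code inside the first python fence, or None if there is none."""
    match = CODE_BLOCK.search(generation)
    return match.group(1) if match else None


def run_solution(code, time_out=100, verbose=False):
    """Run `code` plus a call to solution() in a new interpreter; return a float or None."""
    script = code + "\nprint(solution())\n"
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            timeout=time_out,  # the child process is killed if it exceeds this limit
        )
    except subprocess.TimeoutExpired:
        if verbose:
            print(f"timed out after {time_out}s")
        return None
    if proc.returncode != 0:
        if verbose:
            print(proc.stderr)  # traceback from the generated code
        return None
    try:
        # The last printed line is the value returned by solution().
        return float(proc.stdout.strip().splitlines()[-1])
    except (ValueError, IndexError):
        return None


def score_record(record, time_out=100, verbose=False, tol=1e-6):
    """Return 1 if the executed answer matches `target` within `tol`, else 0."""
    code = extract_code(record["generation"][0])
    if code is None:
        return 0
    answer = run_solution(code, time_out, verbose)
    return int(answer is not None and abs(answer - record["target"]) < tol)
```

Running the model-written code in a separate process keeps crashes, infinite loops, and stray global state away from the inference script, at the cost of starting a new interpreter per question.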
The table below compares the performance of InternLM-Chat-7B on the GSM8K dataset with and without the tool.
Method | InternLM-Chat-7B |
---|---|
Without tool | 34.5 |
With tool (PAL) | 39.2 |
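Assuming the reported numbers are accuracy percentages, that is, the mean of the per-record `score` field over all questions, a summary figure could be computed from an output file with a sketch like the one below. The JSON Lines layout follows the "each line" description of the output file; the `accuracy` helper is hypothetical, and the exact file name written to `<out_dir>` is not specified here.

```python
import json


def accuracy(path):
    """Return the percentage of records scored 1 in a JSON Lines output file.

    Hypothetical helper; assumes one JSON object with a numeric `score` per line.
    """
    scores = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                scores.append(json.loads(line)["score"])
    # Guard against an empty file instead of dividing by zero.
    return 100.0 * sum(scores) / len(scores) if scores else 0.0
```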