test.log
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /root/miniconda3/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
bin /root/miniconda3/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
bin /root/miniconda3/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('Asia/Shanghai')}
warn(msg)
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_w974vz2u/none_bdpaorzc/attempt_0/0/error.json')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 113
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /root/miniconda3/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('Asia/Shanghai')}
warn(msg)
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_w974vz2u/none_bdpaorzc/attempt_0/1/error.json')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 113
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /root/miniconda3/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('Asia/Shanghai')}
warn(msg)
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_w974vz2u/none_bdpaorzc/attempt_0/2/error.json')}
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 113
/root/miniconda3/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /root/miniconda3/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113_nocublaslt.so...
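The CUDA setup above is printed once per rank: bitsandbytes loads its no-cublasLt CUDA 11.3 kernel because the detected GPUs are compute capability 7.0, below the 7.5 needed for fast int8 matmul. For context, a minimal sketch of how an 8-bit base model is usually prepared for LoRA training in scripts of this kind; the exact calls inside finetune.py are an assumption, not taken from this log:

```python
# Sketch only (assumed, not read from this log): load the BLOOMZ base model in 8-bit
# via bitsandbytes so LoRA adapters can be trained on top of it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import prepare_model_for_int8_training

base_model = "/root/autodl-tmp/jiangxia/base_model/BLOOMZ_7B"  # path from the args below

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,         # routes linear layers through bitsandbytes int8 kernels
    torch_dtype=torch.float16,
    device_map="auto",         # placement would differ under torchrun; illustrative only
)
model = prepare_model_for_int8_training(model)  # freeze base weights, cast norms, enable input grads
```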
2023-05-18 14:00:54 - finetune.py[line:66] - INFO: args.__dict__ : {'model_config_file': 'run_config/Bloom_config.json', 'deepspeed': None, 'resume_from_checkpoint': False, 'lora_hyperparams_file': 'run_config/lora_hyperparams_bloom.json', 'use_lora': True, 'local_rank': None}
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: model_type : bloom
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: model_name_or_path : /root/autodl-tmp/jiangxia/base_model/BLOOMZ_7B
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: data_path : data_dir/zh_data.json
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: output_dir : trained_models/bloomz_ckpt
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: batch_size : 32
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: per_device_train_batch_size : 4
2023-05-18 14:00:54 - finetune.py[line:66] - INFO: args.__dict__ : {'model_config_file': 'run_config/Bloom_config.json', 'deepspeed': None, 'resume_from_checkpoint': False, 'lora_hyperparams_file': 'run_config/lora_hyperparams_bloom.json', 'use_lora': True, 'local_rank': None}
2023-05-18 14:00:54 - finetune.py[line:66] - INFO: args.__dict__ : {'model_config_file': 'run_config/Bloom_config.json', 'deepspeed': None, 'resume_from_checkpoint': False, 'lora_hyperparams_file': 'run_config/lora_hyperparams_bloom.json', 'use_lora': True, 'local_rank': None}
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: num_epochs : 50
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: model_type : bloom
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: model_type : bloom
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: learning_rate : 8e-05
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: model_name_or_path : /root/autodl-tmp/jiangxia/base_model/BLOOMZ_7B
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: model_name_or_path : /root/autodl-tmp/jiangxia/base_model/BLOOMZ_7B
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: cutoff_len : 1024
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: data_path : data_dir/zh_data.json
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: data_path : data_dir/zh_data.json
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: val_set_size : 0
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: output_dir : trained_models/bloomz_ckpt
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: output_dir : trained_models/bloomz_ckpt
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: val_set_rate : 0.1
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: batch_size : 32
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: batch_size : 32
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: save_steps : 4000
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: per_device_train_batch_size : 4
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: per_device_train_batch_size : 4
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: eval_steps : 1000
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: num_epochs : 50
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: num_epochs : 50
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: warmup_steps : 10
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: learning_rate : 8e-05
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: learning_rate : 8e-05
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: logging_steps : 10
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: cutoff_len : 1024
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: cutoff_len : 1024
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: weight_decay : 0.001
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: val_set_size : 0
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: val_set_size : 0
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: warmup_rate : 0.1
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: val_set_rate : 0.1
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: val_set_rate : 0.1
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: lr_scheduler : linear
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: save_steps : 4000
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: save_steps : 4000
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: gradient_accumulation_steps : 8
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: eval_steps : 1000
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: eval_steps : 1000
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: warmup_steps : 10
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: warmup_steps : 10
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: logging_steps : 10
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: logging_steps : 10
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: weight_decay : 0.001
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: weight_decay : 0.001
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: warmup_rate : 0.1
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: warmup_rate : 0.1
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: lr_scheduler : linear
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: lr_scheduler : linear
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: gradient_accumulation_steps : 8
2023-05-18 14:00:54 - finetune.py[line:68] - INFO: gradient_accumulation_steps : 8
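The triplicated lines above are the same argument dump printed once by each of the three ranks. Collected in one place: BLOOMZ-7B base model, zh_data.json instruction data, batch_size 32 with per-device batch 4 and gradient_accumulation_steps 8 as logged, 50 epochs, lr 8e-05 on a linear schedule with 10 warmup steps, cutoff_len 1024, weight_decay 0.001, logging every 10 steps, saving every 4000. A hedged sketch of the TrainingArguments these values typically map to (the fp16 and report_to choices are assumptions, not read from finetune.py):

```python
# Illustrative only: the logged hyperparameters expressed as transformers.TrainingArguments.
# Values come from the log above; how finetune.py actually wires them up is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="trained_models/bloomz_ckpt",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=50,
    learning_rate=8e-5,
    lr_scheduler_type="linear",
    warmup_steps=10,
    weight_decay=0.001,
    logging_steps=10,
    save_steps=4000,
    eval_steps=1000,
    fp16=True,          # assumption: mixed precision is the norm for 7B LoRA runs on V100s
    report_to="none",   # replaces the deprecated WANDB_DISABLED env var seen later in the log
)
```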
2023-05-18 14:01:24 - finetune.py[line:150] - INFO: lora_r : 8
2023-05-18 14:01:24 - finetune.py[line:150] - INFO: lora_alpha : 16
2023-05-18 14:01:24 - finetune.py[line:150] - INFO: lora_dropout : 0.05
2023-05-18 14:01:24 - finetune.py[line:150] - INFO: lora_target_modules : ['query_key_value']
LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, base_model_name_or_path=None, task_type='CAUSAL_LM', inference_mode=False, r=8, target_modules=['query_key_value'], lora_alpha=16, lora_dropout=0.05, merge_weights=False, fan_in_fan_out=False, enable_lora=None, bias='none', modules_to_save=None)
/root/miniconda3/lib/python3.8/site-packages/peft/tuners/lora.py:173: UserWarning: fan_in_fan_out is set to True but the target module is not a Conv1D. Setting fan_in_fan_out to False.
warnings.warn(
2023-05-18 14:01:24 - finetune.py[line:150] - INFO: lora_r : 8
2023-05-18 14:01:24 - finetune.py[line:150] - INFO: lora_alpha : 16
2023-05-18 14:01:24 - finetune.py[line:150] - INFO: lora_dropout : 0.05
2023-05-18 14:01:24 - finetune.py[line:150] - INFO: lora_target_modules : ['query_key_value']
LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, base_model_name_or_path=None, task_type='CAUSAL_LM', inference_mode=False, r=8, target_modules=['query_key_value'], lora_alpha=16, lora_dropout=0.05, merge_weights=False, fan_in_fan_out=False, enable_lora=None, bias='none', modules_to_save=None)
/root/miniconda3/lib/python3.8/site-packages/peft/tuners/lora.py:173: UserWarning: fan_in_fan_out is set to True but the target module is not a Conv1D. Setting fan_in_fan_out to False.
warnings.warn(
2023-05-18 14:01:26 - finetune.py[line:150] - INFO: lora_r : 8
2023-05-18 14:01:26 - finetune.py[line:150] - INFO: lora_alpha : 16
2023-05-18 14:01:26 - finetune.py[line:150] - INFO: lora_dropout : 0.05
2023-05-18 14:01:26 - finetune.py[line:150] - INFO: lora_target_modules : ['query_key_value']
LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, base_model_name_or_path=None, task_type='CAUSAL_LM', inference_mode=False, r=8, target_modules=['query_key_value'], lora_alpha=16, lora_dropout=0.05, merge_weights=False, fan_in_fan_out=False, enable_lora=None, bias='none', modules_to_save=None)
/root/miniconda3/lib/python3.8/site-packages/peft/tuners/lora.py:173: UserWarning: fan_in_fan_out is set to True but the target module is not a Conv1D. Setting fan_in_fan_out to False.
warnings.warn(
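Each rank builds and prints the same LoraConfig; the fan_in_fan_out warning is harmless here because BLOOM's query_key_value projection is a plain Linear rather than a Conv1D, so peft simply resets the flag. A minimal sketch of constructing this adapter configuration with peft, assumed to mirror what finetune.py does with the values above:

```python
# Sketch: the LoraConfig printed above, rebuilt with peft. Field values come from the log;
# attaching it to `model` (the 8-bit base model) this way is an assumption.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # BLOOM fuses Q, K and V into a single projection
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights are trainable
```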
Downloading and preparing dataset json/default to /root/.cache/huggingface/datasets/json/default-ae86c8fbb70435df/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e...
Downloading data files: 100%|██████████| 1/1 [00:00<00:00, 7639.90it/s]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 1231.45it/s]
Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-ae86c8fbb70435df/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e. Subsequent calls will reuse this data.
100%|██████████| 1/1 [00:00<00:00, 528.92it/s]
DatasetDict({
train: Dataset({
features: ['instruction', 'input', 'output'],
num_rows: 3687
})
})
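The DatasetDict above is the 3687-example instruction dataset loaded from data_dir/zh_data.json; the Map progress further down is the tokenization pass over it. A minimal sketch of that load-and-tokenize step, assuming a deliberately simplified prompt (the real script builds an instruction/input/output prompt template before tokenizing, which this log does not show):

```python
# Sketch only: load the JSON instruction data and tokenize it to cutoff_len=1024.
# The load matches the log; the tokenize_fn body is a simplified assumption.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/root/autodl-tmp/jiangxia/base_model/BLOOMZ_7B")
data = load_dataset("json", data_files="data_dir/zh_data.json")
print(data)  # DatasetDict with a single 3687-row train split, as printed above

def tokenize_fn(example):
    text = example["instruction"] + example["input"] + example["output"]
    return tokenizer(text, truncation=True, max_length=1024)  # cutoff_len from the args

tokenized = data["train"].map(tokenize_fn, remove_columns=["instruction", "input", "output"])
```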
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-ae86c8fbb70435df/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|██████████| 1/1 [00:00<00:00, 297.45it/s]
DatasetDict({
train: Dataset({
features: ['instruction', 'input', 'output'],
num_rows: 3687
})
})
Map:  78%|███████▊  | 2890/3687 [00:01<00:00, 1922.00 examples/s] (tokenization progress, interleaved across ranks)
Found cached dataset json (/root/.cache/huggingface/datasets/json/default-ae86c8fbb70435df/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|██████████| 1/1 [00:00<00:00, 560.51it/s]
DatasetDict({
train: Dataset({
features: ['instruction', 'input', 'output'],
num_rows: 3687
})
})
Map:  98%|█████████▊| 3607/3687 [00:01<00:00, 1833.96 examples/s] (tokenization progress, interleaved across ranks)
start train...
start train...
start train...
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
trainer.train
trainer.train
trainer.train
/root/miniconda3/lib/python3.8/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
/root/miniconda3/lib/python3.8/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
/root/miniconda3/lib/python3.8/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
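The FutureWarning above (printed once per rank) refers to the Hugging Face AdamW implementation; the usual way to follow its advice is to opt into the PyTorch optimizer through TrainingArguments, e.g. (illustrative):

```python
# Opting into torch.optim.AdamW silences the deprecation warning above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="trained_models/bloomz_ckpt",
    optim="adamw_torch",  # PyTorch AdamW instead of the deprecated HF implementation
)
```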
You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
 10/7700 [00:08<1:02:19, 2.06it/s] {'loss': 3.8029, 'learning_rate': 8e-05, 'epoch': 0.06}
 20/7700 [00:13<1:14:37, 1.72it/s] {'loss': 3.5481, 'learning_rate': 7.98959687906372e-05, 'epoch': 0.13}
 30/7700 [00:17<48:11, 2.65it/s] {'loss': 3.4477, 'learning_rate': 7.979193758127439e-05, 'epoch': 0.19}
 40/7700 [00:22<1:02:29, 2.04it/s] {'loss': 3.3254, 'learning_rate': 7.968790637191158e-05, 'epoch': 0.26}
 50/7700 [00:26<42:17, 3.02it/s] {'loss': 3.1114, 'learning_rate': 7.958387516254877e-05, 'epoch': 0.32}
 60/7700 [00:38<1:03:03, 2.02it/s] {'loss': 3.3349, 'learning_rate': 7.947984395318596e-05, 'epoch': 0.39}
 70/7700 [00:50<2:44:30, 1.29s/it] {'loss': 3.0925, 'learning_rate': 7.937581274382316e-05, 'epoch': 0.45}
 80/7700 [00:54<49:13, 2.58it/s] {'loss': 3.2572, 'learning_rate': 7.927178153446035e-05, 'epoch': 0.52}
 90/7700 [01:00<1:03:02, 2.01it/s] {'loss': 3.165, 'learning_rate': 7.916775032509753e-05, 'epoch': 0.58}
 100/7700 [01:03<44:31, 2.85it/s] {'loss': 3.0802, 'learning_rate': 7.906371911573473e-05, 'epoch': 0.65}
 110/7700 [01:09<53:09, 2.38it/s] {'loss': 3.3395, 'learning_rate': 7.895968790637192e-05, 'epoch': 0.71}
 120/7700 [01:14<1:16:24, 1.65it/s] {'loss': 2.9646, 'learning_rate': 7.885565669700911e-05, 'epoch': 0.78}
 130/7700 [01:18<46:55, 2.69it/s] {'loss': 3.2323, 'learning_rate': 7.87516254876463e-05, 'epoch': 0.84}
 140/7700 [01:23<1:03:20, 1.99it/s] {'loss': 3.1083, 'learning_rate': 7.864759427828349e-05, 'epoch': 0.91}
 150/7700 [01:27<42:05, 2.99it/s] {'loss': 2.8917, 'learning_rate': 7.854356306892068e-05, 'epoch': 0.97}
 160/7700 [01:36<1:23:25, 1.51it/s] {'loss': 3.1827, 'learning_rate': 7.843953185955787e-05, 'epoch': 1.04}
 170/7700 [01:39<42:57, 2.92it/s] {'loss': 2.8682, 'learning_rate': 7.833550065019506e-05, 'epoch': 1.1}
 180/7700 [01:51<1:03:13, 1.98it/s] {'loss': 3.1866, 'learning_rate': 7.823146944083225e-05, 'epoch': 1.17}
 190/7700 [01:56<1:15:24, 1.66it/s] {'loss': 2.9491, 'learning_rate': 7.812743823146944e-05, 'epoch': 1.23}
 200/7700 [02:00<48:09, 2.60it/s] {'loss': 3.0805, 'learning_rate': 7.802340702210663e-05, 'epoch': 1.3}
 210/7700 [02:05<1:03:11, 1.98it/s] {'loss': 2.9688, 'learning_rate': 7.791937581274382e-05, 'epoch': 1.36}
 220/7700 [02:09<40:39, 3.07it/s] {'loss': 2.939, 'learning_rate': 7.781534460338103e-05, 'epoch': 1.43}
 230/7700 [02:15<54:13, 2.30it/s] {'loss': 3.1682, 'learning_rate': 7.771131339401822e-05, 'epoch': 1.49}
 240/7700 [02:20<1:20:59, 1.54it/s] {'loss': 3.0005, 'learning_rate': 7.76072821846554e-05, 'epoch': 1.56}
 250/7700 [02:24<46:45, 2.66it/s] {'loss': 3.0267, 'learning_rate': 7.75032509752926e-05, 'epoch': 1.62}
 260/7700 [02:29<1:03:13, 1.96it/s] {'loss': 3.0954, 'learning_rate': 7.739921976592979e-05, 'epoch': 1.69}
 270/7700 [02:32<40:34, 3.05it/s] {'loss': 2.9261, 'learning_rate': 7.729518855656698e-05, 'epoch': 1.75}
 280/7700 [02:38<53:37, 2.31it/s] {'loss': 3.1071, 'learning_rate': 7.719115734720417e-05, 'epoch': 1.82}
 290/7700 [02:50<3:22:35, 1.64s/it] {'loss': 2.9569, 'learning_rate': 7.708712613784136e-05, 'epoch': 1.88}
 300/7700 [02:54<50:29, 2.44it/s] {'loss': 3.0666, 'learning_rate': 7.698309492847855e-05, 'epoch': 1.95}
 310/7700 [03:02<2:39:57, 1.30s/it] {'loss': 2.7881, 'learning_rate': 7.687906371911574e-05, 'epoch': 2.01}
 320/7700 [03:06<49:20, 2.49it/s] {'loss': 3.1222, 'learning_rate': 7.677503250975293e-05, 'epoch': 2.08}
 330/7700 [03:11<1:06:12, 1.86it/s] {'loss': 2.9034, 'learning_rate': 7.667100130039013e-05, 'epoch': 2.14}
 340/7700 [03:15<41:52, 2.93it/s] {'loss': 2.8949, 'learning_rate': 7.656697009102731e-05, 'epoch': 2.21}
 350/7700 [03:21<55:39, 2.20it/s] {'loss': 3.0846, 'learning_rate': 7.64629388816645e-05, 'epoch': 2.27}
 358/7700 [03:23<38:40, 3.16it/s]
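The progress bar reports 7700 optimizer steps in total for 50 epochs over 3687 examples on three ranks. That figure is consistent with the logged configuration if finetune.py recomputes gradient accumulation from the global batch size, a common pattern in this family of scripts; the recomputation itself is an assumption, and the logged gradient_accumulation_steps of 8 would then be the pre-adjustment config value:

```python
# Back-of-the-envelope check of the 7700-step total shown by the progress bar.
# Assumption: gradient accumulation is derived as batch_size // (per_device_batch * world_size).
import math

examples   = 3687  # num_rows of the train split
world_size = 3     # three torchelastic ranks appear in this log
micro_bsz  = 4     # per_device_train_batch_size
global_bsz = 32    # batch_size
epochs     = 50    # num_epochs

grad_accum = global_bsz // (micro_bsz * world_size)                             # 32 // 12 = 2
steps_per_epoch = math.ceil(examples / (micro_bsz * world_size * grad_accum))   # ceil(3687 / 24) = 154
print(steps_per_epoch * epochs)                                                 # 154 * 50 = 7700, matching the log
```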