My configuration information is as follows:
preprocessing.yml
train_src: work-en-zh-1/train.tok.srcmt
train_tgt: work-en-zh-1/train.tok.pe
valid_src: work-en-zh-1/dev.tok.srcmt
valid_tgt: work-en-zh-1/dev.tok.pe
save_data: prep-data
src_vocab_size: 200000
tgt_vocab_size: 200000
shard_size: 100000
bert_src: bert-base-multilingual-cased
bert_tgt: bert-base-chinese
src_seq_length: 200
tgt_seq_length: 100
train-config.yml
save_model: ape-model
data: prep-data
train_steps: 50000
start_decay_steps: 50000
valid_steps: 1000
save_checkpoint_steps: 10000
keep_checkpoint: 1
rnn_size: 768
word_vec_size: 768
transformer_ff: 3072
heads: 12
layers: 12
position_encoding: 'true'
share_embeddings: 'true'
share_decoder_embeddings: 'true'
encoder_type: bert
enc_bert_type: bert-base-multilingual-cased
decoder_type: bert
dec_bert_type: bert-base-chinese
bert_decoder_token_type: B
bert_decoder_init_context: 'true'
share_self_attn: 'true'
dropout: 0.1
label_smoothing: 0.1
optim: bertadam
learning_rate: 0.00005
warmup_steps: 5000
batch_type: tokens
normalization: tokens
accum_count: 2
batch_size: 512
max_grad_norm: 0
param_init: 0
param_init_glorot: 'true'
valid_batch_size: 8
average_decay: 0.0001
seed: 42
world_size: 1
gpu_ranks: 0
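For reference, the two pretrained models named in the config ship with very different vocabularies, and the sizes can be checked directly. This is a minimal sketch assuming the Hugging Face transformers package is installed (OpenNMT-APE may load BERT through its own wrapper, but the underlying vocabularies are the same); the numbers it prints match the ones in the training log below:

```python
# Quick sanity check of the two pretrained vocab sizes (assumes the Hugging Face
# `transformers` package is available; not the repo's own loading code).
from transformers import BertTokenizer

src_tok = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
tgt_tok = BertTokenizer.from_pretrained("bert-base-chinese")

print("src (multilingual) vocab:", src_tok.vocab_size)  # 119547
print("tgt (chinese) vocab:", tgt_tok.vocab_size)       # 21128
```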
The system fails to train and reports the following error:
[2020-07-02 18:07:13,048 INFO] Building & saving training data...
[2020-07-02 18:07:13,049 INFO] Reading source and target files: work-en-zh-1/train.tok.srcmt work-en-zh-1/train.tok.pe.
[2020-07-02 18:07:13,054 INFO] Building shard 0.
[2020-07-02 18:07:17,401 INFO] * saving 0th train data shard to prep-data.train.0.pt.
[2020-07-02 18:07:18,624 INFO] Building & saving validation data...
[2020-07-02 18:07:18,625 INFO] Reading source and target files: work-en-zh-1/dev.tok.srcmt work-en-zh-1/dev.tok.pe.
[2020-07-02 18:07:18,625 INFO] Building shard 0.
[2020-07-02 18:07:19,202 INFO] * saving 0th valid data shard to prep-data.valid.0.pt.
[2020-07-02 18:07:19,400 INFO] Building & saving vocabulary...
[2020-07-02 18:07:19,557 INFO] * reloading prep-data.train.0.pt.
[2020-07-02 18:07:19,844 INFO] BERT vocab has 21128 tokens.
[2020-07-02 18:07:19,930 INFO] * tgt vocab size: 21128.
[2020-07-02 18:07:19,944 INFO] BERT vocab has 119547 tokens.
[2020-07-02 19:01:55,260 INFO] encoder: 177262848
[2020-07-02 19:01:55,260 INFO] decoder: 85095560
[2020-07-02 19:01:55,260 INFO] * number of parameters: 262358408
[2020-07-02 19:01:55,271 INFO] Starting training on GPU: [0]
[2020-07-02 19:01:55,271 INFO] Start training loop and validate every 1000 steps...
[2020-07-02 19:01:55,517 INFO] Loading dataset from prep-data.train.0.pt, number of examples: 6997
Traceback (most recent call last):
  File "./OpenNMT-APE-master/train.py", line 109, in <module>
    main(opt)
  File "./OpenNMT-APE-master/train.py", line 39, in main
    single_main(opt, 0)
  File "/data/nextcloud/dbc2017/files/ape/OpenNMT-APE-master/onmt/train_single.py", line 116, in main
    valid_steps=opt.valid_steps)
  File "/data/nextcloud/dbc2017/files/ape/OpenNMT-APE-master/onmt/trainer.py", line 209, in train
    report_stats)
  File "/data/nextcloud/dbc2017/files/ape/OpenNMT-APE-master/onmt/trainer.py", line 330, in _gradient_accumulation
    trunc_size=trunc_size)
  File "/data/nextcloud/dbc2017/files/ape/OpenNMT-APE-master/onmt/utils/loss.py", line 158, in __call__
    loss, stats = self._compute_loss(batch, **shard)
  File "/data/nextcloud/dbc2017/files/ape/OpenNMT-APE-master/onmt/utils/loss.py", line 233, in _compute_loss
    scores = self.generator(bottled_output)
  File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/nn/functional.py", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: The expanded size of the tensor (119547) must match the existing size (21128) at non-singleton dimension 1. Target sizes: [297, 119547]. Tensor sizes: [21128]
From the traceback, the cause appears to be that the src vocab size and the tgt vocab size are not the same, as shown below:
[2020-07-02 19:26:43,820 INFO] * src vocab size = 119547
[2020-07-02 19:26:43,820 INFO] * tgt vocab size = 21128
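My guess (not confirmed against the repo's code) is that the generator's projection weight ends up sized for the 119547-token multilingual vocab (perhaps because of share_embeddings), while its bias is built from the 21128-token Chinese target vocab. The following minimal PyTorch sketch, with the shapes taken from the log, reproduces the same RuntimeError in F.linear:

```python
# Minimal sketch of the shape mismatch (hypothetical shapes from the log,
# not the repo's actual generator code).
import torch
import torch.nn.functional as F

hidden = torch.randn(297, 768)      # bottled decoder output, as in the log
weight = torch.randn(119547, 768)   # projection weight -> 119547 classes (src/multilingual vocab)
bias = torch.randn(21128)           # projection bias   -> 21128 classes (tgt/Chinese vocab)

F.linear(hidden, weight, bias)
# RuntimeError: The expanded size of the tensor (119547) must match the
# existing size (21128) at non-singleton dimension 1.
```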
Why does this happen?
Has anyone encountered this problem? Please help me, thank you very much!