Hi, dear authors:

I wonder how you conducted the DAE pre-training experiments shown in Table 3. (I only list the en-fr results below.)
Table 3. BLEU score comparison between MASS and other pre-training methods.

| Method  | en-fr | fr-en |
|---------|-------|-------|
| BERT+LM | 33.4  | 32.3  |
| DAE     | 30.1  | 28.3  |
| MASS    | 37.5  | 34.9  |
As you described in your paper:

> "The second baseline is DAE, which simply uses denoising auto-encoder to pretrain the encoder and decoder."
So my questions are:

1. Do you simply use pairs (c(x), x) from the two languages to pre-train the seq2seq model (see the sketch below)? That is, the same DAE as in the DAE loss + back-translation fine-tuning of XLM.
2. Are there any tricks to make it work?
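For concreteness, here is a minimal sketch of the pair (c(x), x) I mean, using the word-drop / word-blank / local-shuffle noise from Lample et al. (2018) / XLM; the rates and shuffle window are my guesses, not values taken from your code:

```python
import random

def c(tokens, p_drop=0.1, p_blank=0.1, k=3):
    """Noise function c(x): drop words, blank words, and locally shuffle
    within a window of size k. Rates are assumptions, not the paper's."""
    out = []
    for tok in tokens:
        r = random.random()
        if r < p_drop:
            continue  # drop this word
        out.append("<blank>" if r < p_drop + p_blank else tok)
    # local shuffle: each position i is perturbed by U(0, k), then sorted
    keys = [i + random.uniform(0, k) for i in range(len(out))]
    out = [tok for _, tok in sorted(zip(keys, out), key=lambda p: p[0])]
    return out

# One DAE training pair per monolingual sentence x (per language):
x = "the quick brown fox jumps over the lazy dog".split()
src, tgt = c(x), x  # encoder sees c(x); decoder is trained to emit x
```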
I have run experiments using DAE to pre-train the seq2seq model, but when I continue training with (only) BT [1], I get the following BLEU scores, which are far below your reported ones, so I wonder if I have misunderstood some details.
Table. My run

| Method | en-fr | fr-en |
|--------|-------|-------|
| DAE    | 11.20 | 10.68 |
Please help me out here, thanks a lot.
Footnote.
[1] One difference between my training setting and yours is that during fine-tuning I only use the BT loss instead of both DAE and BT.
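To be explicit about the difference in footnote [1], this is what my fine-tuning step computes versus what I understand yours to compute. `translate()`, `seq2seq_loss()`, and `c()` here are hypothetical stand-ins for the real model calls, not the actual MASS/XLM API:

```python
def translate(model, x, direction):
    ...  # decode x with the current model (no gradient), e.g. greedy/beam

def seq2seq_loss(model, src, tgt, direction):
    ...  # teacher-forced cross-entropy of tgt given src

def c(x):
    ...  # noise function (word drop / blank / shuffle), as sketched above

def bt_only_step(model, x_en, x_fr):
    """My fine-tuning: on-the-fly back-translation only."""
    y_fr = translate(model, x_en, "en-fr")  # synthetic fr source for en target
    y_en = translate(model, x_fr, "fr-en")  # synthetic en source for fr target
    return (seq2seq_loss(model, y_fr, x_en, "fr-en")
            + seq2seq_loss(model, y_en, x_fr, "en-fr"))

def dae_plus_bt_step(model, x_en, x_fr):
    """What I understand XLM-style fine-tuning to be: BT plus DAE terms."""
    return (bt_only_step(model, x_en, x_fr)
            + seq2seq_loss(model, c(x_en), x_en, "en-en")
            + seq2seq_loss(model, c(x_fr), x_fr, "fr-fr"))
```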
@Epsilon-Lee We conducted this ablation study using this code. Using DAE in a fully-shared model leads to an identity mapping; you can try the older unsupervised NMT framework, which supports leaving some modules unshared.
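To illustrate one way to leave modules unshared (my own sketch, not the authors' setup or the older framework's actual code): keep a language-specific bottom encoder layer and share only the top layers, so the network cannot collapse into a single copy function across both languages.

```python
import torch.nn as nn

class PartiallySharedEncoder(nn.Module):
    """Bottom encoder layer per language, top layers shared across languages.
    Dimensions and layer counts are arbitrary illustration values."""
    def __init__(self, d_model=512, nhead=8, n_shared=3, langs=("en", "fr")):
        super().__init__()
        make = lambda: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.private = nn.ModuleDict({l: make() for l in langs})  # unshared
        self.shared = nn.ModuleList([make() for _ in range(n_shared)])

    def forward(self, x, lang):
        h = self.private[lang](x)  # language-specific bottom layer
        for block in self.shared:  # shared top of the encoder
            h = block(h)
        return h
```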