
The shape of probs_seq does not match with the shape of the vocabulary Segmentation fault (core dumped) #9

Open
thunder123321 opened this issue Feb 3, 2022 · 6 comments

@thunder123321

[/home/nihao/nihao-users2/yuhao/DSLP/env/ctcdecode/ctcdecode/src/ctc_beam_search_decoder.cpp:32] FATAL: "(probs_seq[i].size()) == (vocabulary.size())" check failed. The shape of probs_seq does not match with the shape of the vocabulary
[/home/nihao/nihao-users2/yuhao/DSLP/env/ctcdecode/ctcdecode/src/ctc_beam_search_decoder.cpp:32] FATAL: "(probs_seq[i].size()) == (vocabulary.size())" check failed. The shape of probs_seq does not match with the shape of the vocabulary
[/home/nihao/nihao-users2/yuhao/DSLP/env/ctcdecode/ctcdecode/src/ctc_beam_search_decoder.cpp:32] FATAL: "(probs_seq[i].size()) == (vocabulary.size())" check failed. The shape of probs_seq does not match with the shape of the vocabulary
Segmentation fault (core dumped)

I have encountered this problem without modifying the original code. Could you tell me what might be causing it?
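
(For context, the failing check enforces that the last dimension of the probability tensor equals the vocabulary size passed to the decoder. A minimal sketch of that contract, assuming the parlance/ctcdecode Python API; the labels and shapes below are illustrative, not taken from DSLP.)

import torch
from ctcdecode import CTCBeamDecoder

labels = ["_", "a", "b", "c"]  # decoder vocabulary; "_" is the CTC blank
decoder = CTCBeamDecoder(labels, beam_width=1, blank_id=0)

# decode() expects probabilities of shape (batch, time, len(labels));
# if the last dimension differs from len(labels), the C++ check above fails.
probs = torch.softmax(torch.randn(2, 10, len(labels)), dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)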

@thunder123321
Author

I'm running the “CTC with DSLP” code:

python3 train.py data-bin/wmt14.en-de_kd --source-lang en --target-lang de --save-dir checkpoints --eval-tokenized-bleu \
    --keep-interval-updates 5 --save-interval-updates 500 --validate-interval-updates 500 --maximize-best-checkpoint-metric \
    --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --log-format simple --log-interval 100 \
    --eval-bleu --eval-bleu-detok space --keep-last-epochs 5 --keep-best-checkpoints 5 --fixed-validation-seed 7 --ddp-backend=no_c10d \
    --share-all-embeddings --decoder-learned-pos --encoder-learned-pos --optimizer adam --adam-betas "(0.9,0.98)" --lr 0.0005 \
    --lr-scheduler inverse_sqrt --stop-min-lr 1e-09 --warmup-updates 10000 --warmup-init-lr 1e-07 --apply-bert-init --weight-decay 0.01 \
    --fp16 --clip-norm 2.0 --max-update 300000 --task translation_lev --criterion nat_loss --arch nat_ctc_sd --noise full_mask \
    --src-upsample-scale 2 --use-ctc-decoder --ctc-beam-size 1 --concat-yhat --concat-dropout 0.0 --label-smoothing 0.0 \
    --activation-fn gelu --dropout 0.1 --max-tokens 2048 --update-freq 4

@thunder123321
Author

FATAL: "(probs_seq[i].size()) == (vocabulary.size())" check failed. The shape of probs_seq does not match with the shape of the vocabulary

I encountered this problem when running both the GLAT+CTC+SD and the CTC+SD code.
What does this error mean? I haven't changed the DSLP code. I hope the author can clarify this for me.

@chenyangh
Owner

Hello, @thunder123321

Unfortunately, there is not enough information for me to tell what went wrong in your setup.
My best guess is that the error is related to your ctcdecode installation.

BTW, I just tested a clean clone of the repo with your script, and it works on my side.

@chenyangh
Owner

chenyangh commented Feb 16, 2022

Actually, ctcdecode is only used as a post-processing step in the final version, since I only used beam size 1.
I think you can use the --plain-ctc option to avoid using ctcdecode.
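
(For reference, with beam size 1, CTC beam search reduces to a per-step argmax followed by the usual CTC collapse. A minimal sketch of that idea, assuming a (time, vocab) tensor of log-probabilities and blank id 0; this is an illustration, not the DSLP implementation.)

import torch

def greedy_ctc_decode(log_probs, blank_id=0):
    # log_probs: (time, vocab). Take the best token at each step,
    # collapse consecutive repeats, then drop blanks. This is what
    # beam search with beam size 1 reduces to.
    toks = log_probs.argmax(dim=-1).tolist()
    toks = [v for i, v in enumerate(toks) if i == 0 or v != toks[i - 1]]
    return [v for v in toks if v != blank_id]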

However, you then need to do some post-processing here:

history.append(output_tokens.clone())

You may incorporate this function:

extra_symbols_to_ignore = []
if hasattr(tgt_dict, "blank_index"):
    extra_symbols_to_ignore.append(tgt_dict.blank_index)
if hasattr(tgt_dict, "mask_index"):
    extra_symbols_to_ignore.append(tgt_dict.mask_index)

def _ctc_postprocess(tokens):
    hyp = tokens
    # Collapse consecutive repeated tokens (standard CTC de-duplication)
    _toks = hyp.int().tolist()
    _toks = [v for i, v in enumerate(_toks) if i == 0 or v != _toks[i - 1]]
    # Drop blank/mask symbols from the collapsed sequence
    hyp = hyp.new_tensor([v for v in _toks if v not in extra_symbols_to_ignore])
    return hyp
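
A hypothetical usage sketch (the names output_tokens and tgt_dict are assumed from the surrounding fairseq decoding code, not quoted from DSLP):

# Post-process each hypothesis in the batch before detokenization/scoring.
hyps = [_ctc_postprocess(output_tokens[i]) for i in range(output_tokens.size(0))]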

@thunder123321
Author

Hi @chenyangh, thank you very much for answering my question. I noticed that the two post-processing snippets you mentioned appear in the generation.py file. Does that mean I just need to add the --plain-ctc flag? When I added --plain-ctc in my experiment, I found that the memory footprint was higher. Is ctcdecode used to reduce the memory footprint?

@chenyangh
Owner

chenyangh commented Feb 17, 2022

Hi @thunder123321, --plain-ctc was introduced to replace the ctcdecode module (which is much slower, even with beam size 1). However, the --plain-ctc option does not perform post-processing during training. That is why I suggested the modifications above in case you cannot get ctcdecode working.

In terms of memory consumption, I am not sure whether that is caused by the --plain-ctc option. But I do remember that at some point during development, the model suddenly started consuming more RAM per batch. Unfortunately, I haven't identified the reason.
