
imputer error while running train.py for GLAT with DSLP #17

Open
SylasreeKS opened this issue Dec 8, 2023 · 0 comments

The train command I used:

python3 train.py data-bin/wmt14.en-de_kd --source-lang en --target-lang de --save-dir checkpoints --eval-tokenized-bleu \
    --keep-interval-updates 5 --save-interval-updates 500 --validate-interval-updates 500 --maximize-best-checkpoint-metric \
    --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --log-format simple --log-interval 100 \
    --eval-bleu --eval-bleu-detok space --keep-last-epochs 5 --keep-best-checkpoints 5 --fixed-validation-seed 7 --ddp-backend=no_c10d \
    --share-all-embeddings --decoder-learned-pos --encoder-learned-pos --optimizer adam --adam-betas "(0.9,0.98)" --lr 0.0005 \
    --lr-scheduler inverse_sqrt --stop-min-lr 1e-09 --warmup-updates 10000 --warmup-init-lr 1e-07 --apply-bert-init --weight-decay 0.01 \
    --fp16 --clip-norm 2.0 --max-update 300000 --task translation_glat --criterion glat_loss --arch glat_sd --noise full_mask \
    --concat-yhat --concat-dropout 0.0 --label-smoothing 0.1 \
    --activation-fn gelu --dropout 0.1 --max-tokens 8192 --glat-mode glat --length-loss-factor 0.1 --pred-length-offset

The installation was done entirely with the instructions given on the page. When I run the command, the following error occurs:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/kaggle/working/DSLP/train.py", line 10, in <module>
    from fairseq_cli.train import cli_main
  File "/kaggle/working/DSLP/fairseq_cli/train.py", line 19, in <module>
    from fairseq import (
  File "/kaggle/working/DSLP/fairseq/__init__.py", line 30, in <module>
    import fairseq.criterions  # noqa
  File "/kaggle/working/DSLP/fairseq/criterions/__init__.py", line 36, in <module>
    importlib.import_module("fairseq.criterions." + file_name)
  File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/kaggle/working/DSLP/fairseq/criterions/ctc.py", line 19, in <module>
    from fairseq.tasks import FairseqTask
  File "/kaggle/working/DSLP/fairseq/tasks/__init__.py", line 116, in <module>
    module = importlib.import_module("fairseq.tasks." + task_name)
  File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/kaggle/working/DSLP/fairseq/tasks/multilingual_translation.py", line 19, in <module>
    from fairseq.models import FairseqMultiModel
  File "/kaggle/working/DSLP/fairseq/models/__init__.py", line 208, in <module>
    module = importlib.import_module("fairseq.models." + model_name)
  File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/kaggle/working/DSLP/fairseq/models/nat/__init__.py", line 27, in <module>
    from .nat_ctc_sd_ss import *
  File "/kaggle/working/DSLP/fairseq/models/nat/nat_ctc_sd_ss.py", line 18, in <module>
    from fairseq.torch_imputer import best_alignment, imputer_loss
  File "/kaggle/working/DSLP/fairseq/torch_imputer/__init__.py", line 1, in <module>
    from .imputer import imputer_loss, ImputerLoss, best_alignment, ctc_decode
  File "/kaggle/working/DSLP/fairseq/torch_imputer/imputer.py", line 11, in <module>
    imputer = load(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'imputer_fn': [1/2] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=imputer_fn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1016" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_60,code=sm_60 --compiler-options '-fPIC' -std=c++17 -c /kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu -o imputer.cuda.o
FAILED: imputer.cuda.o
/usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=imputer_fn -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1016" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_60,code=sm_60 --compiler-options '-fPIC' -std=c++17 -c /kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu -o imputer.cuda.o
/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu(332): error: identifier "THCudaCheck" is undefined

/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu(753): error: identifier "THCudaCheck" is undefined

/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu(817): error: identifier "THCudaCheck" is undefined

/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu(842): error: identifier "THCudaCheck" is undefined

/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu(859): error: identifier "THCudaCheck" is undefined

5 errors detected in the compilation of "/kaggle/working/DSLP/fairseq/torch_imputer/imputer.cu".
ninja: build stopped: subcommand failed.

I ran this code on Google Colab and on a Kaggle Notebook, with the following environment: Python 3.10, NumPy 1.22.0.

Please help me solve this error, as I have tried every way I could find to resolve this issue.
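For reference, the failing symbol points at a PyTorch version mismatch: the THC headers (and with them the THCudaCheck macro) were removed from recent PyTorch releases, while C10_CUDA_CHECK from <c10/cuda/CUDAException.h> performs the equivalent CUDA error check. Below is a minimal, hypothetical sketch of a text patch one could apply to imputer.cu before rerunning train.py; it assumes THCudaCheck is the only THC dependency in that file (the helper name patch_imputer_source is illustrative, not part of DSLP):

```python
def patch_imputer_source(text: str) -> str:
    """Return the .cu source with the removed THC error-check macro
    swapped for its c10 replacement, adding the needed header once."""
    # THCudaCheck(err) and C10_CUDA_CHECK(err) both check a cudaError_t,
    # so a plain textual substitution is enough for this macro.
    patched = text.replace("THCudaCheck", "C10_CUDA_CHECK")
    header = "#include <c10/cuda/CUDAException.h>"
    if header not in patched:
        patched = header + "\n" + patched
    return patched

# Demonstration on a one-line excerpt; in practice you would read
# fairseq/torch_imputer/imputer.cu, pass its contents through this
# function, and write the result back before launching training.
print(patch_imputer_source("THCudaCheck(cudaGetLastError());"))
```

If the file turns out to use other THC helpers as well, those would need their ATen/c10 equivalents too; this sketch only covers the five THCudaCheck sites reported by nvcc above.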
