CalledProcessError in Auto3DSeg "Hello World" #6821

Closed
KumoLiu opened this issue Aug 4, 2023 Discussed in #6785 · 8 comments
Labels
bug Something isn't working

Comments

@KumoLiu
Contributor

KumoLiu commented Aug 4, 2023

Discussed in #6785

Originally posted by keesh0 July 27, 2023
When I modified https://github.com/Project-MONAI/tutorials/blob/main/auto3dseg/notebooks/auto3dseg_hello_world.ipynb
to use my own CT data, I got an error.

   569     retcode = process.poll()
   570     if check and retcode:
--> 571         raise CalledProcessError(retcode, process.args,
   572                                  output=stdout, stderr=stderr)
   573 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['python', 'Y:\\data\\GMI\\3d_chewy_bars\\CT_training_images\\workdir\\dints_0\\scripts\\train.py', 'run', "--config_file='Y:/data/GMI/3d_chewy_bars/CT_training_images/workdir/dints_0/configs/hyper_parameters.yaml','Y:/data/GMI/3d_chewy_bars/CT_training_images/workdir/dints_0/configs/hyper_parameters_search.yaml','Y:/data/GMI/3d_chewy_bars/CT_training_images/workdir/dints_0/configs/network.yaml','Y:/data/GMI/3d_chewy_bars/CT_training_images/workdir/dints_0/configs/network_search.yaml','Y:/data/GMI/3d_chewy_bars/CT_training_images/workdir/dints_0/configs/transforms_infer.yaml','Y:/data/GMI/3d_chewy_bars/CT_training_images/workdir/dints_0/configs/transforms_train.yaml','Y:/data/GMI/3d_chewy_bars/CT_training_images/workdir/dints_0/configs/transforms_validate.yaml'", '--training#num_epochs_per_validation=1', '--training#num_images_per_batch=1', '--training#num_epochs=2']' returned non-zero exit status 1.

Also see #6785 (comment)

cc @mingxin-zheng @wyli

@wyli added the bug label on Aug 4, 2023
@dzenanz
Contributor

dzenanz commented Nov 1, 2023

I run into this issue without any modifications to the tutorial:
https://github.com/Project-MONAI/tutorials/blob/734d4091af40e23c79400eafde5ad502ee7f2794/auto3dseg/notebooks/auto3dseg_hello_world.ipynb

MONAI version: 1.3.0
Numpy version: 1.24.1
Pytorch version: 2.0.1+cu117
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 865972f7a791bf7b42efbcd87c8402bd865b329e
MONAI __file__: C:\Dev\[redacted]\.venv\Lib\site-packages\monai\__init__.py

Optional dependencies:
Pytorch Ignite version: NOT INSTALLED or UNKNOWN VERSION.
ITK version: 5.3.0
Nibabel version: 5.1.0
scikit-image version: NOT INSTALLED or UNKNOWN VERSION.
scipy version: 1.11.2
Pillow version: 9.3.0
Tensorboard version: 2.14.0
gdown version: NOT INSTALLED or UNKNOWN VERSION.
TorchVision version: 0.15.2+cu117
tqdm version: 4.66.1
lmdb version: NOT INSTALLED or UNKNOWN VERSION.
psutil version: 5.9.5
pandas version: 2.1.0
einops version: NOT INSTALLED or UNKNOWN VERSION.
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: NOT INSTALLED or UNKNOWN VERSION.
pynrrd version: NOT INSTALLED or UNKNOWN VERSION.
clearml version: NOT INSTALLED or UNKNOWN VERSION.

The underlying error is:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\pydevd.py", line 1500, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2021.1.1\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:\Dev\Rivanna\Kitware\python\notebooks\auto3Dseg_dir\dints_0\scripts\train.py", line 54, in <module>
    _libcudart = ctypes.CDLL("libcudart.so")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\ctypes\__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: Could not find module 'libcudart.so' (or one of its dependencies). Try using the full path with constructor syntax.
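
The failing line in the generated train.py loads the CUDA runtime by its Linux file name, which cannot resolve on Windows. Below is a minimal sketch of a more defensive loader. The helper `load_first_available` and its candidate list are hypothetical names used for illustration; they are not part of MONAI or of the generated script.

```python
import ctypes
from typing import Iterable, Optional


def load_first_available(candidates: Iterable[str]) -> Optional[ctypes.CDLL]:
    """Return the first shared library that loads, or None if none do."""
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            # FileNotFoundError, as in the traceback above, is a subclass of OSError.
            continue
    return None


# "libcudart.so" is the name used by the generated script. Any other
# platform-specific names would have to be supplied by the caller (assumption).
_libcudart = load_first_available(["libcudart.so"])
if _libcudart is None:
    print("CUDA runtime library not found; skipping CUDA-specific setup.")
```

With a guard like this the script would fall through to a message instead of stopping at import time, although whether the rest of the training code can run without that library is a separate question.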

@dzenanz
Contributor

dzenanz commented Nov 1, 2023

If I add algos = "segresnet", I get:

CalledProcessError: Command '['python', 'C:\\Dev\\[redacted]\\helloworld_work_dir\\segresnet_0\\scripts\\train.py', 'run', "--config_file='C:/Dev/[redacted]/helloworld_work_dir/segresnet_0/configs/hyper_parameters.yaml'", '--num_epochs_per_validation=1', '--num_images_per_batch=2', '--num_epochs=2', '--num_warmup_epochs=1']' returned non-zero exit status 1.

But when I run this command under a debugger, it finishes without error.

@dzenanz
Contributor

dzenanz commented Nov 1, 2023

Adding %env MONAI_DEBUG=1 at the beginning of the notebook provides additional info:

CalledProcessError                        Traceback (most recent call last)
File C:\Dev\[redacted]\python\.venv\Lib\site-packages\monai\utils\misc.py:874, in run_cmd(cmd_list, **kwargs)
    873 try:
--> 874     return subprocess.run(cmd_list, **kwargs)
    875 except subprocess.CalledProcessError as e:

File C:\Program Files\Python311\Lib\subprocess.py:569, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    568     if check and retcode:
--> 569         raise CalledProcessError(retcode, process.args,
    570                                  output=stdout, stderr=stderr)
    571 return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['python', 'C:\\Dev\\[redacted]\\python\\notebooks\\helloworld_work_dir\\segresnet_0\\scripts\\train.py', 'run', "--config_file='C:/Dev/Rivanna/Kitware/python/notebooks/helloworld_work_dir/segresnet_0/configs/hyper_parameters.yaml'", '--num_epochs_per_validation=1', '--num_images_per_batch=2', '--num_epochs=20', '--num_warmup_epochs=1']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[18], line 1
----> 1 runner.run()

File C:\Dev\[redacted]\python\.venv\Lib\site-packages\monai\apps\auto3dseg\auto_runner.py:806, in AutoRunner.run(self)
    804 if len(history) > 0:
    805     if not self.hpo:
--> 806         self._train_algo_in_sequence(history)
    807     else:
    808         self._train_algo_in_nni(history)

File C:\Dev\[redacted]\python\.venv\Lib\site-packages\monai\apps\auto3dseg\auto_runner.py:658, in AutoRunner._train_algo_in_sequence(self, history)
    656 algo = algo_dict[AlgoKeys.ALGO]
    657 if has_option(algo.train, "device_setting"):
--> 658     algo.train(self.train_params, self.device_setting)
    659 else:
    660     algo.train(self.train_params)

File C:\Dev\[redacted]\python\.venv\Lib\site-packages\monai\apps\auto3dseg\bundle_gen.py:278, in BundleAlgo.train(self, train_params, device_setting)
    275     train_params.pop("CUDA_VISIBLE_DEVICES")
    277 cmd, _unused_return = self._create_cmd(train_params)
--> 278 return self._run_cmd(cmd)

File C:\Dev\[redacted]\python\.venv\Lib\site-packages\monai\apps\auto3dseg\bundle_gen.py:255, in BundleAlgo._run_cmd(self, cmd, devices_info)
    251     return _run_cmd_torchrun(
    252         cmd, nnodes=1, nproc_per_node=self.device_setting["n_devices"], env=ps_environ, check=True
    253     )
    254 else:
--> 255     return run_cmd(cmd.split(), run_cmd_verbose=True, env=ps_environ, check=True)

File C:\Dev\[redacted]\python\.venv\Lib\site-packages\monai\utils\misc.py:880, in run_cmd(cmd_list, **kwargs)
    878 output = str(e.stdout.decode(errors="replace"))
    879 errors = str(e.stderr.decode(errors="replace"))
--> 880 raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}.") from e

RuntimeError: subprocess call error 1: Traceback (most recent call last):
  File "C:\Dev\[redacted]\python\notebooks\helloworld_work_dir\segresnet_0\scripts\train.py", line 14, in <module>
    import fire
ModuleNotFoundError: No module named 'fire'

It looks like the generated script does not inherit the notebook's virtual environment, at least not fully.
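
The traceback above shows run_cmd catching the CalledProcessError and re-raising it as a RuntimeError that carries the child's stderr, which is what finally exposed the missing module. The same pattern can be sketched with the standard library alone. `run_and_report` is a hypothetical helper written for this illustration, not MONAI's function.

```python
import subprocess
import sys


def run_and_report(cmd_list):
    """Run a command; on failure, raise an error that includes the child's stderr."""
    try:
        return subprocess.run(cmd_list, check=True, capture_output=True)
    except subprocess.CalledProcessError as e:
        errors = e.stderr.decode(errors="replace")
        output = e.stdout.decode(errors="replace")
        raise RuntimeError(
            f"subprocess call error {e.returncode}: {errors}, {output}."
        ) from e


# Demonstration: the child fails on import, and the reason is visible to the caller.
try:
    run_and_report([sys.executable, "-c", "import module_that_does_not_exist_xyz"])
except RuntimeError as err:
    print(err)
```

Without captured output, the caller only sees "returned non-zero exit status 1", as in the original report.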

@dzenanz
Contributor

dzenanz commented Nov 1, 2023

Adding print(f"sys.executable: {sys.executable}") to the generated train.py file confirms my suspicion. The notebook's sys.executable is C:\Dev\[redacted]\python\.venv\Scripts\python.exe, while train.py's is C:\Program Files\Python311\python.exe.
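
The same check can be made from the notebook side without editing train.py. This is a minimal sketch, assuming only the standard library; `child_executable` is a hypothetical helper name. A bare 'python' in a command list is resolved by the operating system's search rules, so it is not guaranteed to be the interpreter that runs the notebook.

```python
import shutil
import subprocess
import sys


def child_executable(python_cmd: str) -> str:
    """Ask the interpreter launched as `python_cmd` for its sys.executable."""
    result = subprocess.run(
        [python_cmd, "-c", "import sys; print(sys.executable)"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


print("current interpreter:      ", sys.executable)
print("'python' found on PATH:   ", shutil.which("python"))
print("child via sys.executable: ", child_executable(sys.executable))
```

If the first two lines differ, a subprocess started as 'python' may run outside the virtual environment, which would be consistent with the missing-module errors reported here.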

@dzenanz
Contributor

dzenanz commented Nov 1, 2023

If I install fire into the system Python installation, train.py then fails with: ModuleNotFoundError: No module named 'tensorboard'.
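
Rather than discovering missing packages one failure at a time, the interpreter that actually runs train.py can be asked up front which modules it can import. This is a rough sketch: `missing_modules` is a hypothetical helper, and the list below contains only the two modules named in this thread, not the script's full set of requirements.

```python
import importlib.util
from typing import List, Sequence


def missing_modules(names: Sequence[str]) -> List[str]:
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Only the modules mentioned in this thread; the real list may be longer.
print(missing_modules(["fire", "tensorboard"]))
```

The result describes whichever interpreter executes the snippet, so it needs to be run with the same python.exe that the subprocess uses.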

@mingxin-zheng
Contributor

Hi @dzenanz, the current version of the Auto3DSeg algorithms uses a Linux library and can't be used on Windows at this point.

@vikashg closed this as completed on Dec 20, 2023
@KumoLiu reopened this on Dec 27, 2023
@mingxin-zheng
Contributor

It seems there was a miscommunication. I talked to @KumoLiu, and we can reopen this when there is a feature request to support Auto3DSeg on the Windows platform. Closing this ticket for now.

@dzenanz
Contributor

dzenanz commented Dec 27, 2023

I would very much like Auto3DSeg to work on Windows. Should I open a formal feature request?
