Issue while running modelopt on Jetson Orin #124

Open
dudeperf3ct opened this issue Jan 31, 2025 · 1 comment

We are running the nvcr.io/nvidia/pytorch:24.06-py3-igpu container (see the PyTorch NGC container release notes), which ships with modelopt version 0.11.2.

I am unable to import the modelopt.torch module. The error trace is below:

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'2.4.0a0+3bcc3cddb5.nv24.07'
>>> import modelopt
>>> modelopt.__version__
'0.11.2'
>>> import modelopt.torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/__init__.py", line 13, in <module>
    from . import opt, quantization, sparsity, utils  # noqa: E402
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/opt/__init__.py", line 26, in <module>
    from . import _hooks
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/opt/_hooks.py", line 20, in <module>
    from .dynamic import DynamicModule
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/opt/dynamic.py", line 39, in <module>
    from modelopt.torch.utils import is_channels_last, unwrap_model
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/utils/__init__.py", line 18, in <module>
    from .network import *
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/utils/network.py", line 18, in <module>
    import torch.distributed.fsdp
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/__init__.py", line 1, in <module>
    from ._flat_param import FlatParameter as FlatParameter
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/_flat_param.py", line 30, in <module>
    from torch.distributed.fsdp._common_utils import (
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/_common_utils.py", line 31, in <module>
    from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import (
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 9, in <module>
    from torch.distributed.utils import _pack_kwargs, _replace_by_prefix, _unpack_kwargs
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/utils.py", line 283, in <module>
    logger: Optional[dist.Logger] = None,
AttributeError: module 'torch.distributed' has no attribute 'Logger'

Do you have any ideas on how to resolve this error?
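
A quick way to narrow this down is to check whether the installed PyTorch build exposes distributed support at all. The sketch below is only a diagnostic under assumptions: the helper name `diagnose_torch_distributed` is hypothetical, and the idea that this build lacks distributed support is a guess based on the trace above, not something confirmed in this thread. It uses only `torch.distributed.is_available()` and a plain `hasattr` check.

```python
import importlib.util


def diagnose_torch_distributed():
    """Report whether torch is installed and whether torch.distributed is usable.

    Hypothetical helper for illustration; not part of torch or modelopt.
    """
    report = {
        "torch_installed": False,
        "torch_version": None,
        "distributed_available": None,
        "has_logger": None,
    }
    # Avoid a hard failure on machines without torch.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch
    import torch.distributed as dist

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # False means this build was compiled without distributed support.
    report["distributed_available"] = dist.is_available()
    # The attribute that the traceback above reports as missing.
    report["has_logger"] = hasattr(dist, "Logger")
    return report


if __name__ == "__main__":
    for key, value in diagnose_torch_distributed().items():
        print(f"{key}: {value}")
```

If `distributed_available` comes back `False`, the missing `Logger` attribute would be a property of the PyTorch build rather than of modelopt itself.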

My guess is that upgrading from torch 2.4 to 2.5 would most likely resolve this, since this line was changed as part of the 2.5 release.

I also tried manually applying that specific fix, using a newer version of modelopt (0.23.0), but it failed with a different error. Maybe using the official NGC container pytorch:24.08 or later fixes this issue?

>>> modelopt.__version__
'0.23.0'
>>> import modelopt.torch
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/__init__.py", line 24, in <module>
    from . import distill, nas, opt, prune, quantization, sparsity, speculative, utils  # noqa: E402
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/distill/__init__.py", line 18, in <module>
    from . import mode
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/distill/mode.py", line 27, in <module>
    from modelopt.torch.opt.config import ModeloptBaseConfig
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/opt/__init__.py", line 39, in <module>
    from . import plugins, utils
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/opt/plugins/__init__.py", line 18, in <module>
    from modelopt.torch.utils import import_plugin
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/utils/__init__.py", line 25, in <module>
    from .network import *
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/utils/network.py", line 24, in <module>
    import torch.distributed.fsdp
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/__init__.py", line 1, in <module>
    from ._flat_param import FlatParameter as FlatParameter
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/fsdp/_flat_param.py", line 44, in <module>
    from torch.testing._internal.distributed.fake_pg import FakeProcessGroup
  File "/usr/local/lib/python3.10/dist-packages/torch/testing/_internal/distributed/fake_pg.py", line 5, in <module>
    from torch._C._distributed_c10d import (
ModuleNotFoundError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/modelopt/torch/__init__.py", line 26, in <module>
    raise ImportError(f"{e}\nPlease install optional ``[torch]`` dependencies.")
ImportError: No module named 'torch._C._distributed_c10d'; 'torch._C' is not a package
Please install optional ``[torch]`` dependencies.
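
In the 0.23.0 trace the final `ImportError` suggests installing the optional `[torch]` dependencies, while the underlying failure is the missing `torch._C._distributed_c10d` module. A minimal sketch for surfacing the whole chain of errors when an import fails is shown here; `try_import` is a hypothetical helper written for illustration, not a modelopt or torch API.

```python
import importlib


def try_import(module_name):
    """Try to import a module by name.

    Returns (module, []) on success, or (None, errors) on failure, where
    errors lists the raised exception followed by the exceptions it is
    chained to (outermost first).
    """
    try:
        return importlib.import_module(module_name), []
    except Exception as exc:  # ImportError, AttributeError, and so on
        errors = []
        seen = set()
        # Follow explicit (__cause__) or implicit (__context__) chaining.
        while exc is not None and id(exc) not in seen:
            seen.add(id(exc))
            errors.append(f"{type(exc).__name__}: {exc}")
            exc = exc.__cause__ or exc.__context__
        return None, errors


if __name__ == "__main__":
    module, errors = try_import("modelopt.torch")
    if module is None:
        for line in errors:
            print(line)
```

Reading the last entry in `errors` shows the root cause rather than only the wrapper message.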

cjluo-nv (Collaborator) commented Feb 5, 2025

Hi @dudeperf3ct, thanks for reporting the issue. At the moment, modelopt does not officially support Jetson devices. We recommend running modelopt on a regular NVIDIA GPU setup.
