[Bug]: No module named 'mgds' #670

Open
DiegoRRR opened this issue Jan 31, 2025 · 10 comments
Labels
bug Something isn't working

Comments

@DiegoRRR

What happened?

I am trying to install OneTrainer.
It stops on this error:

D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
ERROR  | Uncaught exception | <class 'ModuleNotFoundError'>; No module named 'mgds'; <traceback object at 0x0000000004C5A740>;

I tried installing it manually with python -m pip install mgds but I get this error:

ERROR: Could not find a version that satisfies the requirement mgds (from version: none)
No matching distribution found for mgds

What did you expect would happen?

start

Relevant log output

activating venv D:\apps\stable-diffusion\OneTrainer\venv
Using Python "D:\apps\stable-diffusion\OneTrainer\venv\Scripts\python.exe"

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "D:\apps\stable-diffusion\OneTrainer\scripts\train_ui.py", line 5, in <module>
    from modules.ui.TrainUI import TrainUI
  File "D:\apps\stable-diffusion\OneTrainer\modules\ui\TrainUI.py", line 9, in <module>
    from modules.trainer.CloudTrainer import CloudTrainer
  File "D:\apps\stable-diffusion\OneTrainer\modules\trainer\CloudTrainer.py", line 8, in <module>
    from modules.cloud.LinuxCloud import LinuxCloud
  File "D:\apps\stable-diffusion\OneTrainer\modules\cloud\LinuxCloud.py", line 7, in <module>
    from modules.cloud.BaseCloud import BaseCloud
  File "D:\apps\stable-diffusion\OneTrainer\modules\cloud\BaseCloud.py", line 5, in <module>
    from modules.util.callbacks.TrainCallbacks import TrainCallbacks
  File "D:\apps\stable-diffusion\OneTrainer\modules\util\callbacks\TrainCallbacks.py", line 4, in <module>
    from modules.modelSampler.BaseModelSampler import ModelSamplerOutput
  File "D:\apps\stable-diffusion\OneTrainer\modules\modelSampler\BaseModelSampler.py", line 13, in <module>
    import torch
  File "D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\__init__.py", line 1382, in <module>
    from .functional import *  # noqa: F403
  File "D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\functional.py", line 7, in <module>
    import torch.nn.functional as F
  File "D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\nn\__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\__init__.py", line 35, in <module>
    from .transformer import TransformerEncoder, TransformerDecoder, \
  File "D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\transformer.py", line 20, in <module>
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
ERROR  | Uncaught exception | <class 'ModuleNotFoundError'>; No module named 'mgds'; <traceback object at 0x0000000004C5A740>;
Press any key to continue...

Output of pip freeze

( I don't know how to do that )
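
For anyone else stuck on this template field: the usual way to produce it is to run python -m pip freeze with the venv's own interpreter (here that would be venv\Scripts\python.exe). As a rough equivalent, the sketch below lists installed packages in the same name==version shape using only the standard library. The function name installed_packages is illustrative and not part of OneTrainer, and the output is only an approximation of what pip freeze prints (editable and git installs are shown differently by pip).

```python
from importlib.metadata import distributions


def installed_packages() -> list[str]:
    """Return 'name==version' strings for every distribution visible to this interpreter."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        # Broken or partially removed installs can lack a name; skip those.
        if name:
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(installed_packages()))
```

Run it with the interpreter of the environment you want to inspect, otherwise it reports the packages of whichever Python happens to be first on PATH.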

@DiegoRRR DiegoRRR added the bug Something isn't working label Jan 31, 2025
@Arcitec
Contributor

Arcitec commented Feb 1, 2025

MGDS is not shipped via pip, so you can't install it that way. It looks like you may have installed this in a weird way since you are missing a core part that gets installed by the automatic scripts.

To resolve this I would recommend that you delete the venv folder and then run update.bat (Windows) or update.sh (Mac/Linux). Then run install.bat or install.sh. That will update OneTrainer and install everything you need. It will 100% solve this.

Edit: It would be helpful to know how you originally installed it and ended up with this broken installation.
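
To confirm whether the reinstall actually put mgds into the venv, a quick check like the following can be run with the venv's interpreter. This is a minimal sketch: missing_modules is a hypothetical helper, and the module list (torch, mgds) is simply taken from the errors reported in this thread.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found by the import system."""
    # find_spec only locates the module; it does not execute it, so this is
    # safe to run even when importing the module would crash.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["torch", "mgds"])
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only means the modules are discoverable, not that they import cleanly.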

@O-J1
Collaborator

O-J1 commented Feb 3, 2025

@DiegoRRR Could you please update us on your exact installation steps? If I don't hear from you, I will close this, as without proof I can only assume user error.

@O-J1 O-J1 added the followup Failure to provide config or other info or needs followup label Feb 3, 2025
@DiegoRRR
Author

DiegoRRR commented Feb 3, 2025

I just followed the instructions:
git clone https://github.com/Nerogar/OneTrainer.git
then I ran install.bat.
I only changed the PyTorch version to CUDA 11.8 (cu118) because my computer can't support CUDA 12.

I deleted the venv folder as suggested.
update.bat tells me to run install.bat first.
install.bat:

Microsoft Windows [version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

D:\apps\stable-diffusion\OneTrainer>install
creating venv in D:\apps\stable-diffusion\OneTrainer\venv
activating venv D:\apps\stable-diffusion\OneTrainer\venv
installing dependencies
Ignoring scipy: markers 'sys_platform != "win32"' don't match your environment
Obtaining diffusers from git+https://github.com/huggingface/diffusers.git@c944f06#egg=diffusers (from -r requirements-global.txt (line 19))
  Cloning https://github.com/huggingface/diffusers.git (to revision c944f06) to d:\apps\stable-diffusion\onetrainer\venv\src\diffusers
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/diffusers.git 'D:\apps\stable-diffusion\OneTrainer\venv\src\diffusers'
  WARNING: Did not find branch or tag 'c944f06', assuming revision or ref.
  Running command git checkout -q c944f06
  Resolved https://github.com/huggingface/diffusers.git to commit c944f06
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Obtaining mgds from git+https://github.com/Nerogar/mgds.git@fcaec25#egg=mgds (from -r requirements-global.txt (line 30))
  Cloning https://github.com/Nerogar/mgds.git (to revision fcaec25) to d:\apps\stable-diffusion\onetrainer\venv\src\mgds
  Running command git clone --filter=blob:none --quiet https://github.com/Nerogar/mgds.git 'D:\apps\stable-diffusion\OneTrainer\venv\src\mgds'
  WARNING: Did not find branch or tag 'fcaec25', assuming revision or ref.
  Running command git checkout -q fcaec25
  Resolved https://github.com/Nerogar/mgds.git to commit fcaec25
  Preparing metadata (setup.py) ... done
Collecting xformers==0.0.28.post3
  Using cached xformers-0.0.28.post3.tar.gz (7.8 MB)
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\Gros2\\AppData\\Local\\Temp\\pip-install-4b8nbrf1\\xformers_9340f6c84c1e443e87494ce02a5e7d25\\third_party/flash-attention/csrc/composable_kernel/client_example/24_grouped_conv_activation/grouped_convnd_bwd_data_bilinear/grouped_conv_bwd_data_bilinear_residual_fp16.cpp'
HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at https://pip.pypa.io/warnings/enable-long-paths

WARNING: There was an error checking the latest version of pip.
checking if CUDA is available
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'
goto was unexpected.

Long paths are enabled though, I checked.
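
One way to sanity-check pip's hint is to measure the path from the error against the classic Windows MAX_PATH limit of 260 characters. The sketch below does only that; exceeds_max_path is a hypothetical helper, and the path is copied from the log above. Note also that the console banner reports version 6.1.7601, which corresponds to Windows 7 SP1, while the opt-in long path setting that pip's hint links to was, as far as I know, introduced with Windows 10 version 1607, so the setting may have no effect on this system.

```python
# Classic Windows path limit (the count includes the terminating null character).
MAX_PATH = 260

# Path copied from the OSError in the install log above.
FAILING_PATH = (
    r"C:\Users\Gros2\AppData\Local\Temp\pip-install-4b8nbrf1"
    r"\xformers_9340f6c84c1e443e87494ce02a5e7d25"
    r"\third_party/flash-attention/csrc/composable_kernel/client_example"
    "/24_grouped_conv_activation/grouped_convnd_bwd_data_bilinear"
    "/grouped_conv_bwd_data_bilinear_residual_fp16.cpp"
)


def exceeds_max_path(path: str) -> bool:
    """Return True if the path cannot fit within the classic MAX_PATH limit."""
    return len(path) >= MAX_PATH


if __name__ == "__main__":
    print(len(FAILING_PATH), exceeds_max_path(FAILING_PATH))
```

If the path is over the limit, a shorter temp directory (for example by pointing the TEMP and TMP environment variables at a short folder before running the installer) is one possible workaround, though it is untested here.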

With my venv folder, install.bat:

D:\apps\stable-diffusion\OneTrainer>install
activating venv D:\apps\stable-diffusion\OneTrainer\venv
installing dependencies
Ignoring scipy: markers 'sys_platform != "win32"' don't match your environment
Obtaining diffusers from git+https://github.com/huggingface/diffusers.git@c944f06#egg=diffusers (from -r D:\apps\stable-diffusion\OneTrainer\requirements-global.txt (line 19))
  Updating d:\apps\stable-diffusion\onetrainer\venv\src\diffusers clone (to revision c944f06)
  Running command git fetch -q --tags
  WARNING: Did not find branch or tag 'c944f06', assuming revision or ref.
  Running command git reset --hard -q c944f06
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Obtaining mgds from git+https://github.com/Nerogar/mgds.git@fcaec25#egg=mgds (from -r D:\apps\stable-diffusion\OneTrainer\requirements-global.txt (line 30))
  Updating d:\apps\stable-diffusion\onetrainer\venv\src\mgds clone (to revision fcaec25)
  Running command git fetch -q --tags
  WARNING: Did not find branch or tag 'fcaec25', assuming revision or ref.
  Running command git reset --hard -q fcaec25
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting xformers==0.0.28.post3 (from -r D:\apps\stable-diffusion\OneTrainer\requirements-cuda.txt (line 5))
  Using cached xformers-0.0.28.post3.tar.gz (7.8 MB)
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\\Users\\Gros2\\AppData\\Local\\Temp\\pip-install-5_l4360_\\xformers_2e8f30c3fb9f4bdfbd9424a17234f876\\third_party/flash-attention/csrc/composable_kernel/client_example/24_grouped_conv_activation/grouped_convnd_bwd_data_bilinear/grouped_conv_bwd_data_bilinear_residual_fp16.cpp'
HINT: This error might have occurred since this system does not have Windows Long Path support enabled. You can find information on how to enable this at https://pip.pypa.io/warnings/enable-long-paths

checking if CUDA is available

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.2 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\__init__.py", line 1382, in <module>
    from .functional import *  # noqa: F403
  File "D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\functional.py", line 7, in <module>
    import torch.nn.functional as F
  File "D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\nn\__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\__init__.py", line 35, in <module>
    from .transformer import TransformerEncoder, TransformerDecoder, \
  File "D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\transformer.py", line 20, in <module>
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
D:\apps\stable-diffusion\OneTrainer\venv\lib\site-packages\torch\nn\modules\transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ..\torch\csrc\utils\tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),

************
Install done
************
Press any key to continue...
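
The NumPy warning in this log says the installed torch build was compiled against NumPy 1.x while NumPy 2.1.2 is present, and it suggests downgrading to 'numpy<2'. A minimal sketch for checking which side of that boundary a venv is on, using only the standard library (major_version and numpy_is_2_or_newer are illustrative names, and the parsing assumes a plain dotted version string):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version_string.split(".")[0])


def numpy_is_2_or_newer() -> Optional[bool]:
    """Return True/False for the installed NumPy, or None if NumPy is not installed."""
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        return None
    return major_version(installed) >= 2


if __name__ == "__main__":
    print("NumPy >= 2 installed:", numpy_is_2_or_newer())
```

If this reports True in a venv whose torch was built for NumPy 1.x, the remedy named in the warning itself is to install a NumPy release below 2.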

@Chadius

Chadius commented Feb 4, 2025

On Windows, running the automatic installer fails, as the requirements are now out of date. Here are the changes I had to make to get the right versions installed.

PS C:\Users\there\Documents\OneTrainer> git stash show -p
diff --git a/requirements-cuda.txt b/requirements-cuda.txt
index c8ad93d..81dbd75 100644
--- a/requirements-cuda.txt
+++ b/requirements-cuda.txt
@@ -1,11 +1,11 @@
 # pytorch
 --extra-index-url https://download.pytorch.org/whl/cu124
-torch==2.5.1+cu124
-torchvision==0.20.1+cu124
-onnxruntime-gpu==1.19.2
+torch==2.6.0+cu124
+torchvision==0.21.0+cu124
+onnxruntime-gpu==1.20.1

When the installer runs again, xformers fails to find torch.

Collecting xformers==0.0.28.post3 (from -r C:\Users\there\Documents\OneTrainer\requirements-cuda.txt (line 8))
  Using cached xformers-0.0.28.post3.tar.gz (7.8 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [23 lines of output]
      Traceback (most recent call last):
        File "C:\Users\there\Documents\OneTrainer\venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
          ~~~~^^
        File "C:\Users\there\Documents\OneTrainer\venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ~~~~^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\there\Documents\OneTrainer\venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "C:\Users\there\AppData\Local\Temp\pip-build-env-txx9c8nl\overlay\Lib\site-packages\setuptools\build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\there\AppData\Local\Temp\pip-build-env-txx9c8nl\overlay\Lib\site-packages\setuptools\build_meta.py", line 304, in _get_build_requires
          self.run_setup()
          ~~~~~~~~~~~~~~^^
        File "C:\Users\there\AppData\Local\Temp\pip-build-env-txx9c8nl\overlay\Lib\site-packages\setuptools\build_meta.py", line 522, in run_setup
          super().run_setup(setup_script=setup_script)
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\there\AppData\Local\Temp\pip-build-env-txx9c8nl\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in run_setup
          exec(code, locals())
          ~~~~^^^^^^^^^^^^^^^^
        File "<string>", line 24, in <module>
      ModuleNotFoundError: No module named 'torch'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.

I tried to resolve this by manually installing torch with pip install torch --extra-index-url https://download.pytorch.org/whl/cu124 and then running the installer again, but it still fails at the same spot.

I use kohya on this machine, so I already have a later version of xformers installed.

At this point, I just manually installed each component in requirements.txt (the requirements-default.txt and requirements-cuda.txt files) separately rather than running the install script. I can get all the way to the mgds issue the OP posted. I also had to manually install pip, Visual Studio, and rustup to install NumPy.
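
When installing components by hand like this, it is easy to drift from the pinned versions. The sketch below compares simple name==version pins against what is installed. It is an illustration only: parse_pins and mismatches are hypothetical helpers, the parser deliberately ignores option lines (such as --extra-index-url and -e git+... entries) and does not handle extras, and the sample text is taken from the requirements-cuda.txt diff above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Extract plain 'name==version' pins, skipping comments, options and markers."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers (after ';').
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def mismatches(pins: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (pinned, installed)."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed at all
        if installed != pinned:
            report[name] = (pinned, installed)
    return report


SAMPLE = """\
# pytorch
--extra-index-url https://download.pytorch.org/whl/cu124
torch==2.5.1+cu124
torchvision==0.20.1+cu124
onnxruntime-gpu==1.19.2
"""

if __name__ == "__main__":
    for name, (pinned, installed) in mismatches(parse_pins(SAMPLE)).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```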

This seems to be a known issue with Windows and xformers.

facebookresearch/xformers#740

Even with xformers resolved (or at least skipped), sentencepiece (which the xformers thread also mentions) fails to install. There's a cmake error I don't know enough about to even explain.

Collecting sentencepiece==0.2.0 (from -r C:\Users\there\Documents\OneTrainer\requirements-global.txt (line 21))
  Using cached sentencepiece-0.2.0.tar.gz (2.6 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [32 lines of output]
      Traceback (most recent call last):
        File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 196, in _run_module_as_main
          return _run_code(code, main_globals, None,
        File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 86, in _run_code
          exec(code, run_globals)
        File "C:\Users\there\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\Scripts\cmake.exe\__main__.py", line 4, in <module>
      ModuleNotFoundError: No module named 'cmake'
      Traceback (most recent call last):
        File "C:\Users\there\Documents\OneTrainer\venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
          ~~~~^^
        File "C:\Users\there\Documents\OneTrainer\venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ~~~~^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\there\Documents\OneTrainer\venv\Lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "C:\Users\there\AppData\Local\Temp\pip-build-env-yfzl9o3e\overlay\Lib\site-packages\setuptools\build_meta.py", line 334, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\there\AppData\Local\Temp\pip-build-env-yfzl9o3e\overlay\Lib\site-packages\setuptools\build_meta.py", line 304, in _get_build_requires
          self.run_setup()
          ~~~~~~~~~~~~~~^^
        File "C:\Users\there\AppData\Local\Temp\pip-build-env-yfzl9o3e\overlay\Lib\site-packages\setuptools\build_meta.py", line 522, in run_setup
          super().run_setup(setup_script=setup_script)
          ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "C:\Users\there\AppData\Local\Temp\pip-build-env-yfzl9o3e\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in run_setup
          exec(code, locals())
          ~~~~^^^^^^^^^^^^^^^^
        File "<string>", line 128, in <module>
        File "C:\Users\there\AppData\Local\Programs\Python\Python313\Lib\subprocess.py", line 419, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', 'sentencepiece', '-A', 'x64', '-B', 'build', '-DSPM_ENABLE_SHARED=OFF', '-DCMAKE_INSTALL_PREFIX=build\\root']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.

Note that this is on my Windows machine. I installed this on my Mac with no issues. But the Windows machine has the NVIDIA video card, so I'd love to use that instead.
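
The sentencepiece traceback above mixes paths from two Python installs: the cmake.exe shim lives under a Python 3.10 Store install and fails with No module named 'cmake', while the subprocess call runs from Python313. That suggests, though the log does not prove it, that the cmake found on PATH belongs to a different interpreter than the one doing the build. A small sketch for seeing which executables are actually picked up (locate_tool is an illustrative wrapper around the standard shutil.which):

```python
import shutil
import sys
from typing import Optional


def locate_tool(name: str) -> Optional[str]:
    """Return the full path of the first matching executable on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    # Compare the running interpreter with where cmake resolves from.
    print("interpreter:", sys.executable)
    print("cmake:", locate_tool("cmake") or "not found on PATH")
```

If the two paths point into different Python installations, that mismatch is a reasonable first thing to clean up.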

@O-J1 O-J1 removed the followup Failure to provide config or other info or needs followup label Feb 7, 2025
@Tr1dae

This comment has been minimized.

@O-J1
Collaborator

O-J1 commented Feb 13, 2025

We explicitly do not support Python 3.13, Tr1dae. Read the repo's README next time.

@Chadius

Chadius commented Feb 13, 2025

Thanks for the hint, @Tr1dae. I was able to install Python 3.12 and then create a 3.12 environment to work in.

cd C:\Users\there\OneTrainer
virtualenv.exe -p C:\Users\there\AppData\Local\Programs\Python\Python312\python.exe python312
.\python312\Scripts\activate
.\install.bat

My Mac is running Python 3.12.7 so that's probably why it has no issues. Windows is installing right now.

Devs, would it be possible to add a Python version check to the installer so it can halt installation with a more obvious error message? That way people don't bother you because they installed the most recent version of Python.
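
A version check of this kind could look roughly like the sketch below. The bounds are assumptions for illustration: this thread only establishes that 3.13 is unsupported and that 3.12 worked, so the lower bound of 3.10 and the names MIN_VERSION, MAX_EXCLUSIVE and is_supported are hypothetical and should be checked against the README.

```python
import sys

# Assumed supported range (see the note above): 3.10 <= version < 3.13.
MIN_VERSION = (3, 10)
MAX_EXCLUSIVE = (3, 13)


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version falls inside the assumed range."""
    return MIN_VERSION <= tuple(version_info[:2]) < MAX_EXCLUSIVE


def main() -> int:
    if not is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(
            f"Unsupported Python {found}: this installer expects "
            f">= {MIN_VERSION[0]}.{MIN_VERSION[1]} and "
            f"< {MAX_EXCLUSIVE[0]}.{MAX_EXCLUSIVE[1]}.",
            file=sys.stderr,
        )
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

An install script could run this first and stop when the exit code is non-zero, before any dependency is downloaded.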

Either way, I'm all set, feel free to close this thread.

@Tr1dae

Tr1dae commented Feb 13, 2025

We explicitly do not support Python 3.13, Tr1dae. Read the repo's README next time.

bro - cut the sass. I was helping an error thread that went unanswered. It's not like I intentionally tried running 3.13 expecting it to work.
I was unaware of my 3.13 lurking and wanted to help anyone else googling and ending up here.... but "rEaD tHe ReAdMe"

@O-J1
Collaborator

O-J1 commented Feb 14, 2025

I was unaware of my 3.13 lurking and wanted to help anyone else googling and ending up here.... but "rEaD tHe ReAdMe"

I am unsure how you could be unaware of a global Python 3.13 installation, since global installs are manual, so that doesn't add up. Thank you for helping this user.

@Tr1dae

Tr1dae commented Feb 14, 2025

Clearly I wasn't the only one who forgot about a previous install in a sea of various installs when running multiple different things from all over. Your little passive-aggressive comments aren't helping.

Next time check your tone.
