TransformerEngine build fail with Conda #954

Closed
TeddLi opened this issue Jun 21, 2024 · 5 comments
Labels
build Build system

Comments

TeddLi commented Jun 21, 2024

`Requirement already satisfied: mpmath>=0.19 in /workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages (from sympy->torch->transformer_engine==1.6.0+c81733f) (1.3.0)
Installing collected packages: transformer_engine
Running setup.py develop for transformer_engine
error: subprocess-exited-with-error

× python setup.py develop did not run successfully.
│ exit code: 1
╰─> [98 lines of output]
    running develop
    /workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
    !!
    
            ********************************************************************************
            Please avoid running ``setup.py`` and ``easy_install``.
            Instead, use pypa/build, pypa/installer or other
            standards-based tools.
    
            See https://github.com/pypa/setuptools/issues/917 for details.
            ********************************************************************************
    
    !!
      easy_install.initialize_options(self)
    /workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
    !!
    
            ********************************************************************************
            Please avoid running ``setup.py`` directly.
            Instead, use pypa/build, pypa/installer or other
            standards-based tools.
    
            See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
            ********************************************************************************
    
    !!
      self.initialize_options()
    running egg_info
    writing transformer_engine.egg-info/PKG-INFO
    writing dependency_links to transformer_engine.egg-info/dependency_links.txt
    writing requirements to transformer_engine.egg-info/requires.txt
    writing top-level names to transformer_engine.egg-info/top_level.txt
    reading manifest file 'transformer_engine.egg-info/SOURCES.txt'
    adding license file 'LICENSE'
    writing manifest file 'transformer_engine.egg-info/SOURCES.txt'
    running build_ext
    CMake Error at cmake/FindCUDNN.cmake:14 (file):
      file failed to open for reading (Not a directory):
    
        /workspace/miniconda3/envs/megatron_neo/include/cudnn.h/cudnn_version.h
    Call Stack (most recent call first):
      CMakeLists.txt:24 (find_package)
    
    
    CMake Error at cmake/FindCUDNN.cmake:19 (find_library):
      Could not find cudnn_LIBRARY using the following names: cudnn, libcudnn.so.
    Call Stack (most recent call first):
      cmake/FindCUDNN.cmake:41 (find_cudnn_library)
      CMakeLists.txt:24 (find_package)
    
    
    -- Configuring incomplete, errors occurred!
    See also "/home/slurm/TransformerEngine/build/cmake/CMakeFiles/CMakeOutput.log".
    Traceback (most recent call last):
      File "/home/slurm/TransformerEngine/setup.py", line 336, in _build_cmake
        subprocess.run(command, cwd=build_dir, check=True)
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/subprocess.py", line 526, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['/usr/bin/cmake', '-S', '/home/slurm/TransformerEngine/transformer_engine', '-B', '/home/slurm/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/workspace/miniconda3/envs/megatron_neo/bin/python', '-DPython_INCLUDE_DIR=/workspace/miniconda3/envs/megatron_neo/include/python3.10', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/home/slurm/TransformerEngine', '-GNinja']' returned non-zero exit status 1.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<pip-setuptools-caller>", line 34, in <module>
      File "/home/slurm/TransformerEngine/setup.py", line 617, in <module>
        main()
      File "/home/slurm/TransformerEngine/setup.py", line 602, in main
        setuptools.setup(
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
        return distutils.core.setup(**attrs)
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
        return run_commands(dist)
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
        dist.run_commands()
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
        self.run_command(cmd)
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
        super().run_command(command)
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/command/develop.py", line 34, in run
        self.install_for_development()
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/command/develop.py", line 111, in install_for_development
        self.run_command('build_ext')
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
        self.distribution.run_command(command)
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
        super().run_command(command)
      File "/workspace/miniconda3/envs/megatron_neo/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
        cmd_obj.run()
      File "/home/slurm/TransformerEngine/setup.py", line 368, in run
        ext._build_cmake(
      File "/home/slurm/TransformerEngine/setup.py", line 338, in _build_cmake
        raise RuntimeError(f"Error when running CMake: {e}")
    RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '-S', '/home/slurm/TransformerEngine/transformer_engine', '-B', '/home/slurm/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/workspace/miniconda3/envs/megatron_neo/bin/python', '-DPython_INCLUDE_DIR=/workspace/miniconda3/envs/megatron_neo/include/python3.10', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/home/slurm/TransformerEngine', '-GNinja']' returned non-zero exit status 1.
    Building CMake extension transformer_engine
    Running command /usr/bin/cmake -S /home/slurm/TransformerEngine/transformer_engine -B /home/slurm/TransformerEngine/build/cmake -DPython_EXECUTABLE=/workspace/miniconda3/envs/megatron_neo/bin/python -DPython_INCLUDE_DIR=/workspace/miniconda3/envs/megatron_neo/include/python3.10 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/slurm/TransformerEngine -GNinja
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.

`

TeddLi commented Jun 21, 2024

Conda doesn't provide cudnn.h at the beginning. I installed it manually after PyTorch using `conda install -c conda-forge cudnn`.

ywb2018 commented Jun 22, 2024

> Conda doesn't provide cudnn.h at the beginning. I installed it manually after PyTorch using `conda install -c conda-forge cudnn`.

Does it work?

TeddLi commented Jun 22, 2024

> > Conda doesn't provide cudnn.h at the beginning. I installed it manually after PyTorch using `conda install -c conda-forge cudnn`.
>
> Does it work?

I solved it by installing cuDNN manually: `sudo apt-get -y install cudnn9-cuda-12`

timmoon10 (Collaborator) commented Jun 25, 2024

When installing cuDNN, you should make sure CUDNN_PATH is set in the environment so that the TE build system can find it. It's a bit inconsistent what environment variables are checked (PyTorch itself is inconsistent between CUDNN_PATH and CUDNN_ROOT, see #918 (comment)), so I speculate that Conda isn't setting CUDNN_PATH while APT is.
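Before rebuilding, it may help to check which of these variables is set and whether the directory it points at actually contains the cuDNN header and library. The following is a minimal diagnostic sketch, not part of TE or its build system: the helper name `find_cudnn_candidates`, the use of `CONDA_PREFIX` as an extra candidate, and the `include/` plus `lib/` or `lib64/` layout are all assumptions made for illustration.

```python
import os
from pathlib import Path


def find_cudnn_candidates(env=None):
    """Report, per candidate variable, whether cuDNN files exist under it.

    Hypothetical helper. CUDNN_PATH and CUDNN_ROOT come from the discussion
    above; CONDA_PREFIX and the include/ + lib/ layout are assumptions.
    """
    env = os.environ if env is None else env
    report = {}
    for var in ("CUDNN_PATH", "CUDNN_ROOT", "CONDA_PREFIX"):
        value = env.get(var)
        if not value:
            report[var] = None  # variable is unset or empty
            continue
        root = Path(value)
        # Collect any libcudnn shared objects from the usual library folders.
        libs = []
        for libdir in ("lib", "lib64"):
            libs.extend(sorted((root / libdir).glob("libcudnn.so*")))
        report[var] = {
            "root": str(root),
            "has_header": (root / "include" / "cudnn.h").is_file(),
            "libraries": [str(p) for p in libs],
        }
    return report


if __name__ == "__main__":
    for var, info in find_cudnn_candidates().items():
        print(var, "->", info)
```

If a variable prints `None`, it is not set in the current shell; if `has_header` is false or `libraries` is empty, that directory does not look like a cuDNN install under the layout assumed here.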

timmoon10 added the "build" (Build system) label Jun 25, 2024
MaureenZOU commented Jan 10, 2025

To install entirely within a conda environment:

  1. conda install -c conda-forge cudnn
  2. export CPLUS_INCLUDE_PATH=/home/xueyan/miniconda/envs/cosmos/lib/python3.10/site-packages/nvidia/nvtx/include:/home/xueyan/miniconda/envs/cosmos/lib/python3.10/site-packages/nvidia/cudnn/include:$CPLUS_INCLUDE_PATH
  3. export C_INCLUDE_PATH=/home/xueyan/miniconda/envs/cosmos/lib/python3.10/site-packages/nvidia/nvtx/include:/home/xueyan/miniconda/envs/cosmos/lib/python3.10/site-packages/nvidia/cudnn/include:$C_INCLUDE_PATH
    After that, compilation takes about 10 minutes.
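The paths in steps 2 and 3 are specific to one machine. As a rough sketch of how the same export lines could be generated for another environment, the snippet below looks for `nvidia/<package>/include` directories under an environment's `site-packages`. The helper names `nvidia_include_dirs` and `export_lines`, the default Python version, and the assumption that the headers live under `site-packages/nvidia/` are illustrative and may not match your setup.

```python
import os
from pathlib import Path


def nvidia_include_dirs(env_prefix, python_version="3.10", packages=("nvtx", "cudnn")):
    """Return the nvidia/<pkg>/include directories that exist in an environment.

    Hypothetical helper: the site-packages/nvidia/<pkg>/include layout mirrors
    the paths in the steps above and is an assumption for other setups.
    """
    site = Path(env_prefix) / "lib" / f"python{python_version}" / "site-packages" / "nvidia"
    candidates = [site / pkg / "include" for pkg in packages]
    return [str(p) for p in candidates if p.is_dir()]


def export_lines(include_dirs):
    """Build shell export lines that prepend include_dirs to both variables."""
    if not include_dirs:
        return []
    joined = ":".join(include_dirs)
    return [
        f"export {var}={joined}:${var}"
        for var in ("CPLUS_INCLUDE_PATH", "C_INCLUDE_PATH")
    ]


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        for line in export_lines(nvidia_include_dirs(prefix)):
            print(line)
    else:
        print("CONDA_PREFIX is not set; activate the environment first.")
```

Run it inside the activated environment and paste the printed lines into the shell before building. If nothing is printed, the expected include directories were not found under the layout assumed here.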
