sm_100 not defined for option gpu-name when running calibration in DeepSeek #144

imenselmi opened this issue Mar 4, 2025 · 1 comment

imenselmi commented Mar 4, 2025

Software:
NVIDIA Driver Version: 570.36
CUDA Version: 12.8
PyTorch Version: 2.7.0.dev20250302+cu128
Triton Version: 3.2.0
GPU Model: NVIDIA B200 (multiple GPUs)
CUDA Compiler (nvcc): 12.8.61
Python Version: 3.12

I followed the instructions in the DeepSeek README to quantize DeepSeek-R1 to FP4. The command I ran was:

python inference/convert.py --hf-ckpt-path $HF_FP8_CKPT --save-path $DS_CKPT --n-experts 256 --model-parallel 4

However, when I attempted to run the calibration script, I encountered the following error related to the B200 architecture (sm_100):

CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc-per-node 4 --master_port=12346 ptq.py --model_path $DS_CKPT --config configs/config_671B.json --quant_cfg NVFP4_DEFAULT_CFG --output_path $FP4_QUANT_PATH

Error:

libibverbs: Warning: couldn't load driver 'libvmw_pvrdma-rdmav34.so': libvmw_pvrdma-rdmav34.so: cannot open shared object file: No such file or directory
[rank1]:              ^^^^^^^^^^^^^
[rank1]:   File "/home/user/pytorch_env/lib/python3.12/site-packages/triton/compiler/compiler.py", line 279, in compile
[rank1]:     next_module = compile_ir(module, metadata)
[rank1]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/user/pytorch_env/lib/python3.12/site-packages/triton/backends/nvidia/compiler.py", line 389, in <lambda>
[rank1]:     stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
[rank1]:                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/user/pytorch_env/lib/python3.12/site-packages/triton/backends/nvidia/compiler.py", line 374, in make_cubin
[rank1]:     raise RuntimeError(f'{error}\n'
[rank1]: RuntimeError: Internal Triton PTX codegen error
[rank1]: `ptxas` stderr:
[rank1]: ptxas fatal   : Value 'sm_100' is not defined for option 'gpu-name'
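
The error means the ptxas binary that Triton invokes predates CUDA 12.8, so it does not recognize the Blackwell target sm_100. A quick way to confirm this is to inspect the ptxas bundled with the Triton wheel; the path below is an assumption based on the traceback, so adjust it to your environment:

# Locate the installed Triton package
python -c "import triton, os; print(os.path.dirname(triton.__file__))"

# Triton 3.x typically bundles its own ptxas under backends/nvidia/bin
PTXAS=/home/user/pytorch_env/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas
$PTXAS --version

# ptxas --help enumerates the allowed values for --gpu-name
$PTXAS --help | grep -o "sm_100" || echo "sm_100 not supported by this ptxas"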

Is there a solution for this?

meenchen commented Mar 4, 2025

That's because the DeepSeek inference code requires Triton support. Could you try these steps to enable Blackwell support in Triton? https://github.com/triton-lang/triton?tab=readme-ov-file#enabling-blackwell-support
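
For reference, here is a rough sketch of building Triton from source so it picks up a CUDA 12.8 ptxas that knows sm_100. The exact steps may have changed since this was written, so treat the linked README as authoritative:

# Remove any existing Triton builds from the virtualenv first
pip uninstall -y triton pytorch-triton

# Build and install Triton from source (repo layout as of early 2025)
git clone https://github.com/triton-lang/triton.git
cd triton
pip install -r python/requirements.txt
pip install -e python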
