Software:
NVIDIA Driver Version: 570.36
CUDA Version: 12.8
PyTorch Version: 2.7.0.dev20250302+cu128
Triton Version: 3.2.0
GPU Model: NVIDIA B200 (multiple GPUs)
CUDA Compiler (nvcc): 12.8.61
Python Version: 3.12
I followed the instructions in the DeepSeek README to quantize DeepSeek R1 to FP4. The command I ran was:
python inference/convert.py --hf-ckpt-path $HF_FP8_CKPT --save-path $DS_CKPT --n-experts 256 --model-parallel 4
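Here $HF_FP8_CKPT, $DS_CKPT, and (later) $FP4_QUANT_PATH are environment variables pointing at local checkpoint directories; the paths below are illustrative placeholders only, not the ones from the README:

export HF_FP8_CKPT=/data/DeepSeek-R1             # example path: original Hugging Face FP8 checkpoint
export DS_CKPT=/data/DeepSeek-R1-converted       # example path: output of convert.py
export FP4_QUANT_PATH=/data/DeepSeek-R1-nvfp4    # example path: output of the FP4 calibration step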
However, when I attempted to run the calibration scripts, I encountered the following error related to the B200 architecture (sm_100):
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc-per-node 4 --master_port=12346 ptq.py --model_path $DS_CKPT --config configs/config_671B.json --quant_cfg NVFP4_DEFAULT_CFG --output_path $FP4_QUANT_PATH
Error:
libibverbs: Warning: couldn't load driver 'libvmw_pvrdma-rdmav34.so': libvmw_pvrdma-rdmav34.so: cannot open shared object file: No such file or directory
[rank1]:   File "/home/user/pytorch_env/lib/python3.12/site-packages/triton/compiler/compiler.py", line 279, in compile
[rank1]:     next_module = compile_ir(module, metadata)
[rank1]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/user/pytorch_env/lib/python3.12/site-packages/triton/backends/nvidia/compiler.py", line 389, in <lambda>
[rank1]:     stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
[rank1]:                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/user/pytorch_env/lib/python3.12/site-packages/triton/backends/nvidia/compiler.py", line 374, in make_cubin
[rank1]:     raise RuntimeError(f'{error}\n'
[rank1]: RuntimeError: Internal Triton PTX codegen error
[rank1]: `ptxas` stderr:
[rank1]: ptxas fatal   : Value 'sm_100' is not defined for option 'gpu-name'
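The `ptxas fatal` line indicates that the ptxas invoked by the Triton 3.2.0 wheel predates Blackwell and does not recognize the sm_100 target. A quick way to see which toolchain is actually being used (the bundled-ptxas location is an assumption about how the wheel is laid out):

python -c "import triton; print(triton.__version__, triton.__file__)"
python -c "import torch; print(torch.version.cuda, torch.cuda.get_device_capability(0))"   # expect (10, 0) on B200
# Triton ships its own ptxas inside the wheel; it must report CUDA 12.8+ to accept sm_100:
# <site-packages>/triton/backends/nvidia/bin/ptxas --version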
Is there any solution for this?
That's because the DeepSeek inference code requires Triton support. Could you try these steps to enable Blackwell Triton support? https://github.com/triton-lang/triton?tab=readme-ov-file#enabling-blackwell-support
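In rough outline, the from-source route described there replaces the prebuilt Triton wheel with a build done against the local CUDA 12.8 toolchain. The exact commands and directory layout vary by Triton commit, so treat the following as a sketch and defer to the linked instructions:

pip uninstall -y triton pytorch-triton            # drop the prebuilt wheel pulled in by the PyTorch nightly
git clone https://github.com/triton-lang/triton.git
cd triton
pip install ninja cmake wheel                     # typical build-time dependencies
pip install -e .                                  # build against the CUDA 12.8 toolchain on PATH
# (older checkouts keep setup.py under python/, i.e. `pip install -e python`)

After reinstalling, rerunning the ptq.py calibration command above should give ptxas a toolchain that accepts sm_100.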