Software:
NVIDIA Driver Version: 570.36
CUDA Version: 12.8
PyTorch Version: 2.7.0.dev20250302+cu128
Triton Version: 3.2.0
GPU Model: NVIDIA B200 (multiple GPUs)
CUDA Compiler (nvcc): 12.8.61
Python Version: 3.12
I followed the instructions in the DeepSeek README to quantize DeepSeek R1 to FP4. The command I ran was:
python inference/convert.py --hf-ckpt-path $HF_FP8_CKPT --save-path $DS_CKPT --n-experts 256 --model-parallel 4
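Here $HF_FP8_CKPT, $DS_CKPT, and (later) $FP4_QUANT_PATH are environment variables pointing at local checkpoint directories; the paths below are illustrative placeholders only, not the ones from the README:

export HF_FP8_CKPT=/data/DeepSeek-R1             # example path: original Hugging Face FP8 checkpoint
export DS_CKPT=/data/DeepSeek-R1-converted       # example path: output of convert.py
export FP4_QUANT_PATH=/data/DeepSeek-R1-nvfp4    # example path: output of the FP4 calibration step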
However, when I attempted to run the calibration scripts, I encountered the following error related to the B200 architecture (sm_100):
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc-per-node 4 --master_port=12346 ptq.py --model_path $DS_CKPT --config configs/config_671B.json --quant_cfg NVFP4_DEFAULT_CFG --output_path $FP4_QUANT_PATH
Error:
libibverbs: Warning: couldn't load driver 'libvmw_pvrdma-rdmav34.so': libvmw_pvrdma-rdmav34.so: cannot open shared object file: No such file or directory
[rank1]:   File "/home/user/pytorch_env/lib/python3.12/site-packages/triton/compiler/compiler.py", line 279, in compile
[rank1]:     next_module = compile_ir(module, metadata)
[rank1]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/user/pytorch_env/lib/python3.12/site-packages/triton/backends/nvidia/compiler.py", line 389, in <lambda>
[rank1]:     stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
[rank1]:                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/home/user/pytorch_env/lib/python3.12/site-packages/triton/backends/nvidia/compiler.py", line 374, in make_cubin
[rank1]:     raise RuntimeError(f'{error}\n'
[rank1]: RuntimeError: Internal Triton PTX codegen error
[rank1]: `ptxas` stderr:
[rank1]: ptxas fatal   : Value 'sm_100' is not defined for option 'gpu-name'
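The `ptxas fatal` line indicates that the ptxas invoked by the Triton 3.2.0 wheel predates Blackwell and does not recognize the sm_100 target. A quick way to see which toolchain is actually being used (the bundled-ptxas location is an assumption about how the wheel is laid out):

python -c "import triton; print(triton.__version__, triton.__file__)"
python -c "import torch; print(torch.version.cuda, torch.cuda.get_device_capability(0))"   # expect (10, 0) on B200
# Triton ships its own ptxas inside the wheel; it must report CUDA 12.8+ to accept sm_100:
# <site-packages>/triton/backends/nvidia/bin/ptxas --version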
Is there any solution for this?
That's because the DeepSeek inference code requires Triton support. Could you try these steps to enable Blackwell Triton support? https://github.com/triton-lang/triton?tab=readme-ov-file#enabling-blackwell-support
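In rough outline, the from-source route described there replaces the prebuilt Triton wheel with a build done against the local CUDA 12.8 toolchain. The exact commands and directory layout vary by Triton commit, so treat the following as a sketch and defer to the linked instructions:

pip uninstall -y triton pytorch-triton            # drop the prebuilt wheel pulled in by the PyTorch nightly
git clone https://github.com/triton-lang/triton.git
cd triton
pip install ninja cmake wheel                     # typical build-time dependencies
pip install -e .                                  # build against the CUDA 12.8 toolchain on PATH
# (older checkouts keep setup.py under python/, i.e. `pip install -e python`)

After reinstalling, rerunning the ptq.py calibration command above should give ptxas a toolchain that accepts sm_100.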