[Bug] ImportError: undefined symbol: cuModuleGetFunction when using lmsysorg/sglang:v0.4.1.post7-cu124 #3065
Comments
Apologies, I just realized I made a mistake. The correct image version I am using is lmsysorg/sglang:v0.4.1.post7-cu124. The issue remains the same: the ImportError: undefined symbol: cuModuleGetFunction error still occurs. Looking forward to any insights on this!
Hi @aooxin, it works well for me.
Hello, could you share your environment details? Additionally, would using a cu121 image for DeepSeek V3 have any performance impact compared to cu124? Thanks!
I use the H200. Here are the commands I used:
docker pull lmsysorg/sglang:v0.4.1.post7-cu124
docker run -itd --shm-size 32g --gpus all -v $HOME/.cache:/root/.cache --ipc=host --name sglang_test lmsysorg/sglang:v0.4.1.post7-cu124 /bin/bash
docker exec -it sglang_test /bin/bash
python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct
What is the difference between v0.4.1.post7-cu124 and v0.4.1.post7-cu124-srt?
SRT means SGLang Runtime Engine; I'm not sure of the exact difference. @zhyncs
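To sanity-check the version-mismatch theory discussed above, a small ctypes probe can confirm whether the CUDA driver library visible inside the container actually exports the symbol Triton fails to resolve. This is a minimal diagnostic sketch, not from the original thread; it assumes the driver is exposed as libcuda.so.1, the usual name on Linux:

import ctypes

# Load the CUDA driver library; this raises OSError if no driver is
# mounted into the container at all.
libcuda = ctypes.CDLL("libcuda.so.1")

# cuModuleGetFunction belongs to the CUDA Driver API, so it should resolve
# directly from libcuda. ctypes resolves symbols lazily via attribute
# access and raises AttributeError when a symbol is not exported.
try:
    print("cuModuleGetFunction resolved:", libcuda.cuModuleGetFunction)
except AttributeError:
    print("cuModuleGetFunction NOT exported by libcuda.so.1")

If the symbol resolves here but Triton still fails, a stale cuda_utils.so in the Triton cache (see the traceback below) is the more likely culprit.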
Describe the bug
While using the lmsysorg/sglang:v0.4.1.post7-cu124 Docker image to launch the server, the following error occurred:
Error Log:
Thu Jan 23 11:55:50 2025[1,1]: scheduler.event_loop_overlap()
Thu Jan 23 11:55:50 2025[1,1]: File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
Thu Jan 23 11:55:50 2025[1,1]: return func(*args, **kwargs)
Thu Jan 23 11:55:50 2025[1,1]: File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 489, in event_loop_overlap
Thu Jan 23 11:55:50 2025[1,1]: batch = self.get_next_batch_to_run()
Thu Jan 23 11:55:50 2025[1,1]: File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 854, in get_next_batch_to_run
Thu Jan 23 11:55:50 2025[1,1]: new_batch = self.get_new_batch_prefill()
Thu Jan 23 11:55:50 2025[1,1]: File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 971, in get_new_batch_prefill
Thu Jan 23 11:55:50 2025[1,1]: new_batch.prepare_for_extend()
Thu Jan 23 11:55:50 2025[1,1]: File "/sgl-workspace/sglang/python/sglang/srt/managers/schedule_batch.py", line 821, in prepare_for_extend
Thu Jan 23 11:55:50 2025[1,1]: write_req_to_token_pool_triton[(bs,)](
Thu Jan 23 11:55:50 2025[1,1]: File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 345, in
Thu Jan 23 11:55:50 2025[1,1]: return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
Thu Jan 23 11:55:50 2025[1,1]: File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 607, in run
Thu Jan 23 11:55:50 2025[1,1]: device = driver.active.get_current_device()
Thu Jan 23 11:55:50 2025[1,1]: File "/usr/local/lib/python3.10/dist-packages/triton/runtime/driver.py", line 23, in getattr
Thu Jan 23 11:55:50 2025[1,1]: self._initialize_obj()
Thu Jan 23 11:55:50 2025[1,1]: File "/usr/local/lib/python3.10/dist-packages/triton/runtime/driver.py", line 20, in _initialize_obj
Thu Jan 23 11:55:50 2025[1,1]: self._obj = self._init_fn()
Thu Jan 23 11:55:50 2025[1,1]: File "/usr/local/lib/python3.10/dist-packages/triton/runtime/driver.py", line 9, in _create_driver
Thu Jan 23 11:55:50 2025[1,1]: return actives[0]()
Thu Jan 23 11:55:50 2025[1,1]: File "/usr/local/lib/python3.10/dist-packages/triton/backends/nvidia/driver.py", line 371, in init
Thu Jan 23 11:55:50 2025[1,1]: self.utils = CudaUtils() # TODO: make static
Thu Jan 23 11:55:50 2025[1,1]: File "/usr/local/lib/python3.10/dist-packages/triton/backends/nvidia/driver.py", line 80, in init
Thu Jan 23 11:55:50 2025[1,1]: mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
Thu Jan 23 11:55:50 2025[1,1]: File "/usr/local/lib/python3.10/dist-packages/triton/backends/nvidia/driver.py", line 62, in compile_module_from_src
Thu Jan 23 11:55:50 2025[1,1]: mod = importlib.util.module_from_spec(spec)
Thu Jan 23 11:55:50 2025[1,1]: File "", line 571, in module_from_spec
Thu Jan 23 11:55:50 2025[1,1]: File "", line 1176, in create_module
Thu Jan 23 11:55:50 2025[1,1]: File "", line 241, in _call_with_frames_removed
Thu Jan 23 11:55:50 2025[1,1]:ImportError: /root/.triton/cache_38806/41ce1f58e0a8aa9865e66b90d58b3307bb64c5a006830e49543444faf56202fc/cuda_utils.so: undefined symbol: cuModuleGetFunction
Thu Jan 23 11:55:50 2025[1,1]:
ImportError: /root/.triton/cache_xxxxxx/41ce1f58e0a8aa9865e66b90d58b3307bb64c5a006830e49543444faf56202fc/cuda_utils.so: undefined symbol: cuModuleGetFunction
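Since the failing cuda_utils.so lives in the Triton cache (/root/.triton/cache_.../ in the log above), one workaround worth trying is to delete that cache so Triton recompiles the module against the driver library actually mounted in the container. A hedged sketch, not a confirmed fix from this thread; ~/.triton is Triton's default cache root and can be relocated via the TRITON_CACHE_DIR environment variable:

import shutil
from pathlib import Path

# The error log shows the broken module under /root/.triton/cache_38806/...
# Removing the cache directories forces Triton to rebuild cuda_utils.so on
# the next server launch.
triton_root = Path.home() / ".triton"
for cache_dir in triton_root.glob("cache*"):
    shutil.rmtree(cache_dir, ignore_errors=True)
    print(f"removed {cache_dir}")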
Launch Command:
launch_server_command = [
    "python3", "-m", "sglang.launch_server",
    "--model-path", model_name,
    "--tp", str(tp_size),
    "--dist-init-addr", dist_init_addr,
    "--nnodes", str(nnodes),
    "--node-rank", str(rank),  # rank is used directly as the node rank
    "--trust-remote-code", "--host", "0.0.0.0", "--port", str(port),
    "--enable-torch-compile", "--disable-cuda-graph",
    "--torch-compile-max-bs", "96",
    "--mem-fraction-static", "0.8",
]
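For context, here is a minimal sketch of how this argument list would be executed from a launcher script. The concrete values below (model_name, tp_size, dist_init_addr, nnodes, rank, port) are illustrative placeholders, not values taken from the report:

import subprocess

# Illustrative placeholders standing in for the reporter's configuration.
model_name = "deepseek-ai/DeepSeek-V3"
tp_size, nnodes, rank, port = 8, 2, 0, 30000
dist_init_addr = "10.0.0.1:5000"

launch_server_command = [
    "python3", "-m", "sglang.launch_server",
    "--model-path", model_name,
    "--tp", str(tp_size),
    "--dist-init-addr", dist_init_addr,
    "--nnodes", str(nnodes),
    "--node-rank", str(rank),
    "--trust-remote-code", "--host", "0.0.0.0", "--port", str(port),
    "--enable-torch-compile", "--disable-cuda-graph",
    "--torch-compile-max-bs", "96",
    "--mem-fraction-static", "0.8",
]

# check=True makes a non-zero exit from the server process raise an error.
subprocess.run(launch_server_command, check=True)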
Reproduction
command:
(same launch command as shown under "Launch Command:" above)
model:
DeepSeek-V3
Environment
python3 -m sglang.check_env
/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_config.py:345: UserWarning: Valid config keys have changed in V2:
warnings.warn(message, UserWarning)
Python: 3.10.16 (main, Dec 4 2024, 08:53:37) [GCC 9.4.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: CF-NG-HZZ1-O
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.0
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.131
CUDA Driver Version: 535.183.06
PyTorch: 2.5.1+cu124
sglang: 0.4.1.post7
flashinfer: 0.1.6+cu124torch2.4
triton: 3.1.0
transformers: 4.48.0
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.11.11
fastapi: 0.115.6
hf_transfer: 0.1.9
huggingface_hub: 0.27.1
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
packaging: 24.2
psutil: 6.1.1
pydantic: 2.10.5
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.34.0
uvloop: 0.21.0
vllm: 0.6.4.post1
openai: 1.59.8
anthropic: 0.43.1
decord: 0.6.0
NVIDIA Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 NIC6 NIC7 NIC8 NIC9 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 NODE NODE NODE NODE NODE PIX SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 NODE NODE NODE NODE PIX NODE SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 NODE NODE NODE PIX NODE NODE SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 NODE NODE PIX NODE NODE NODE SYS SYS SYS SYS 0-47,96-143 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 SYS SYS SYS SYS SYS SYS NODE PIX NODE NODE 48-95,144-191 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 SYS SYS SYS SYS SYS SYS PIX NODE NODE NODE 48-95,144-191 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 SYS SYS SYS SYS SYS SYS NODE NODE NODE PIX 48-95,144-191 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X SYS SYS SYS SYS SYS SYS NODE NODE PIX NODE 48-95,144-191 1 N/A
NIC0 NODE NODE NODE NODE SYS SYS SYS SYS X PIX NODE NODE NODE NODE SYS SYS SYS SYS
NIC1 NODE NODE NODE NODE SYS SYS SYS SYS PIX X NODE NODE NODE NODE SYS SYS SYS SYS
NIC2 NODE NODE NODE PIX SYS SYS SYS SYS NODE NODE X NODE NODE NODE SYS SYS SYS SYS
NIC3 NODE NODE PIX NODE SYS SYS SYS SYS NODE NODE NODE X NODE NODE SYS SYS SYS SYS
NIC4 NODE PIX NODE NODE SYS SYS SYS SYS NODE NODE NODE NODE X NODE SYS SYS SYS SYS
NIC5 PIX NODE NODE NODE SYS SYS SYS SYS NODE NODE NODE NODE NODE X SYS SYS SYS SYS
NIC6 SYS SYS SYS SYS NODE PIX NODE NODE SYS SYS SYS SYS SYS SYS X NODE NODE NODE
NIC7 SYS SYS SYS SYS PIX NODE NODE NODE SYS SYS SYS SYS SYS SYS NODE X NODE NODE
NIC8 SYS SYS SYS SYS NODE NODE NODE PIX SYS SYS SYS SYS SYS SYS NODE NODE X NODE
NIC9 SYS SYS SYS SYS NODE NODE PIX NODE SYS SYS SYS SYS SYS SYS NODE NODE NODE X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
NIC6: mlx5_6
NIC7: mlx5_7
NIC8: mlx5_8
NIC9: mlx5_9
ulimit soft: 1048576