
[Bug]: Mistral's Pixtral error for vllm>=0.6.5 on 4 T4's #11865

Open
jgen1 opened this issue Jan 8, 2025 · 12 comments
Labels: bug (Something isn't working)

Comments

jgen1 commented Jan 8, 2025

Your current environment

The output of `python collect_env.py` for the NOT-working setup (vLLM 0.6.6.post1):
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Wolfi (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.40

Python version: 3.11.11 (tags/v3.11.11-0-gd03b868-dirty:d03b868, Dec  4 2024, 19:55:37) [GCC 14.2.0] (64-bit runtime)
Python platform: Linux-5.10.199-190.747.amzn2.x86_64-x86_64-with-glibc2.40
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
/bin/sh: lscpu: not found

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pyzmq==26.2.0
[pip3] torch==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.47.1
[pip3] triton==3.1.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.6.post1
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

LD_LIBRARY_PATH=/home/nonroot/.local/lib/python3.11/site-packages/cv2/../../lib64:

The output of `python collect_env.py` for the working setup (vLLM 0.6.4):
WARNING 01-08 19:05:37 _custom_ops.py:20] Failed to import from vllm._C with ImportError('libcuda.so.1: cannot open shared object file: No such file or directory')
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Wolfi (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.40

Python version: 3.11.11 (tags/v3.11.11-0-gd03b868-dirty:d03b868, Dec  4 2024, 19:55:37) [GCC 14.2.0] (64-bit runtime)
Python platform: Linux-5.10.199-190.747.amzn2.x86_64-x86_64-with-glibc2.40
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
/bin/sh: lscpu: not found

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] pyzmq==26.2.0
[pip3] torch==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.47.1
[pip3] triton==3.1.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.4
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

LD_LIBRARY_PATH=/home/nonroot/.local/lib/python3.11/site-packages/cv2/../../lib64:

Model Input Dumps

err_execute_model_input_20250108-202622.zip

🐛 Describe the bug

I have 4 T4s (on an AWS g4dn.12xl) and I am trying to run Mistral's Pixtral model. I have been able to run it with no problems for the past couple of months, in docker, with the following parameters (sketched as a compose file below):

      VLLM_RPC_TIMEOUT: 30000
        --model pixtral
        --quantization None
        --tensor-parallel-size 4
        --dtype float16
        --config-format mistral
        --tokenizer-mode mistral
        --load-format mistral
        --distributed-executor-backend mp
        --limit-mm-per-prompt image=4
        --max-model-len 35500
        --max-num-batched-tokens 55000
        --gpu-memory-utilization 0.90
        --enforce-eager
        --scheduling-policy priority
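
For reference, a minimal docker-compose sketch of how these settings fit together (a hedged reconstruction; the image tag, service name, and GPU stanza are assumptions, not my exact file):

```yaml
# hypothetical compose fragment for the working setup (vLLM 0.6.4)
services:
  vllm:
    image: vllm/vllm-openai:v0.6.4   # assumed tag
    environment:
      VLLM_RPC_TIMEOUT: 30000
    command: >
      --model pixtral
      --tensor-parallel-size 4
      --dtype float16
      --config-format mistral
      --tokenizer-mode mistral
      --load-format mistral
      --enforce-eager
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 4
              capabilities: [gpu]
```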

It continues to work with vllm 0.6.4. But when I install vllm 0.6.5, 0.6.6, or 0.6.6.post1 in my docker image, the same parameters produce the error in the attached file (too many characters to paste here).

(Note: I also have an output captured with TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1, if helpful; see below.)
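
Those logs were captured by prepending the debug variables to the launch command, roughly like this (launch command abbreviated, not my exact invocation):

```sh
# hedged example: enable verbose dynamo logging, then start the server as usual
TORCH_LOGS="+dynamo" TORCHDYNAMO_VERBOSE=1 \
  python -m vllm.entrypoints.openai.api_server --model pixtral ...
```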

Again, it is important to note that the exact same configuration works with vllm 0.6.4.

Here is a small part of the error, full trace in the attached log:

INFO 01-08 20:56:16 multiproc_worker_utils.py:127] Killing local vLLM worker processes
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1446, in _call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 129, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/__init__.py", line 2234, in __call__
    return compile_fx(model_, inputs_, config_patches=self.config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1521, in compile_fx
    return aot_autograd(
           ^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/backends/common.py", line 72, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1071, in aot_module_simplified
    compiled_fn = dispatch_and_compile()
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1056, in dispatch_and_compile
    compiled_fn, _ = create_aot_dispatcher_function(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 522, in create_aot_dispatcher_function
    return _create_aot_dispatcher_function(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 759, in _create_aot_dispatcher_function
    compiled_fn, fw_metadata = compiler_fn(
                               ^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 179, in aot_dispatch_base
    compiled_fw = compiler(fw_module, updated_flat_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1350, in fw_compiler_base
    return _fw_compiler_base(model, example_inputs, is_inference)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1421, in _fw_compiler_base
    return inner_compile(
           ^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 475, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 85, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 661, in _compile_fx_inner
    compiled_graph = FxGraphCache.load(
                     ^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/codecache.py", line 1334, in load
    compiled_graph = compile_fx_fn(
                     ^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 570, in codegen_and_compile
    compiled_graph = fx_codegen_and_compile(gm, example_inputs, **fx_kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/compile_fx.py", line 878, in fx_codegen_and_compile
    compiled_fn = graph.compile_to_fn()
                  ^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/graph.py", line 1913, in compile_to_fn
    return self.compile_to_module().call
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/graph.py", line 1839, in compile_to_module
    return self._compile_to_module()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/graph.py", line 1845, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
                                                             ^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/graph.py", line 1784, in codegen
    self.scheduler.codegen()
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/scheduler.py", line 3383, in codegen
    return self._codegen()
           ^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/scheduler.py", line 3461, in _codegen
    self.get_backend(device).codegen_node(node)
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/codegen/cuda_combined_scheduling.py", line 80, in codegen_node
    return self._triton_scheduling.codegen_node(node)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/codegen/simd.py", line 1155, in codegen_node
    return self.codegen_node_schedule(node_schedule, buf_accesses, numel, rnumel)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/codegen/simd.py", line 1364, in codegen_node_schedule
    src_code = kernel.codegen_kernel()
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/codegen/triton.py", line 2661, in codegen_kernel
    **self.inductor_meta_common(),
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_inductor/codegen/triton.py", line 2532, in inductor_meta_common
    "backend_hash": torch.utils._triton.triton_hash_with_backend(),
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/utils/_triton.py", line 53, in triton_hash_with_backend
    backend = triton_backend()
              ^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/utils/_triton.py", line 45, in triton_backend
    target = driver.active.get_current_target()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/home/nonroot/.local/lib/python3.11/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
                ^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/triton/runtime/driver.py", line 9, in _create_driver
    return actives[0]()
           ^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
    self.utils = CudaUtils()  # TODO: make static
                 ^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
    mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
    so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/triton/runtime/build.py", line 32, in _build
    raise RuntimeError("Failed to find C compiler. Please specify via CC environment variable.")
RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1691, in execute_model
    hidden_or_intermediate_states = model_executable(
                                    ^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/model_executor/models/pixtral.py", line 293, in forward
    vision_embeddings = self.get_multimodal_embeddings(**kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/model_executor/models/pixtral.py", line 237, in get_multimodal_embeddings
    image_embeds = self.language_model.get_input_embeddings(image_tokens)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/model_executor/models/llama.py", line 557, in get_input_embeddings
    return self.model.get_input_embeddings(input_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/model_executor/models/llama.py", line 336, in get_input_embeddings
    return self.embed_tokens(input_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/model_executor/layers/vocab_parallel_embedding.py", line 404, in forward
    masked_input, input_mask = get_masked_input_and_mask(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 465, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1269, in __call__
    return self._torchdynamo_orig_callable(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1064, in __call__
    result = self._inner_convert(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 526, in __call__
    return _compile(
           ^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 924, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 666, in compile_inner
    return _compile_inner(code, one_graph, hooks, transform)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_utils_internal.py", line 87, in wrapper_function
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 699, in _compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1322, in transform_code_object
    transformations(instructions, code_options)
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 219, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 634, in transform
    tracer.run()
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2796, in run
    super().run()
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 983, in run
    while self.step():
          ^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 895, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2987, in RETURN_VALUE
    self._return(inst)
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2972, in _return
    self.output.compile_subgraph(
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1142, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1369, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1416, in call_user_compiler
    return self._call_user_compiler(gm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1465, in _call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 368, in run_mp_engine
    raise e
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
    return cls(ipc_path=ipc_path,
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
    self.engine = LLMEngine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 276, in __init__
    self._initialize_kv_caches()
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 416, in _initialize_kv_caches
    self.model_executor.determine_num_available_blocks())
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 39, in determine_num_available_blocks
    num_blocks = self._run_workers("determine_num_available_blocks", )
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 157, in _run_workers
    driver_worker_output = driver_worker_method(*args, **kwargs)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/worker/worker.py", line 202, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1331, in profile_run
    self.execute_model(model_input, kv_caches, intermediate_tensors)
  File "/home/nonroot/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/nonroot/.local/lib/python3.11/site-packages/vllm/worker/model_runner_base.py", line 152, in _wrapper
    raise type(err)(
          ^^^^^^^^^^
TypeError: BackendCompilerFailed.__init__() missing 1 required positional argument: 'inner_exception'
[rank0]:[W108 20:56:17.821587941 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())
DEBUG 01-08 20:56:25 client.py:252] Shutting down MQLLMEngineClient output handler.

output.log

@coderchem

v6.6 also has this issue.

@EdwardChan5000

same here

@hediyuan

same here: vllm==0.6.6.post1
I also get the error: "/tmp/tmp9zjj43yb/main.c:6:23: fatal error: stdatomic.h: No such file or directory"
Does vLLM require a newer version of gcc? I haven't encountered this issue before.

jgen1 (Author) commented Jan 17, 2025

Note that my OS is Wolfi.

ywang96 (Member) commented Jan 18, 2025

@youkaichao do you think this has anything to do with torch compile?

youkaichao (Member) commented:

Yes, we moved away from torch.jit.script to torch.compile.

File "/home/nonroot/.local/lib/python3.11/site-packages/triton/runtime/build.py", line 32, in _build
raise RuntimeError("Failed to find C compiler. Please specify via CC environment variable.")
RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

@jgen1 it seems triton is failing because it cannot find a C compiler.
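
A quick way to verify inside the container (a sketch; triton's runtime build uses $CC if set, and otherwise searches PATH for gcc/clang):

```sh
# hypothetical diagnostic, run inside the vLLM container
echo "CC=${CC:-<unset>}"
for c in gcc clang cc; do
  command -v "$c" || echo "$c: not found on PATH"
done
```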

jgen1 (Author) commented Jan 21, 2025

@youkaichao But with the exact same setup, vllm 0.6.4 works - it does detect my C compiler.

youkaichao (Member) commented:

@jgen1 the PR #10406 was only released after 0.6.5.

Can you try to specify your C compiler following the error message?
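
For example (assuming gcc is already installed in the image):

```sh
# tell triton which compiler to use before launching vLLM
export CC="$(command -v gcc)"   # or a literal path such as /usr/bin/gcc
```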

youkaichao (Member) commented:

I think most environments do have a C compiler.

jgen1 (Author) commented Jan 22, 2025

I will try that @youkaichao, but wasn't a C compiler also required in 0.6.4?

@hewr2010

pytorch/pytorch#136294

@jgen1 I think this is related. Maybe you can try torch>=2.5.1

youkaichao (Member) commented:

> I will try that @youkaichao, but wasn't a C compiler also required in 0.6.4?

@jgen1 previously we used torch.jit.script, which does not need a C compiler. Now we have switched to torch.compile, which is more performant but requires a C compiler (as your trace shows, triton compiles a small C helper module at runtime).

Please let me know if installing a C compiler and specifying it via the env var CC helps, thanks!
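
For a Wolfi-based image, a minimal sketch of the fix in the Dockerfile might look like this (package names are assumptions; check the Wolfi package index for your base image):

```dockerfile
# hypothetical: install a C toolchain and point triton/inductor at it
RUN apk add --no-cache gcc glibc-dev
ENV CC=gcc
```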
