
QWEN2.5 inference issue on vLLM 0.6.2 #12669

Open
kunger97 opened this issue Jan 7, 2025 · 3 comments
kunger97 commented Jan 7, 2025

I installed the 0.6.2 branch from https://github.com/analytics-zoo/vllm
and ran a Qwen2.5-based fine-tuned model with this command:

python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server --served-model-name QWEN2_5 --model Model_Path --device xpu --dtype float16 --enforce-eager --max-model-len 8192 --load-in-low-bit fp8

It throws the following error:

2025-01-07 13:28:06,477 - INFO - Converting the current model to fp8_e5m2 format......
2025-01-07 13:28:06,479 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations
2025-01-07 13:28:24,947 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations
2025-01-07 13:28:30,761 - INFO - Loading model weights took 15.4058 GB
Process SpawnProcess-21:
Traceback (most recent call last):
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/engine/engine.py", line 145, in run_mp_engine
    engine = IPEXLLMMQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/engine/engine.py", line 133, in from_engine_args
    return super().from_engine_args(engine_args, usage_context, ipc_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/engine/multiprocessing/engine.py", line 138, in from_engine_args
    return cls(
           ^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/engine/multiprocessing/engine.py", line 78, in __init__
    self.engine = LLMEngine(*args,
                  ^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/engine/llm_engine.py", line 339, in __init__
    self._initialize_kv_caches()
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/engine/llm_engine.py", line 474, in _initialize_kv_caches
    self.model_executor.determine_num_available_blocks())
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/executor/gpu_executor.py", line 114, in determine_num_available_blocks
    return self.driver_worker.determine_num_available_blocks()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 128, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_model_runner.py", line 538, in profile_run
    self.execute_model(model_input, kv_caches, intermediate_tensors)
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_model_runner.py", line 643, in execute_model
    hidden_or_intermediate_states = model_executable(
                                    ^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/qwen2.py", line 369, in forward
    hidden_states = self.model(input_ids, positions, kv_caches,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/qwen2.py", line 285, in forward
    hidden_states, residual = layer(
                              ^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/qwen2.py", line 210, in forward
    hidden_states = self.self_attn(
                    ^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/qwen2.py", line 157, in forward
    attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/attention/layer.py", line 98, in forward
    return self.impl.forward(query,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/attention/backends/ipex_attn.py", line 340, in forward
    sub_out = xe_addons.sdp_causal(
              ^^^^^^^^^^^^^^^^^^^^^
TypeError: sdp_causal(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: Optional[torch.Tensor], arg4: float) -> torch.Tensor

Invoked with: tensor([[[[-3.2773, -1.8691,  2.1016,  ...]]]],
       device='xpu:0', dtype=torch.float16), tensor([[[[-0.1953, -1.5000,  0.3149,  ...]]]],
       device='xpu:0', dtype=torch.float16), tensor([[[[-0.0333,  0.1555, -0.0426,  ...]]]],
       device='xpu:0', dtype=torch.float16), None
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 574, in <module>
    uvloop.run(run_server(args))
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
    return runner.run(wrapper())
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 541, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 105, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 195, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start
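The TypeError above boils down to an arity mismatch: the native `xe_addons.sdp_causal` binding accepts five positional arguments (query, key, value, an optional attention mask, and a float scale), but the call site in `ipex_attn.py` passes only four, ending in `None` with no scale. A minimal sketch of the same failure mode, using a hypothetical pure-Python stand-in for the native binding:

```python
def sdp_causal(query, key, value, mask, scale: float):
    """Hypothetical stand-in for the native binding: five positional args required."""
    return "ok"


def call_like_old_code():
    # Mimics the failing call site: only four arguments (q, k, v, mask),
    # omitting the float scale the newer binding signature requires.
    try:
        sdp_causal("q", "k", "v", None)
        return None
    except TypeError as exc:
        return str(exc)


print(call_like_old_code())
```

This pattern is typical of version skew between the installed vLLM fork sources and the compiled bigdl-core-xe-addons binary they call into, which the later replies in this thread also point toward.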
Contributor

gc-fu commented Jan 8, 2025

Hi, can you please post the results of the following commands:

pip list | grep bigdl

and

cd /llm/vllm
git log

Author

kunger97 commented Jan 8, 2025

bigdl-core-xe-21                  2.6.0b20250106
bigdl-core-xe-addons-21           2.6.0b20250106
bigdl-core-xe-batch-21            2.6.0b20250106
commit 8fb3efa86344ca90d014dbf17ffc8e810b766e15 (HEAD -> 0.6.2, origin/0.6.2)
Author: Wang, Jian4 <[email protected]>
Date:   Fri Jan 3 15:15:17 2025 +0800

    update internvl2(#72)

commit 6099ef5d4a1acf40cd9e220f66783214743df3c3
Author: Wang, Jian4 <[email protected]>
Date:   Thu Jan 2 10:15:10 2025 +0800

    Enable gemma model (#68)
    
    * enable gemma model
    
    * update
    
    * update
    
    * update not use gqa
    
    * update
    
    * update for gemma-27b not use

commit 212e85c5bbce182b08f49fe0a0e07c5d4dae5b85
Author: Wang, Jian4 <[email protected]>
Date:   Thu Jan 2 09:43:32 2025 +0800

    Update sdp causal (#71)
    
    * update for new func
    
    * update for 2.5.0 error
    
    * update

commit 947415724adf828d35b170df03b1eb972689a374
Author: Xiangyu Tian <[email protected]>
Date:   Tue Dec 24 16:59:14 2024 +0800

    Refine evictor for prefix caching(#70)

commit 36e13897285fd6485b81e2fd242ebf49e7d96439
Author: Xiangyu Tian <[email protected]>
Date:   Tue Dec 24 15:25:43 2024 +0800

    Update prefix benchmark (#69)

commit 328537a61ce6c3497457f68bb0ae83d1bf5a0c6f
Author: Wang, Jian4 <[email protected]>
Date:   Mon Dec 16 11:00:03 2024 +0800

    add_sliding_windows (#67)
    
    * add_sliding_windows
    
:

Contributor

gc-fu commented Jan 8, 2025

The installed vLLM does not seem to be the latest version. Can you try reinstalling vLLM?

The path in the error, "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/attention/backends/ipex_attn.py", line 340, in forward, indicates that the code is not the latest.
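Since the traceback shows the module loading from an old-style `vllm-0.6.2+xpu-py3.11-linux-x86_64.egg` directory, it can help to verify which build Python actually imports before and after reinstalling. A small sketch using only the standard library (`installed_version` is an illustrative helper, not part of vLLM):

```python
from importlib import metadata


def installed_version(pkg: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None


# Check what pip's metadata reports for the vllm install, if any.
print(installed_version("vllm"))
```

If the reported version (or the imported module's `__file__` path) still points at the stale egg after reinstalling, deleting that egg directory from site-packages before installing again may be necessary.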
