
QWEN2.5 inference issue on vLLM 0.6.2 #12669

Open
kunger97 opened this issue Jan 7, 2025 · 3 comments
kunger97 commented Jan 7, 2025

I installed the 0.6.2 branch from https://github.com/analytics-zoo/vllm
and ran a Qwen2.5-based fine-tuned model with this command:

python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server --served-model-name QWEN2_5 --model Model_Path --device xpu --dtype float16 --enforce-eager --max-model-len 8192 --load-in-low-bit fp8

It throws the following error:

2025-01-07 13:28:06,477 - INFO - Converting the current model to fp8_e5m2 format......
2025-01-07 13:28:06,479 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations
2025-01-07 13:28:24,947 - INFO - Only HuggingFace Transformers models are currently supported for further optimizations
2025-01-07 13:28:30,761 - INFO - Loading model weights took 15.4058 GB
Process SpawnProcess-21:
Traceback (most recent call last):
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/engine/engine.py", line 145, in run_mp_engine
    engine = IPEXLLMMQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/engine/engine.py", line 133, in from_engine_args
    return super().from_engine_args(engine_args, usage_context, ipc_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/engine/multiprocessing/engine.py", line 138, in from_engine_args
    return cls(
           ^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/engine/multiprocessing/engine.py", line 78, in __init__
    self.engine = LLMEngine(*args,
                  ^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/engine/llm_engine.py", line 339, in __init__
    self._initialize_kv_caches()
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/engine/llm_engine.py", line 474, in _initialize_kv_caches
    self.model_executor.determine_num_available_blocks())
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/executor/gpu_executor.py", line 114, in determine_num_available_blocks
    return self.driver_worker.determine_num_available_blocks()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_worker.py", line 128, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_model_runner.py", line 538, in profile_run
    self.execute_model(model_input, kv_caches, intermediate_tensors)
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/worker/xpu_model_runner.py", line 643, in execute_model
    hidden_or_intermediate_states = model_executable(
                                    ^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/qwen2.py", line 369, in forward
    hidden_states = self.model(input_ids, positions, kv_caches,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/qwen2.py", line 285, in forward
    hidden_states, residual = layer(
                              ^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/qwen2.py", line 210, in forward
    hidden_states = self.self_attn(
                    ^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/model_executor/models/qwen2.py", line 157, in forward
    attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/attention/layer.py", line 98, in forward
    return self.impl.forward(query,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/attention/backends/ipex_attn.py", line 340, in forward
    sub_out = xe_addons.sdp_causal(
              ^^^^^^^^^^^^^^^^^^^^^
TypeError: sdp_causal(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: Optional[torch.Tensor], arg4: float) -> torch.Tensor

Invoked with: tensor([[[[-3.2773, -1.8691,  2.1016,  ...]]]],
       device='xpu:0', dtype=torch.float16), tensor([[[[-0.1953, -1.5000,  0.3149,  ...]]]],
       device='xpu:0', dtype=torch.float16), tensor([[[[-0.0333,  0.1555, -0.0426,  ...]]]],
       device='xpu:0', dtype=torch.float16), None
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 574, in <module>
    uvloop.run(run_server(args))
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
    return runner.run(wrapper())
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 541, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 105, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/ipex_llm/vllm/xpu/entrypoints/openai/api_server.py", line 195, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start
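The TypeError above boils down to an arity mismatch: the native `xe_addons.sdp_causal` binding accepts five positional arguments (query, key, value, an optional attention mask, and a float scale), but the call site in `ipex_attn.py` passes only four, ending in `None` with no scale. A minimal sketch of the same failure mode, using a hypothetical pure-Python stand-in for the native binding:

```python
def sdp_causal(query, key, value, mask, scale: float):
    """Hypothetical stand-in for the native binding: five positional args required."""
    return "ok"


def call_like_old_code():
    # Mimics the failing call site: only four arguments (q, k, v, mask),
    # omitting the float scale the newer binding signature requires.
    try:
        sdp_causal("q", "k", "v", None)
        return None
    except TypeError as exc:
        return str(exc)


print(call_like_old_code())
```

This pattern is typical of version skew between the installed vLLM fork sources and the compiled bigdl-core-xe-addons binary they call into, which the later replies in this thread also point toward.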
Contributor

gc-fu commented Jan 8, 2025

Hi, can you please post the results of the following commands:

pip list | grep bigdl

and

cd /llm/vllm
git log

Author

kunger97 commented Jan 8, 2025

bigdl-core-xe-21                  2.6.0b20250106
bigdl-core-xe-addons-21           2.6.0b20250106
bigdl-core-xe-batch-21            2.6.0b20250106
commit 8fb3efa86344ca90d014dbf17ffc8e810b766e15 (HEAD -> 0.6.2, origin/0.6.2)
Author: Wang, Jian4 <[email protected]>
Date:   Fri Jan 3 15:15:17 2025 +0800

    update internvl2(#72)

commit 6099ef5d4a1acf40cd9e220f66783214743df3c3
Author: Wang, Jian4 <[email protected]>
Date:   Thu Jan 2 10:15:10 2025 +0800

    Enable gemma model (#68)
    
    * enable gemma model
    
    * update
    
    * update
    
    * update not use gqa
    
    * update
    
    * update for gemma-27b not use

commit 212e85c5bbce182b08f49fe0a0e07c5d4dae5b85
Author: Wang, Jian4 <[email protected]>
Date:   Thu Jan 2 09:43:32 2025 +0800

    Update sdp causal (#71)
    
    * update for new func
    
    * update for 2.5.0 error
    
    * update

commit 947415724adf828d35b170df03b1eb972689a374
Author: Xiangyu Tian <[email protected]>
Date:   Tue Dec 24 16:59:14 2024 +0800

    Refine evictor for prefix caching(#70)

commit 36e13897285fd6485b81e2fd242ebf49e7d96439
Author: Xiangyu Tian <[email protected]>
Date:   Tue Dec 24 15:25:43 2024 +0800

    Update prefix benchmark (#69)

commit 328537a61ce6c3497457f68bb0ae83d1bf5a0c6f
Author: Wang, Jian4 <[email protected]>
Date:   Mon Dec 16 11:00:03 2024 +0800

    add_sliding_windows (#67)
    
    * add_sliding_windows
    
:

Contributor

gc-fu commented Jan 8, 2025

The installed vLLM does not seem to be the latest version. Can you try reinstalling vLLM?

The path in the error, "/home/ua55abec29206204c6df28b1eb1b8906/.conda/envs/ipex_pip/lib/python3.11/site-packages/vllm-0.6.2+xpu-py3.11-linux-x86_64.egg/vllm/attention/backends/ipex_attn.py", line 340, in forward, indicates that the code is not the latest.
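Since the traceback shows the module loading from an old-style `vllm-0.6.2+xpu-py3.11-linux-x86_64.egg` directory, it can help to verify which build Python actually imports before and after reinstalling. A small sketch using only the standard library (`installed_version` is an illustrative helper, not part of vLLM):

```python
from importlib import metadata


def installed_version(pkg: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None


# Check what pip's metadata reports for the vllm install, if any.
print(installed_version("vllm"))
```

If the reported version (or the imported module's `__file__` path) still points at the stale egg after reinstalling, deleting that egg directory from site-packages before installing again may be necessary.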
