
[Bug] Multi-GPU fails to start when using vllm acceleration #1810

Open

Lichunyan3 opened this issue Jan 8, 2025 · 0 comments

Lichunyan3 commented Jan 8, 2025

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0',
'GPU 0,1,2,3,4,5,6,7': 'NVIDIA A800-SXM4-80GB',
'MMEngine': '0.10.5',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 12.2, V12.2.140',
'OpenCV': '4.10.0',
'PyTorch': '2.5.1+cu124',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2024.2-Product Build 20240605 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.5.3 (Git Hash '
'66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.4\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 90.1\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.4, '
'CUDNN_VERSION=9.1.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON '
'-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=old-style-cast '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, '
'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, '
'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, '
'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, '
'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]',
'TorchVision': '0.20.1+cu124',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.3.9+b26c76a',
'sys.platform': 'linux',
'transformers': '4.46.1'}

Reproduces the problem - code/configuration sample

[Model config] Using two GPUs:

from opencompass.models import HuggingFacewithChatTemplate

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr='qwen2.5-1.5b-instruct-hf',
        # path='Qwen/Qwen2.5-1.5B-Instruct',
        path='/workspace/models/hf_models/Qwen2.5-1.5B-Instruct',
        max_out_len=4096,
        batch_size=8,
        run_cfg=dict(num_gpus=2),
    )
]
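
For comparison, a minimal sketch of the equivalent native vLLM model config. `VLLMwithChatTemplate` and the `tensor_parallel_size` kwarg here are assumptions modeled on OpenCompass's bundled vllm configs, not taken from this report:

# Hypothetical alternative: configure OpenCompass's vLLM wrapper directly
# instead of converting the HF config at runtime with `-a vllm`.
from opencompass.models import VLLMwithChatTemplate

models = [
    dict(
        type=VLLMwithChatTemplate,
        abbr='qwen2.5-1.5b-instruct-vllm',
        path='/workspace/models/hf_models/Qwen2.5-1.5B-Instruct',
        model_kwargs=dict(tensor_parallel_size=2),  # shard across 2 GPUs
        max_out_len=4096,
        batch_size=8,
        run_cfg=dict(num_gpus=2),
    )
]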

Reproduces the problem - command or script

python run.py --datasets ceval_gen --models hf_qwen2_5_1_5b_instruct --debug -a vllm

Reproduces the problem - error message

01/08 11:34:07 - OpenCompass - INFO - Loading hf_qwen2_5_1_5b_instruct: configs/models/hf_qwen2_5_1_5b_instruct.py
01/08 11:34:07 - OpenCompass - INFO - Transforming qwen2.5-1.5b-instruct-hf to vllm
01/08 11:34:07 - OpenCompass - INFO - Loading example: /usr/local/lib/python3.10/dist-packages/opencompass/configs/./summarizers/example.py
01/08 11:34:07 - OpenCompass - INFO - Current exp folder: outputs/default/20250108_113407
01/08 11:34:07 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
01/08 11:34:07 - OpenCompass - INFO - Partitioned into 2 tasks.
01/08 11:34:11 - OpenCompass - WARNING - Only use 2 GPUs for total 8 available GPUs in debug mode.
01/08 11:34:11 - OpenCompass - INFO - Task [qwen2.5-1.5b-instruct-vllm/ceval-computer_network_0,qwen2.5-1.5b-instruct-vllm/ceval-operating_system_0,qwen2.5-1.5b-instruct-vllm/ceval-computer_architecture_0,qwen2.5-1.5b-instruct-vllm/ceval-college_programming_0,qwen2.5-1.5b-instruct-vllm/ceval-college_physics_0,qwen2.5-1.5b-instruct-vllm/ceval-college_chemistry_0,qwen2.5-1.5b-instruct-vllm/ceval-advanced_mathematics_0,qwen2.5-1.5b-instruct-vllm/ceval-probability_and_statistics_0,qwen2.5-1.5b-instruct-vllm/ceval-discrete_mathematics,qwen2.5-1.5b-instruct-vllm/ceval-electrical_engineer_1,qwen2.5-1.5b-instruct-vllm/ceval-metrology_engineer_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_mathematics_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_physics_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_chemistry_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_biology_1,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_mathematics_1,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_biology_1,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_physics_1,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_chemistry_1,qwen2.5-1.5b-instruct-vllm/ceval-veterinary_medicine_1,qwen2.5-1.5b-instruct-vllm/ceval-college_economics_1,qwen2.5-1.5b-instruct-vllm/ceval-business_administration_1,qwen2.5-1.5b-instruct-vllm/ceval-marxism_1,qwen2.5-1.5b-instruct-vllm/ceval-mao_zedong_thought_1,qwen2.5-1.5b-instruct-vllm/ceval-education_science_1,qwen2.5-1.5b-instruct-vllm/ceval-teacher_qualification_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_politics_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_geography_1,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_politics_1,qwen2.5-1.5b-instruct-vllm/ceval-modern_chinese_history_0,qwen2.5-1.5b-instruct-vllm/ceval-ideological_and_moral_cultivation_0,qwen2.5-1.5b-instruct-vllm/ceval-logic_0,qwen2.5-1.5b-instruct-vllm/ceval-law_0,qwen2.5-1.5b-instruct-vllm/ceval-chinese_language_and_literature_0,qwen2.5-1.5b-instruct-vllm/ceval-art_studies_0,qwen2.5-1.5b-instruct-vllm/ceval-professional_tour_guide_0,qwen2.5-1.5b-instruct-vllm/ceval-legal_professional_0,qwen2.5-1.5b-instruct-vllm/ceval-high_school_chinese_0,qwen2.5-1.5b-instruct-vllm/ceval-high_school_history_0,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_history_0,qwen2.5-1.5b-instruct-vllm/ceval-civil_servant_0,qwen2.5-1.5b-instruct-vllm/ceval-sports_science_0,qwen2.5-1.5b-instruct-vllm/ceval-plant_protection_0,qwen2.5-1.5b-instruct-vllm/ceval-basic_medicine_0,qwen2.5-1.5b-instruct-vllm/ceval-clinical_medicine_0,qwen2.5-1.5b-instruct-vllm/ceval-urban_and_rural_planner_0,qwen2.5-1.5b-instruct-vllm/ceval-accountant_0,qwen2.5-1.5b-instruct-vllm/ceval-fire_engineer_0,qwen2.5-1.5b-instruct-vllm/ceval-environmental_impact_assessment_engineer_0,qwen2.5-1.5b-instruct-vllm/ceval-tax_accountant_0,qwen2.5-1.5b-instruct-vllm/ceval-physician_0]
INFO 01-08 11:34:15 config.py:378] This model supports multiple tasks: {'embedding', 'generate'}. Defaulting to 'generate'.
INFO 01-08 11:34:15 config.py:1048] Defaulting to use mp for distributed inference
INFO 01-08 11:34:15 llm_engine.py:249] Initializing an LLM engine (vv1.3) with config: model='/workspace/models/hf_models/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='/workspace/models/hf_models/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/workspace/models/hf_models/Qwen2.5-1.5B-Instruct, num_scheduler_steps=1, chunked_prefill_enabled=False multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, chat_template_text_format=string, mm_processor_kwargs=None, pooler_config=None)
INFO 01-08 11:34:16 custom_cache_manager.py:17] Setting Triton cache manager to: coca_vllm.triton_utils.custom_cache_manager:CustomCacheManager
INFO 01-08 11:34:16 selector.py:135] Using Flash Attention backend.
(VllmWorkerProcess pid=69399) INFO 01-08 11:34:16 selector.py:135] Using Flash Attention backend.
(VllmWorkerProcess pid=69399) INFO 01-08 11:34:16 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] Traceback (most recent call last):
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.10/dist-packages/coca_vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.10/dist-packages/coca_vllm/worker/worker.py", line 135, in init_device
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] torch.cuda.set_device(self.device)
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.10/dist-packages/torch/cuda/init.py", line 478, in set_device
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] torch._C._cuda_setDevice(device)
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.10/dist-packages/torch/cuda/init.py", line 305, in _lazy_init
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] raise RuntimeError(
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
^C(VllmWorkerProcess pid=69399) INFO 01-08 11:34:30 multiproc_worker_utils.py:240] Worker exiting

Other information

Multi-GPU evaluation works when vllm is not used; with vllm it fails with: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
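
A possible workaround (a sketch, not verified against this setup): the traceback shows the worker process re-initializing CUDA after a fork, so forcing the 'spawn' start method before the engine touches CUDA may avoid the crash. VLLM_WORKER_MULTIPROC_METHOD is a standard vLLM environment variable; whether the coca_vllm fork in this stack honors it is an assumption:

# Force vLLM workers to start with 'spawn' instead of 'fork'.
# Must be set before the LLM engine is created (e.g. at the top of run.py),
# or exported in the shell:
#   VLLM_WORKER_MULTIPROC_METHOD=spawn python run.py --datasets ceval_gen \
#       --models hf_qwen2_5_1_5b_instruct --debug -a vllm
import os
os.environ['VLLM_WORKER_MULTIPROC_METHOD'] = 'spawn'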
