Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0',
'GPU 0,1,2,3,4,5,6,7': 'NVIDIA A800-SXM4-80GB',
'MMEngine': '0.10.5',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 12.2, V12.2.140',
'OpenCV': '4.10.0',
'PyTorch': '2.5.1+cu124',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2024.2-Product Build 20240605 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.5.3 (Git Hash '
'66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.4\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 90.1\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.4, '
'CUDNN_VERSION=9.1.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON '
'-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=old-style-cast '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, '
'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, '
'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, '
'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, '
'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0]',
'TorchVision': '0.20.1+cu124',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.3.9+b26c76a',
'sys.platform': 'linux',
'transformers': '4.46.1'}
Reproduces the problem - code/configuration sample
[Model configuration] using two GPUs
from opencompass.models import HuggingFacewithChatTemplate

models = [
    dict(
        type=HuggingFacewithChatTemplate,
        abbr='qwen2.5-1.5b-instruct-hf',
        # path='Qwen/Qwen2.5-1.5B-Instruct',
        path='/workspace/models/hf_models/Qwen2.5-1.5B-Instruct',
        max_out_len=4096,
        batch_size=8,
        run_cfg=dict(num_gpus=2),
    )
]
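For reference, a minimal sketch of what the `-a vllm` transformation roughly corresponds to as an explicit vLLM model config. The `VLLMwithChatTemplate` class and the `model_kwargs` / `tensor_parallel_size` mapping are my assumption of how OpenCompass wraps vLLM here, not something taken from the failing run:

# Hypothetical explicit vLLM config; class and kwargs are assumptions,
# mirroring the two-GPU HF config above (num_gpus=2 -> tensor_parallel_size=2).
from opencompass.models import VLLMwithChatTemplate

models = [
    dict(
        type=VLLMwithChatTemplate,
        abbr='qwen2.5-1.5b-instruct-vllm',
        path='/workspace/models/hf_models/Qwen2.5-1.5B-Instruct',
        model_kwargs=dict(tensor_parallel_size=2),
        max_out_len=4096,
        batch_size=8,
        run_cfg=dict(num_gpus=2),
    )
]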
Reproduces the problem - command or script
python run.py --datasets ceval_gen --models hf_qwen2_5_1_5b_instruct --debug -a vllm
Reproduces the problem - error message
01/08 11:34:07 - OpenCompass - INFO - Loading hf_qwen2_5_1_5b_instruct: configs/models/hf_qwen2_5_1_5b_instruct.py
01/08 11:34:07 - OpenCompass - INFO - Transforming qwen2.5-1.5b-instruct-hf to vllm
01/08 11:34:07 - OpenCompass - INFO - Loading example: /usr/local/lib/python3.10/dist-packages/opencompass/configs/./summarizers/example.py
01/08 11:34:07 - OpenCompass - INFO - Current exp folder: outputs/default/20250108_113407
01/08 11:34:07 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
01/08 11:34:07 - OpenCompass - INFO - Partitioned into 2 tasks.
01/08 11:34:11 - OpenCompass - WARNING - Only use 2 GPUs for total 8 available GPUs in debug mode.
01/08 11:34:11 - OpenCompass - INFO - Task [qwen2.5-1.5b-instruct-vllm/ceval-computer_network_0,qwen2.5-1.5b-instruct-vllm/ceval-operating_system_0,qwen2.5-1.5b-instruct-vllm/ceval-computer_architecture_0,qwen2.5-1.5b-instruct-vllm/ceval-college_programming_0,qwen2.5-1.5b-instruct-vllm/ceval-college_physics_0,qwen2.5-1.5b-instruct-vllm/ceval-college_chemistry_0,qwen2.5-1.5b-instruct-vllm/ceval-advanced_mathematics_0,qwen2.5-1.5b-instruct-vllm/ceval-probability_and_statistics_0,qwen2.5-1.5b-instruct-vllm/ceval-discrete_mathematics,qwen2.5-1.5b-instruct-vllm/ceval-electrical_engineer_1,qwen2.5-1.5b-instruct-vllm/ceval-metrology_engineer_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_mathematics_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_physics_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_chemistry_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_biology_1,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_mathematics_1,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_biology_1,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_physics_1,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_chemistry_1,qwen2.5-1.5b-instruct-vllm/ceval-veterinary_medicine_1,qwen2.5-1.5b-instruct-vllm/ceval-college_economics_1,qwen2.5-1.5b-instruct-vllm/ceval-business_administration_1,qwen2.5-1.5b-instruct-vllm/ceval-marxism_1,qwen2.5-1.5b-instruct-vllm/ceval-mao_zedong_thought_1,qwen2.5-1.5b-instruct-vllm/ceval-education_science_1,qwen2.5-1.5b-instruct-vllm/ceval-teacher_qualification_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_politics_1,qwen2.5-1.5b-instruct-vllm/ceval-high_school_geography_1,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_politics_1,qwen2.5-1.5b-instruct-vllm/ceval-modern_chinese_history_0,qwen2.5-1.5b-instruct-vllm/ceval-ideological_and_moral_cultivation_0,qwen2.5-1.5b-instruct-vllm/ceval-logic_0,qwen2.5-1.5b-instruct-vllm/ceval-law_0,qwen2.5-1.5b-instruct-vllm/ceval-chinese_language_and_literature_0,qwen2.5-1.5b-instruct-vllm/ceval-art_studies_0,qwen2.5-1.5b-instruct-vllm/ceval-professional_tour_guide_0,qwen2.5-1.5b-instruct-vllm/ceval-legal_professional_0,qwen2.5-1.5b-instruct-vllm/ceval-high_school_chinese_0,qwen2.5-1.5b-instruct-vllm/ceval-high_school_history_0,qwen2.5-1.5b-instruct-vllm/ceval-middle_school_history_0,qwen2.5-1.5b-instruct-vllm/ceval-civil_servant_0,qwen2.5-1.5b-instruct-vllm/ceval-sports_science_0,qwen2.5-1.5b-instruct-vllm/ceval-plant_protection_0,qwen2.5-1.5b-instruct-vllm/ceval-basic_medicine_0,qwen2.5-1.5b-instruct-vllm/ceval-clinical_medicine_0,qwen2.5-1.5b-instruct-vllm/ceval-urban_and_rural_planner_0,qwen2.5-1.5b-instruct-vllm/ceval-accountant_0,qwen2.5-1.5b-instruct-vllm/ceval-fire_engineer_0,qwen2.5-1.5b-instruct-vllm/ceval-environmental_impact_assessment_engineer_0,qwen2.5-1.5b-instruct-vllm/ceval-tax_accountant_0,qwen2.5-1.5b-instruct-vllm/ceval-physician_0]
INFO 01-08 11:34:15 config.py:378] This model supports multiple tasks: {'embedding', 'generate'}. Defaulting to 'generate'.
INFO 01-08 11:34:15 config.py:1048] Defaulting to use mp for distributed inference
INFO 01-08 11:34:15 llm_engine.py:249] Initializing an LLM engine (vv1.3) with config: model='/workspace/models/hf_models/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='/workspace/models/hf_models/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/workspace/models/hf_models/Qwen2.5-1.5B-Instruct, num_scheduler_steps=1, chunked_prefill_enabled=False multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, chat_template_text_format=string, mm_processor_kwargs=None, pooler_config=None)
INFO 01-08 11:34:16 custom_cache_manager.py:17] Setting Triton cache manager to: coca_vllm.triton_utils.custom_cache_manager:CustomCacheManager
INFO 01-08 11:34:16 selector.py:135] Using Flash Attention backend.
(VllmWorkerProcess pid=69399) INFO 01-08 11:34:16 selector.py:135] Using Flash Attention backend.
(VllmWorkerProcess pid=69399) INFO 01-08 11:34:16 multiproc_worker_utils.py:215] Worker ready; awaiting tasks
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] Traceback (most recent call last):
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.10/dist-packages/coca_vllm/executor/multiproc_worker_utils.py", line 223, in _run_worker_process
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] output = executor(*args, **kwargs)
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.10/dist-packages/coca_vllm/worker/worker.py", line 135, in init_device
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] torch.cuda.set_device(self.device)
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.10/dist-packages/torch/cuda/init.py", line 478, in set_device
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] torch._C._cuda_setDevice(device)
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] File "/usr/local/lib/python3.10/dist-packages/torch/cuda/init.py", line 305, in _lazy_init
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] raise RuntimeError(
(VllmWorkerProcess pid=69399) ERROR 01-08 11:34:16 multiproc_worker_utils.py:229] RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
^C(VllmWorkerProcess pid=69399) INFO 01-08 11:34:30 multiproc_worker_utils.py:240] Worker exiting
Other information
Multi-GPU evaluation works fine without vLLM; with vLLM it fails with: RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
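For anyone hitting the same traceback, here is a minimal sketch of the workaround the error message points at: force the 'spawn' start method before anything in the parent process initializes CUDA. VLLM_WORKER_MULTIPROC_METHOD is upstream vLLM's switch for its worker start method; whether the coca_vllm fork in this environment honors it is an assumption on my part:

# Sketch only: set the worker start method to 'spawn' before CUDA is touched.
# VLLM_WORKER_MULTIPROC_METHOD is read by upstream vLLM's multiproc executor;
# it is an assumption that the coca_vllm fork used here respects it as well.
import os
import multiprocessing as mp

os.environ['VLLM_WORKER_MULTIPROC_METHOD'] = 'spawn'

if __name__ == '__main__':
    # The __main__ guard is required with 'spawn' so child processes
    # do not re-execute module-level code on import.
    mp.set_start_method('spawn', force=True)
    # ... then launch the evaluation, e.g. the OpenCompass run.py entry point ...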