
[BUG]Load model #542

Open
thangnv02 opened this issue Feb 3, 2025 · 3 comments

Comments

@thangnv02

When I try to load the DeepSeek model:
Attempt 1:

chat = VLLM(
    model="deepseek-ai/DeepSeek-R1",
    trust_remote_code=True,
    max_new_tokens=128,
    temperature=0,
    device=DEVICE,
    vllm_kwargs={
        # "quantization": "awq",
        "max_model_len": 1024,
        "gpu_memory_utilization": 0.95,  # default value
    },
)

ERROR 1: ValidationError: 1 validation error for VLLM

  Value error, Model architectures ['DeepseekV3ForCausalLM'] are not supported for now. Supported architectures: ['AquilaModel', 'AquilaForCausalLM', 'BaiChuanForCausalLM', 'BaichuanForCausalLM', 'BloomForCausalLM', 'ChatGLMModel', 'ChatGLMForConditionalGeneration', 'CohereForCausalLM', 'DbrxForCausalLM', 'DeciLMForCausalLM', 'DeepseekForCausalLM', 'DeepseekV2ForCausalLM', 'ExaoneForCausalLM', 'FalconForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTJForCausalLM', 'GPTNeoXForCausalLM', 'InternLMForCausalLM', 'InternLM2ForCausalLM', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'QuantMixtralForCausalLM', 'MptForCausalLM', 'MPTForCausalLM', 'MiniCPMForCausalLM', 'NemotronForCausalLM', 'OlmoForCausalLM', 'OPTForCausalLM', 'OrionForCausalLM', 'PersimmonForCausalLM', 'PhiForCausalLM', 'Phi3ForCausalLM', 'PhiMoEForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'Qwen2VLForConditionalGeneration', 'RWForCausalLM', 'StableLMEpochForCausalLM', 'StableLmForCausalLM', 'Starcoder2ForCausalLM', 'ArcticForCausalLM', 'XverseForCausalLM', 'Phi3SmallForCausalLM', 'MedusaModel', 'EAGLEModel', 'MLPSpeculatorPreTrainedModel', 'JambaForCausalLM', 'GraniteForCausalLM', 'MistralModel', 'Blip2ForConditionalGeneration', 'ChameleonForConditionalGeneration', 'FuyuForCausalLM', 'InternVLChatModel', 'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'LlavaNextVideoForConditionalGeneration', 'MiniCPMV', 'PaliGemmaForConditionalGeneration', 'Phi3VForCausalLM', 'PixtralForConditionalGeneration', 'QWenLMHeadModel', 'UltravoxModel', 'BartModel', 'BartForConditionalGeneration'] [type=value_error, input_value={'model': 'deepseek-ai/De...': None, 'client': None}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/value_error

Attempt 2:

model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, cache_dir = CACHE_DIR)

ERROR 2:

ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet']

Why does this happen, and how can I solve it?

@AleemIqbal

The errors you’re seeing occur because the model’s configuration isn’t aligned with what your current tooling (either VLLM or Hugging Face’s Transformers with quantization support) expects. Here’s what’s happening and some suggestions on how to proceed:

1. Model Architecture Mismatch (Attempt 1)

What's happening:
When you try to load deepseek-ai/DeepSeek-R1 with VLLM, the library checks the model's architecture against a whitelist of supported architectures. In the error message, you can see that the supported DeepSeek architectures include:

- DeepseekForCausalLM
- DeepseekV2ForCausalLM

But your model's architecture is reported as DeepseekV3ForCausalLM, which is not on that list. This mismatch leads to the validation error.

How to solve it:

- Check for a supported version: If you're tied to DeepSeek models, see whether a version of the model whose architecture is on the supported list (for example, DeepseekForCausalLM or DeepseekV2ForCausalLM) works for your use case (see the sketch after this list).
- Update or patch VLLM: If you need to use the R1 (or V3) version, you might need to wait for VLLM to add support, upgrade to a release that already has it, or consider contributing a patch for that architecture. Check the VLLM repository or documentation for updates regarding newer DeepSeek versions.
- Contact the maintainers: The model provider or the VLLM maintainers might have guidance or beta support for newer architectures. Check their GitHub issues or discussion forums.
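
Recent vLLM releases reportedly list DeepseekV3ForCausalLM, so upgrading vllm may be enough on its own. If you would rather stay on your current version, here is a minimal sketch of the first option, assuming the VLLM class comes from langchain_community (as the original snippet suggests) and that a distilled R1 checkpoint such as deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (which reports the supported Qwen2ForCausalLM architecture) is acceptable for your use case:

# Sketch only: substitute a DeepSeek-R1 distill whose architecture is on the
# supported list; parameters mirror the original snippet (device omitted).
from langchain_community.llms import VLLM

chat = VLLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # assumed substitute checkpoint
    trust_remote_code=True,
    max_new_tokens=128,
    temperature=0,
    vllm_kwargs={
        "max_model_len": 1024,
        "gpu_memory_utilization": 0.95,
    },
)

print(chat.invoke("Hello, who are you?"))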

2. Quantization Type Error (Attempt 2)

What's happening:
When loading the model with AutoModelForCausalLM.from_pretrained (with trust_remote_code=True), you hit an error stating:

ValueError: Unknown quantization type, got fp8 - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', ...]

This means that the model's configuration specifies a quantization type (fp8) that the quantization framework you are using does not recognize. The quantization engine supports a fixed set of types, and fp8 isn't one of them.

How to solve it:

- Modify the configuration: If you're comfortable editing the model's configuration (or cloning the repository and modifying the source), you could change the quantization type from fp8 to one of the supported types (for example, awq), if that change is acceptable for your use case.
- Disable quantization: If quantization isn't essential for your experiment, you might try to disable quantization or avoid loading the quantized weights altogether (see the sketch after these suggestions).
- Wait for official support: It's possible that the model provider is planning to add support for fp8, or that an update to the quantization framework will include it. Keep an eye on the relevant repositories (the model repo, Hugging Face Transformers, or quantization libraries like bitsandbytes) for updates.
- Consult documentation/community: The first error message points to https://errors.pydantic.dev/2.10/v/value_error for more info. Also, checking discussions in the model's repository or forums (like the Hugging Face forums) might reveal workarounds used by others facing the same issue.
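
A minimal sketch of the configuration route, assuming the model files are already downloaded to a local directory (./deepseek-v3 here is illustrative). Note that this only clears the config-parsing error; if the checkpoint's weights really are stored in fp8, you would still need a de-quantized (for example bf16) copy of the weights for loading to succeed:

# Hypothetical workaround: drop the quantization_config block that
# Transformers rejects, then retry loading with trust_remote_code=True.
import json

config_path = "./deepseek-v3/config.json"  # illustrative local path

with open(config_path) as f:
    cfg = json.load(f)

cfg.pop("quantization_config", None)  # remove the unrecognized fp8 entry

with open(config_path, "w") as f:
    json.dump(cfg, f, indent=2)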

@quanshengjia

I got the error below when I try to load the DeepSeek model for testing:

Traceback (most recent call last):
  File "C:\Users\e0101707\deepseek-v3\deepseek_v3_generate_poem.py", line 71, in <module>
    generate_poem()
  File "C:\Users\e0101707\deepseek-v3\deepseek_v3_generate_poem.py", line 42, in generate_poem
    model = AutoModelForCausalLM.from_pretrained(model_dir)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\transformers\models\auto\auto_factory.py", line 526, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 1057, in from_pretrained
    trust_remote_code = resolve_trust_remote_code(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python312\Lib\site-packages\transformers\dynamic_module_utils.py", line 665, in resolve_trust_remote_code
    raise ValueError(
ValueError: The repository for ./deepseek-v3 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/./deepseek-v3.
Please pass the argument trust_remote_code=True to allow custom code to be run.

The Python script is attached here:

import os
import requests
import ssl
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def download_file(url, save_path):
    try:
        response = requests.get(url, verify=False)
        response.raise_for_status()
        with open(save_path, 'wb') as f:
            f.write(response.content)
        print("File downloaded successfully.")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading the file: {e}")

def download_deepseek_model_files():
    base_url = "https://huggingface.co/deepseek-ai/DeepSeek-V3/resolve/main"
    save_dir = "deepseek-v3"
    os.makedirs(save_dir, exist_ok=True)
    files_to_download = {
        "pytorch_model.bin": "pytorch_model.bin",
        "config.json": "config.json",
        "tokenizer_config.json": "tokenizer_config.json",
        "vocab.json": "vocab.json",
        "merges.txt": "merges.txt"
    }
    for file_name, save_name in files_to_download.items():
        file_url = f"{base_url}/{file_name}"
        save_path = os.path.join(save_dir, save_name)
        download_file(file_url, save_path)

def generate_poem():
    # Create an unverified SSL context
    ssl._create_default_https_context = ssl._create_unverified_context

    # Download DeepSeek-V3 model files
    download_deepseek_model_files()

    # Load pre-trained DeepSeek-V3 model and tokenizer from local files
    model_dir = "./deepseek-v3"
    model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)

    # Prompt for poem generation ("Write a poem about the beauty of nature.")
    prompt = "写一首关于自然美的诗。"

    # Encode the prompt
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    attention_mask = torch.ones(input_ids.shape, dtype=torch.long)

    # Generate text
    with torch.no_grad():
        outputs = model.generate(
            input_ids,
            attention_mask=attention_mask,
            max_length=100,
            num_return_sequences=1,
            no_repeat_ngram_size=2,
            top_p=0.95,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id,
            do_sample=True
        )

    # Decode the generated text
    poem = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(poem)

if __name__ == "__main__":
    generate_poem()
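
For reference, the ValueError in the traceback above is raised whenever from_pretrained is called on a repository containing custom modeling code without trust_remote_code=True (the traceback's line 42 shows the call without it; the attached script already adds it on both loads). A minimal loading sketch, reusing the local path from the script, which may then still run into the fp8 quantization error discussed earlier in this thread:

# Minimal sketch: allow the repository's custom code to run, as the error asks.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./deepseek-v3"  # local directory from the script above
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)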

@GeeeekExplorer GeeeekExplorer marked this as a duplicate of #558 Feb 5, 2025
@csyslcp

csyslcp commented Feb 5, 2025

Same here. I use AutoModelForCausalLM to load the model and get the same error. Did you fix it?
