-
You can get the correct mmproj file from here: https://huggingface.co/koboldcpp/mmproj/tree/main
For Mistral, use this one: https://huggingface.co/koboldcpp/mmproj/resolve/main/mistral-7b-mmproj-v1.5-Q4_1.gguf
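The mmproj file can live anywhere on disk; you just point KoboldCpp at it when you launch. A rough sketch for the EXE, assuming your build accepts the --model and --mmproj flags (the paths here are placeholders for your own files):

koboldcpp.exe --model F:\models\your-mistral-model.gguf --mmproj F:\models\mistral-7b-mmproj-v1.5-Q4_1.gguf

The launcher GUI should also offer a field for the mmproj file next to the model selection, if you'd rather not use the command line.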
-
Yes, OK. And where do I put the files, or how do I start with mmproj? ;) I use the EXE file.
-
I see ... thanks for the fast help.
-
Win10 and an RTX GPU
KoboldCpp 1.61.2
Loading an LM model: everything runs fine.
Loading any of my 5 LLaVA models: all of them run into an error.
...
Using automatic RoPE scaling. If the model has customized RoPE settings, they will be used directly instead!
System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 |
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from F:\chatGPT_go\models\neuralhermes-mistral-7b-vo...
llm_load_vocab: special tokens definition check successful ( 261/32002 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32002
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attm = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = unknown, may not work
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.78 GiB (5.67 BPW)
llm_load_print_meta: general.name = content
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 32000 '<|im_end|>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.26 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 85.94 MiB
llm_load_tensors: CUDA0 buffer size = 4807.06 MiB
...................................................................................................
Automatic RoPE Scaling: Using (scale:1.000, base:10000.0).
llama_new_context_with_model: n_ctx = 2128
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 266.00 MiB
llama_new_context_with_model: KV self size = 266.00 MiB, K (f16): 133.00 MiB, V (f16): 133.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 62.50 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 169.16 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 12.16 MiB
llama_new_context_with_model: graph splits: 2
Attempting to apply Multimodal Projector: F:\jmage_models\ggml-model-q5_k.gguf
key general.description not found in file
Traceback (most recent call last):
File "koboldcpp.py", line 3094, in
File "koboldcpp.py", line 2846, in main
File "koboldcpp.py", line 396, in load_model
OSError: [WinError -529697949] Windows Error 0xe06d7363
[1836] Failed to execute script 'koboldcpp' due to unhandled exception!
f:\koboldcpp>
...
All of the models run with oobabooga, and one of them with Jan.