Update Moondream v2 #2711

Open
cryptoquick opened this issue Jan 12, 2025 · 0 comments

Excellent work on this project; the moondream example works great. One thing I'd like to try is the latest version of moondream. According to the current README, that is the "2025-01-09" revision of "vikhyatk/moondream2".

I tried to update it myself, but I reached my limit. The furthest I got, which I hope is useful information, was printing the new vars (tensor names, shapes, and dtypes) from the downloaded safetensors file:

avx: true, neon: false, simd128: false, f16c: true
temp: -1.25 repeat-penalty: 1.00 repeat-last-n: 64
retrieved the files in 27.875484ms
model_file: "/home/hunter/.cache/huggingface/hub/models--vikhyatk--moondream2/snapshots/adcbcd1a6d27fc19974b18dc128eb51ef6837879/model.safetensors"
Tensors found in safetensors file:
  model.vision.blocks.9.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.proj_mlp.fc1.bias => shape: [8192], dtype: F16
  model.region.coord_encoder.weight => shape: [2048, 256], dtype: F16
  model.text.blocks.0.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.16.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.2.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.23.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.24.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.3.ln1.bias => shape: [1152], dtype: F16
  model.vision.post_ln.weight => shape: [1152], dtype: F16
  model.vision.blocks.23.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.text.blocks.0.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.text.blocks.3.attn.qkv.bias => shape: [6144], dtype: F16
  model.text.blocks.1.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.16.attn.qkv.bias => shape: [3456], dtype: F16
  model.text.blocks.16.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.12.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.1.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.26.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.26.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.23.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.14.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.0.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.19.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.16.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.proj_mlp.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.16.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.18.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.4.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.21.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.10.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.16.attn.proj.bias => shape: [2048], dtype: F16
  model.text.blocks.17.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.18.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.20.attn.proj.bias => shape: [2048], dtype: F16
  model.text.blocks.9.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.8.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.18.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.17.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.11.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.3.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.17.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.6.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.24.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.7.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.24.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.14.mlp.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.15.attn.proj.bias => shape: [2048], dtype: F16
  model.text.blocks.3.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.text.blocks.1.attn.qkv.bias => shape: [6144], dtype: F16
  model.text.blocks.5.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.blocks.21.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.22.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.7.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.16.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.23.attn.qkv.bias => shape: [3456], dtype: F16
  model.text.blocks.13.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.17.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.20.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.13.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.26.ln2.bias => shape: [1152], dtype: F16
  model.text.blocks.11.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.text.blocks.2.ln.weight => shape: [2048], dtype: F16
  model.text.blocks.17.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.text.blocks.20.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.text.blocks.22.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.1.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.12.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.2.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.10.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.12.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.13.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.15.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.8.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.2.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.text.blocks.4.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.text.blocks.18.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.8.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.3.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.post_ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.18.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.6.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.1.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.10.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.10.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.12.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.19.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.4.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.25.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.4.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.20.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.1.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.22.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.18.attn.qkv.bias => shape: [3456], dtype: F16
  model.text.blocks.21.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.text.blocks.12.attn.qkv.bias => shape: [6144], dtype: F16
  model.text.blocks.14.attn.proj.bias => shape: [2048], dtype: F16
  model.text.blocks.2.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.0.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.text.blocks.7.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.7.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.text.blocks.12.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.21.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.22.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.3.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.22.attn.qkv.bias => shape: [3456], dtype: F16
  model.text.blocks.4.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.5.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.8.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.13.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.text.blocks.1.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.23.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.19.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.11.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.17.ln2.bias => shape: [1152], dtype: F16
  model.text.blocks.16.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.0.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.2.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.2.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.24.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.23.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.10.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.14.ln.weight => shape: [2048], dtype: F16
  model.text.blocks.10.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.4.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.6.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.0.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.10.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.blocks.22.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.0.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.9.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.13.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.text.blocks.6.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.24.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.12.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.15.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.3.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.25.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.19.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.region.size_encoder.bias => shape: [2048], dtype: F16
  model.text.blocks.9.mlp.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.0.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.5.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.22.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.26.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.11.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.7.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.17.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.1.ln1.bias => shape: [1152], dtype: F16
  model.text.blocks.19.mlp.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.7.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.24.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.7.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.5.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.12.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.1.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.blocks.17.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.10.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.21.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.7.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.22.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.9.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.19.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.blocks.6.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.13.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.13.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.14.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.19.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.text.blocks.11.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.14.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.2.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.14.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.10.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.17.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.17.attn.qkv.bias => shape: [6144], dtype: F16
  model.region.coord_decoder.fc2.bias => shape: [1024], dtype: F16
  model.region.size_decoder.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.blocks.17.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.0.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.10.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.17.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.21.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.3.attn.qkv.bias => shape: [3456], dtype: F16
  model.text.blocks.14.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.5.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.text.blocks.9.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.5.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.8.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.15.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.8.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.proj_mlp.fc1.weight => shape: [8192, 2304], dtype: F16
  model.vision.blocks.21.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.24.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.15.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.proj_mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.11.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.2.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.11.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.20.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.20.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.3.ln.weight => shape: [2048], dtype: F16
  model.text.blocks.8.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.16.ln2.bias => shape: [1152], dtype: F16
  model.text.blocks.22.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.region.coord_encoder.bias => shape: [2048], dtype: F16
  model.text.blocks.16.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.18.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.12.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.0.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.18.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.21.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.3.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.4.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.1.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.7.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.14.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.14.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.15.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.16.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.17.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.12.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.4.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.4.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.19.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.0.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.6.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.26.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.9.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.4.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.text.blocks.0.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.11.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.17.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.3.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.6.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.23.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.15.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.15.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.8.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.23.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.12.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.6.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.7.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.8.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.text.blocks.11.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.7.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.4.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.10.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.text.blocks.3.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.14.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.20.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.3.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.25.attn.qkv.bias => shape: [3456], dtype: F16
  model.text.blocks.5.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.11.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.text.blocks.5.ln.weight => shape: [2048], dtype: F16
  model.text.wte => shape: [51200, 2048], dtype: F16
  model.vision.blocks.4.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.5.ln1.bias => shape: [1152], dtype: F16
  model.text.blocks.6.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.11.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.16.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.9.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.6.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.12.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.8.ln2.bias => shape: [1152], dtype: F16
  model.text.blocks.11.mlp.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.3.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.8.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.14.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.24.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.1.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.7.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.3.attn.proj.bias => shape: [2048], dtype: F16
  model.region.size_encoder.weight => shape: [2048, 512], dtype: F16
  model.vision.blocks.6.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.22.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.9.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.text.blocks.22.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.16.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.post_ln.bias => shape: [1152], dtype: F16
  model.text.blocks.1.ln.weight => shape: [2048], dtype: F16
  model.region.coord_decoder.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.17.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.5.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.18.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.10.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.14.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.14.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.5.ln2.bias => shape: [1152], dtype: F16
  model.text.blocks.22.attn.qkv.bias => shape: [6144], dtype: F16
  model.text.blocks.6.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.25.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.7.attn.qkv.bias => shape: [6144], dtype: F16
  model.text.blocks.23.attn.proj.bias => shape: [2048], dtype: F16
  model.text.blocks.13.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.19.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.14.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.13.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.2.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.21.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.10.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.20.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.22.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.14.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.20.ln1.bias => shape: [1152], dtype: F16
  model.text.blocks.10.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.13.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.5.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.13.ln1.bias => shape: [1152], dtype: F16
  model.text.blocks.8.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.5.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.18.ln1.bias => shape: [1152], dtype: F16
  model.text.blocks.15.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.17.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.23.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.13.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.13.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.23.ln1.bias => shape: [1152], dtype: F16
  model.region.size_decoder.fc1.weight => shape: [8192, 2048], dtype: F16
  model.text.blocks.0.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.23.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.5.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.21.attn.proj.bias => shape: [2048], dtype: F16
  model.text.blocks.1.attn.proj.bias => shape: [2048], dtype: F16
  model.region.size_decoder.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.1.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.26.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.8.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.4.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.11.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.15.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.19.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.22.ln2.bias => shape: [1152], dtype: F16
  model.text.blocks.13.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.text.blocks.20.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.22.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.7.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.blocks.9.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.6.ln.weight => shape: [2048], dtype: F16
  model.text.blocks.23.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.text.blocks.6.attn.proj.bias => shape: [2048], dtype: F16
  model.text.post_ln.weight => shape: [2048], dtype: F16
  model.text.blocks.15.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.15.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.14.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.11.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.1.ln2.bias => shape: [1152], dtype: F16
  model.text.blocks.19.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.12.mlp.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.18.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.1.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.text.blocks.13.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.20.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.18.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.4.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.20.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.3.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.5.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.11.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.15.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.22.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.4.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.23.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.blocks.0.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.17.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.blocks.22.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.13.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.21.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.blocks.5.mlp.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.21.attn.qkv.bias => shape: [6144], dtype: F16
  model.text.blocks.10.attn.proj.bias => shape: [2048], dtype: F16
  model.text.blocks.12.attn.proj.bias => shape: [2048], dtype: F16
  model.text.blocks.15.mlp.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.20.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.14.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.3.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.region.size_decoder.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.17.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.8.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.22.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.23.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.18.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.25.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.3.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.3.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.12.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.20.mlp.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.8.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.16.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.16.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.9.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.9.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.0.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.pos_emb => shape: [1, 729, 1152], dtype: F16
  model.vision.blocks.25.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.7.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.2.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.19.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.18.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.16.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.19.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.2.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.21.ln2.weight => shape: [1152], dtype: F16
  model.vision.patch_emb.bias => shape: [1152], dtype: F16
  model.text.blocks.14.attn.qkv.bias => shape: [6144], dtype: F16
  model.text.blocks.8.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.14.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.12.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.0.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.20.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.24.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.10.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.23.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.0.ln1.bias => shape: [1152], dtype: F16
  model.text.blocks.4.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.14.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.11.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.5.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.7.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.7.ln2.bias => shape: [1152], dtype: F16
  model.text.blocks.15.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.19.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.11.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.12.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.15.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.0.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.21.ln.weight => shape: [2048], dtype: F16
  model.text.blocks.6.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.25.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.patch_emb.weight => shape: [1152, 588], dtype: F16
  model.vision.blocks.23.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.19.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.25.ln1.bias => shape: [1152], dtype: F16
  model.text.blocks.9.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.text.blocks.15.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.4.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.5.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.1.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.7.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.24.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.15.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.2.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.text.lm_head.weight => shape: [51200, 2048], dtype: F16
  model.vision.blocks.2.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.3.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.2.ln1.bias => shape: [1152], dtype: F16
  model.text.blocks.10.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.12.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.26.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.25.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.8.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.23.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.text.blocks.12.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.11.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.25.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.21.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.4.attn.qkv.bias => shape: [3456], dtype: F16
  model.text.blocks.19.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.8.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.6.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.21.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.12.ln2.bias => shape: [1152], dtype: F16
  model.text.lm_head.bias => shape: [51200], dtype: F16
  model.text.blocks.18.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.16.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.19.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.16.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.9.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.10.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.2.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.8.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.10.mlp.fc1.bias => shape: [8192], dtype: F16
  model.vision.blocks.1.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.23.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.13.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.2.attn.qkv.bias => shape: [6144], dtype: F16
  model.text.blocks.8.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.5.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.9.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.20.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.9.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.text.blocks.11.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.3.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.19.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.18.ln2.bias => shape: [1152], dtype: F16
  model.text.blocks.11.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.6.mlp.fc2.bias => shape: [1152], dtype: F16
  model.text.blocks.13.attn.proj.bias => shape: [2048], dtype: F16
  model.text.blocks.6.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.26.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.4.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.18.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.5.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.22.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.region.coord_decoder.fc2.weight => shape: [1024, 8192], dtype: F16
  model.vision.blocks.26.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.8.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.18.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.20.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.20.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.26.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.text.blocks.9.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.19.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.23.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.6.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.19.ln.weight => shape: [2048], dtype: F16
  model.text.blocks.18.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.21.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.0.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.25.ln2.weight => shape: [1152], dtype: F16
  model.region.coord_features => shape: [1, 128], dtype: F16
  model.vision.blocks.0.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.3.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.13.attn.qkv.bias => shape: [6144], dtype: F16
  model.vision.blocks.23.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.18.mlp.fc2.bias => shape: [2048], dtype: F16
  model.text.blocks.7.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.vision.blocks.13.mlp.fc2.bias => shape: [1152], dtype: F16
  model.vision.blocks.24.ln2.weight => shape: [1152], dtype: F16
  model.vision.blocks.4.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.13.mlp.fc1.bias => shape: [8192], dtype: F16
  model.text.blocks.14.ln.bias => shape: [2048], dtype: F16
  model.text.blocks.23.attn.qkv.bias => shape: [6144], dtype: F16
  model.text.blocks.18.ln.weight => shape: [2048], dtype: F16
  model.text.blocks.7.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.10.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.11.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.2.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.25.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.text.blocks.5.attn.proj.bias => shape: [2048], dtype: F16
  model.vision.blocks.13.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.4.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.21.attn.proj.bias => shape: [1152], dtype: F16
  model.text.blocks.20.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.vision.blocks.20.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.text.blocks.17.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.26.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.15.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.6.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.16.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.region.coord_decoder.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.24.ln2.bias => shape: [1152], dtype: F16
  model.vision.blocks.17.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.7.attn.proj.bias => shape: [1152], dtype: F16
  model.vision.blocks.18.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.16.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.22.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.text.blocks.16.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.vision.blocks.1.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.6.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.26.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.21.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.15.ln.weight => shape: [2048], dtype: F16
  model.text.blocks.19.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.text.blocks.12.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.text.blocks.9.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.17.ln1.weight => shape: [1152], dtype: F16
  model.text.blocks.15.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.text.blocks.8.mlp.fc2.bias => shape: [2048], dtype: F16
  model.vision.blocks.2.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.text.blocks.1.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.22.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.9.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.11.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.6.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.2.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.10.ln1.bias => shape: [1152], dtype: F16
  model.vision.blocks.12.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.text.blocks.20.ln.weight => shape: [2048], dtype: F16
  model.text.blocks.2.attn.qkv.weight => shape: [6144, 2048], dtype: F16
  model.text.blocks.1.ln.bias => shape: [2048], dtype: F16
  model.vision.blocks.22.ln2.weight => shape: [1152], dtype: F16
  model.text.blocks.15.mlp.fc2.weight => shape: [2048, 8192], dtype: F16
  model.text.blocks.21.attn.proj.weight => shape: [2048, 2048], dtype: F16
  model.text.blocks.0.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.17.attn.qkv.bias => shape: [3456], dtype: F16
  model.text.blocks.7.ln.weight => shape: [2048], dtype: F16
  model.vision.blocks.19.attn.proj.weight => shape: [1152, 1152], dtype: F16
  model.vision.blocks.9.mlp.fc1.weight => shape: [4304, 1152], dtype: F16
  model.vision.blocks.0.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.16.ln1.bias => shape: [1152], dtype: F16
  model.text.blocks.10.mlp.fc1.weight => shape: [8192, 2048], dtype: F16
  model.vision.blocks.21.mlp.fc2.weight => shape: [1152, 4304], dtype: F16
  model.vision.blocks.4.ln1.weight => shape: [1152], dtype: F16
  model.vision.blocks.20.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.5.mlp.fc1.bias => shape: [4304], dtype: F16
  model.vision.blocks.2.mlp.fc1.bias => shape: [4304], dtype: F16
  model.text.blocks.9.attn.proj.bias => shape: [2048], dtype: F16
  model.region.size_features => shape: [2, 256], dtype: F16
  model.vision.blocks.9.attn.qkv.bias => shape: [3456], dtype: F16
  model.vision.blocks.1.attn.qkv.weight => shape: [3456, 1152], dtype: F16
  model.vision.blocks.21.attn.proj.weight => shape: [1152, 1152], dtype: F16
Error: cannot find tensor text_model.transformer.embd.wte.weight

Let me know if there's anything else I can try. Otherwise, I hope this helps whoever might be able to do this, like @LaurentMazare.

I used this code snippet:

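        use safetensors::SafeTensors;

        // Read the whole checkpoint into memory and parse the safetensors header.
        let data = std::fs::read(&model_file)?;
        let tensors = SafeTensors::deserialize(&data)
            .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;

        // Print every tensor name with its shape and dtype.
        println!("Tensors found in safetensors file:");
        for name in tensors.names() {
            // `name` comes from `names()`, so this lookup should not fail.
            let info = tensors.tensor(name).unwrap();
            println!(
                "  {} => shape: {:?}, dtype: {:?}",
                name,
                info.shape(),
                info.dtype()
            );
        }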
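
The listing above suggests the new checkpoint names its tensors `model.text.*`, `model.vision.*` and `model.region.*`, while the loader is still asking for `text_model.transformer.embd.wte.weight`. A minimal sketch of how that mismatch could be summarized from the printed names, using only the standard library. The helpers `summarize_prefixes` and `missing_names` are hypothetical names for illustration and are not part of candle or safetensors:

```rust
use std::collections::BTreeMap;

/// Count tensor names by their first `depth` dot-separated components,
/// e.g. "model.text.lm_head.weight" with depth 2 is counted under "model.text".
/// (Hypothetical helper, not a candle or safetensors API.)
fn summarize_prefixes<'a, I>(names: I, depth: usize) -> BTreeMap<String, usize>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts = BTreeMap::new();
    for name in names {
        let prefix = name.split('.').take(depth).collect::<Vec<_>>().join(".");
        *counts.entry(prefix).or_insert(0) += 1;
    }
    counts
}

/// Return the names from `expected` that do not appear in `found`.
/// (Hypothetical helper, not a candle or safetensors API.)
fn missing_names<'a>(expected: &[&'a str], found: &[&str]) -> Vec<&'a str> {
    expected
        .iter()
        .copied()
        .filter(|e| !found.iter().any(|f| *f == *e))
        .collect()
}
```

Feeding the names from `tensors.names()` into `summarize_prefixes` with a depth of 2 gives a compact view of the checkpoint layout, and `missing_names` reports which names the loader expects but the file does not contain, such as the one in the error above.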