I was following the NanoVLM tutorial at https://www.jetson-ai-lab.com/tutorial_nano-vlm.html, but the MLC quantization step keeps getting killed and I couldn't find a way around it. I am using a Jetson Orin Nano 8GB with Efficient-Large-Model/VILA-2.7b.
Input:
$ jetson-containers run $(autotag nano_llm) python3 -m nano_llm.chat --api=mlc --model Efficient-Large-Model/VILA-2.7b --max-context-len 256 --max-new-tokens 32
Output:
Namespace(packages=['nano_llm'], prefer=['local', 'registry', 'build'], disable=[''], user='dustynv', output='/tmp/autotag', quiet=False, verbose=False)
-- L4T_VERSION=36.3.0 JETPACK_VERSION=6.0 CUDA_VERSION=12.2
-- Finding compatible container image for ['nano_llm']
dustynv/nano_llm:r36.2.0
localuser:root being added to access control list
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Fetching 10 files: 100%|█████████████████████| 10/10 [00:00<00:00, 62601.55it/s]
Fetching 12 files: 100%|██████████████████████| 12/12 [00:00<00:00, 9336.24it/s]
13:40:23 | INFO | loading /data/models/huggingface/models--Efficient-Large-Model--VILA-2.7b/snapshots/2ed82105eefd5926cccb46af9e71b0ca77f12704 with MLC
13:40:26 | INFO | running MLC quantization:
python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA-2.7b-ctx256 --use-safetensors
Using path "/data/models/mlc/dist/models/VILA-2.7b" for model "VILA-2.7b"
Target configured: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_87 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Get old param:   0%|          | 0/197 [00:00<?, ?tensors/s]
Start computing and quantizing weights... This may take a while.
Set new param:   0%|          | 1/327 [00:02<15:24, 2.84s/tensors]
Get old param:   1%|▏         | 2/197 [00:02<03:51, 1.19s/tensors]
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/opt/NanoLLM/nano_llm/chat/main.py", line 32, in
model = NanoLLM.from_pretrained(
File "/opt/NanoLLM/nano_llm/nano_llm.py", line 71, in from_pretrained
model = MLCModel(model_path, **kwargs)
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 60, in init
quant = MLCModel.quantize(self.model_path, self.config, method=quantization, max_context_len=max_context_len, **kwargs)
File "/opt/NanoLLM/nano_llm/models/mlc.py", line 274, in quantize
subprocess.run(cmd, executable='/bin/bash', shell=True, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m mlc_llm.build --model /data/models/mlc/dist/models/VILA-2.7b --quantization q4f16_ft --target cuda --use-cuda-graph --use-flash-attn-mqa --sep-embed --max-seq-len 256 --artifact-path /data/models/mlc/dist/VILA-2.7b-ctx256 --use-safetensors ' died with <Signals.SIGKILL: 9>.
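In case it helps with triage: "died with <Signals.SIGKILL: 9>" on the quantization subprocess usually means the kernel's OOM killer terminated it, which is plausible on an 8GB Orin Nano since the q4f16_ft pass has to hold the fp16 weights in memory while converting them. A minimal sketch of how I checked and worked around it on the host (the swap file path and 8G size below are my own choices, not something from the tutorial output above):

# Confirm the OOM killer fired (run on the host, not inside the container)
sudo dmesg | grep -i -E 'killed process|out of memory'

# See how much RAM/swap is currently available
free -h

# Add a swap file so the one-time quantization pass can finish
# (path /mnt/8GB.swap and size 8G are illustrative; adjust for your board)
sudo fallocate -l 8G /mnt/8GB.swap
sudo chmod 600 /mnt/8GB.swap
sudo mkswap /mnt/8GB.swap
sudo swapon /mnt/8GB.swap

# Optionally free more RAM by dropping to a text console (stops the GUI)
sudo init 3

With swap mounted, re-running the same jetson-containers command retries the build; once the quantized artifact exists under /data/models/mlc/dist, later runs should skip this step entirely.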