NotImplementedError: Cannot copy out of meta tensor; no data! + Models not generating output text
🐛 Describe the bug
When starting a server for text generation with the torchserve --ncs --start command (with, for example, the mistralai/Mistral-7B-Instruct-v0.2 model), I get the error stacktrace shown under "Error logs" below. It is raised right before the PipeStageExecutor stages are instantiated, even though the checkpoint shards load correctly.
From similar reports online, this error can be raised when more GPU memory is needed, but that does not seem to be my case, since none of my GPUs goes above 50% memory usage at any point. In other, less closely related cases, the same error appears when trying to move a meta tensor (which has no data) to another device (see e.g. https://discuss.pytorch.org/t/how-to-convert-a-meta-tensor-to-normal-tensor/172136). So I tried to follow the advice in the error message and use torch.nn.Module.to_empty() instead of torch.nn.Module.to(), modifying pippy/IR.py at line 1104 as follows:
...
1104 try:
1105     submodule.to(device)
1106 except NotImplementedError as e:
1107     if str(e) == "Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.":
1108         submodule.to_empty(device=device)
1109     else:
1110         raise
1111 except Exception:
...
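For context, here is a minimal standalone sketch (not the actual PiPPy or handler code) of what to_empty() does and does not do: it allocates uninitialized storage on the target device, but it does not restore the checkpoint weights, which still have to be loaded afterwards.
import torch
import torch.nn as nn

# Parameters created under the meta device carry shape/dtype but no data.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

# layer.to("cuda:0") would raise the same NotImplementedError as above,
# because there is nothing to copy out of a meta tensor.
layer = layer.to_empty(device="cpu")  # allocates storage; values are garbage

# Real weights must still be restored, e.g. from a checkpoint:
layer.load_state_dict({"weight": torch.eye(4), "bias": torch.zeros(4)})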
With this change I can bring the server up, but when I make an inference request such as curl -v "http://localhost:8080/predictions/mistral" -T sample_text.txt, no new text is generated: the output is identical to the input. I have verified that the input_ids are formatted correctly, that the tokenizer encodes the prompt properly, that the model path is found, and that the output is decoded correctly. So I suspect the problem is related to having added the to_empty() call.
Curiously, with the meta-llama/Llama-2-7b-hf model I do get some additional output, but it is just random tokens that make no sense, some of them not even in English.
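A quick diagnostic sketch that could be dropped into the handler or into materialize_stage (the check_weights helper and where to call it are my own hypothetical additions) to confirm whether the stage module ended up with meta or uninitialized parameters:
import torch

def check_weights(module: torch.nn.Module) -> None:
    # Print per-parameter stats so meta or uninitialized weights stand out.
    for name, p in module.named_parameters():
        if p.is_meta:
            print(f"{name}: still a meta tensor (no data)")
        else:
            print(f"{name}: device={p.device} "
                  f"mean={p.float().mean().item():.4f} std={p.float().std().item():.4f}")

# e.g. check_weights(submodule) right after the to_empty() fallback in pippy/IR.py,
# or on the stage module inside the TorchServe handler before generation.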
Error logs
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - Traceback (most recent call last):
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/internal.py", line 207, in _run_function
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - result = python_udf.func(*python_udf.args, **python_udf.kwargs)
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/rref_proxy.py", line 11, in _local_invoke
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - return getattr(rref.local_value(), func_name)(*args, **kwargs)
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/pippy/PipelineDriver.py", line 282, in create_stage_executor
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - mod=mod or Pipe.materialize_stage(mod_name), # type: ignore[attr-defined]
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/pippy/IR.py", line 1105, in materialize_stage
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - submodule.to(device)
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - return self._apply(convert)
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 853, in _apply
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - self._buffers[key] = fn(buf)
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1166, in convert
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - raise NotImplementedError(
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
Installation instructions
AWS EC2 p3.8xlarge instance, which has 4 Tesla V100 GPUs (64 GB of GPU memory in total), using the Nvidia-driver AMI ami-00b65ebfde51e11fb and a 120 GB disk.
TorchServe was installed directly on the machine by cloning the repo and installing the required packages; the model mistralai/Mistral-7B-Instruct-v0.2 was downloaded via python ../utils/Download_model.py --model_name mistralai/Mistral-7B-Instruct-v0.2.
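For reference, a rough Python equivalent of that download step using huggingface_hub directly (this is an assumption about what Download_model.py does; the local_dir value is made up):
from huggingface_hub import snapshot_download

# Hypothetical stand-in for ../utils/Download_model.py: fetch the model files locally.
snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    local_dir="model",  # assumed target directory
)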
Model Packaging
Standard packaging following the quickstart at https://github.com/pytorch/serve/blob/master/examples/large_models/Huggingface_pippy/Readme.md
config.properties
No response
Versions
Environment headers
Torchserve branch:
torchserve==0.11.0
torch-model-archiver==0.11.0
Python version: 3.10 (64-bit runtime)
Python executable: /opt/conda/bin/python
Versions of relevant python libraries:
captum==0.6.0
intel-extension-for-pytorch==2.3.0
numpy==1.24.3
nvgpu==0.10.0
pillow==10.3.0
psutil==5.9.8
pygit2==1.13.3
pylint==3.0.3
pytest==7.3.1
pytest-cov==4.1.0
pytest-mock==3.14.0
pytest-timeout==2.3.1
requests==2.32.0
requests-toolbelt==1.0.0
torch==2.3.0+cu121
torch-model-archiver==0.11.0
torch-workflow-archiver==0.2.13
torchaudio==2.3.0+cu121
torchpippy==0.1.1
torchserve==0.11.0
torchtext==0.18.0
torchvision==0.18.0+cu121
transformers==4.41.1
wheel==0.42.0
torch==2.3.0+cu121
torchtext==0.18.0
torchvision==0.18.0+cu121
torchaudio==2.3.0+cu121
Java Version:
OS: Ubuntu 20.04.6 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: N/A
CMake version: version 3.16.3
Is CUDA available: Yes
CUDA runtime version: 12.1.105
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
Nvidia driver version: 535.54.03
cuDNN version: None
Environment:
library_path (LD_/DYLD_): /opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib
Repro instructions
From the /serve/examples/large_models/Huggingface_pippy directory I run torch-model-archiver --model-name mistral_model --version 1.0 --handler pippy_handler.py -r requirements.txt --config-file model-config.yaml --archive-format tgz, with a model-config.yaml similar to the one in the quickstart.
Once the tar.gz file is created, I move it to model_store/ and start the server with torchserve --ncs --start --model-store model_store --models mistral=mistral_model.tar.gz.
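For completeness, the same request can also be sent from Python (just an illustration, not part of the original repro; the inference API accepts POST):
import requests

# Send the prompt file to the TorchServe inference endpoint.
with open("sample_text.txt", "rb") as f:
    resp = requests.post("http://localhost:8080/predictions/mistral", data=f)
print(resp.status_code, resp.text)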
Possible Solution
No response