NotImplementedError: Cannot copy out of meta tensor; no data! + Models not generating output text
🐛 Describe the bug
When starting a server for text generation with the torchserve --ncs --start command (with, for example, the mistralai/Mistral-7B-Instruct-v0.2 model), I get the error stacktrace shown under "Error logs" below. It is raised right before the PipeStageExecutor stages are instantiated, even though the checkpoint shards load correctly.
From similar reports online, this error can be raised when more GPU memory is needed, but that does not seem to be my case, since none of my GPUs goes above 50% memory usage at any point. In other, less closely related cases, the same error appears when trying to move a meta tensor (which has no data) to another device (see e.g. https://discuss.pytorch.org/t/how-to-convert-a-meta-tensor-to-normal-tensor/172136). So I tried to follow the advice in the error message and use torch.nn.Module.to_empty() instead of torch.nn.Module.to(), modifying pippy/IR.py at line 1104 as follows:
...
1104 try:
1105     submodule.to(device)
1106 except NotImplementedError as e:
1107     if str(e) == "Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.":
1108         submodule.to_empty(device=device)
1109     else:
1110         raise
1111 except Exception:
...
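For context, here is a minimal standalone sketch (not the actual PiPPy or handler code) of what to_empty() does and does not do: it allocates uninitialized storage on the target device, but it does not restore the checkpoint weights, which still have to be loaded afterwards.
import torch
import torch.nn as nn

# Parameters created under the meta device carry shape/dtype but no data.
with torch.device("meta"):
    layer = nn.Linear(4, 4)

# layer.to("cuda:0") would raise the same NotImplementedError as above,
# because there is nothing to copy out of a meta tensor.
layer = layer.to_empty(device="cpu")  # allocates storage; values are garbage

# Real weights must still be restored, e.g. from a checkpoint:
layer.load_state_dict({"weight": torch.eye(4), "bias": torch.zeros(4)})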
With this change I can bring the server up, but when I make an inference request such as curl -v "http://localhost:8080/predictions/mistral" -T sample_text.txt, no new text is generated: the output is identical to the input. I have verified that the input_ids are formatted correctly, that the tokenizer encodes the prompt properly, that the model path is found, and that the output is decoded correctly. So I suspect the problem is related to having added the to_empty() call.
Curiously, with the meta-llama/Llama-2-7b-hf model I do get some additional output, but it is just random tokens that make no sense, some of them not even in English.
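A quick diagnostic sketch that could be dropped into the handler or into materialize_stage (the check_weights helper and where to call it are my own hypothetical additions) to confirm whether the stage module ended up with meta or uninitialized parameters:
import torch

def check_weights(module: torch.nn.Module) -> None:
    # Print per-parameter stats so meta or uninitialized weights stand out.
    for name, p in module.named_parameters():
        if p.is_meta:
            print(f"{name}: still a meta tensor (no data)")
        else:
            print(f"{name}: device={p.device} "
                  f"mean={p.float().mean().item():.4f} std={p.float().std().item():.4f}")

# e.g. check_weights(submodule) right after the to_empty() fallback in pippy/IR.py,
# or on the stage module inside the TorchServe handler before generation.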
Error logs
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - Traceback (most recent call last):
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/internal.py", line 207, in _run_function
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - result = python_udf.func(*python_udf.args, **python_udf.kwargs)
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/rref_proxy.py", line 11, in _local_invoke
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - return getattr(rref.local_value(), func_name)(*args, **kwargs)
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/pippy/PipelineDriver.py", line 282, in create_stage_executor
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - mod=mod or Pipe.materialize_stage(mod_name), # type: ignore[attr-defined]
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/pippy/IR.py", line 1105, in materialize_stage
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - submodule.to(device)
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1173, in to
2024-05-31T12:14:51,991 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - return self._apply(convert)
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 853, in _apply
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - self._buffers[key] = fn(buf)
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1166, in convert
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - raise NotImplementedError(
2024-05-31T12:14:51,992 [WARN ] W-29500-mistral_1.0-stderr MODEL_LOG - NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
Installation instructions
AWS EC2 p3.8xlarge instance, which has 4 Tesla V100 GPUs (64 GB of GPU memory in total), using the Nvidia-driver AMI ami-00b65ebfde51e11fb and a 120 GB disk.
TorchServe was installed directly on the machine by cloning the repo and installing the required packages; the model mistralai/Mistral-7B-Instruct-v0.2 was downloaded via python ../utils/Download_model.py --model_name mistralai/Mistral-7B-Instruct-v0.2.
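For reference, a rough Python equivalent of that download step using huggingface_hub directly (this is an assumption about what Download_model.py does; the local_dir value is made up):
from huggingface_hub import snapshot_download

# Hypothetical stand-in for ../utils/Download_model.py: fetch the model files locally.
snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    local_dir="model",  # assumed target directory
)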
Model Packaging
Standard packaging following the quickstart at https://github.com/pytorch/serve/blob/master/examples/large_models/Huggingface_pippy/Readme.md
config.properties
No response
Versions
Environment headers
Torchserve branch:
torchserve==0.11.0
torch-model-archiver==0.11.0
Python version: 3.10 (64-bit runtime)
Python executable: /opt/conda/bin/python
Versions of relevant python libraries:
captum==0.6.0
intel-extension-for-pytorch==2.3.0
numpy==1.24.3
nvgpu==0.10.0
pillow==10.3.0
psutil==5.9.8
pygit2==1.13.3
pylint==3.0.3
pytest==7.3.1
pytest-cov==4.1.0
pytest-mock==3.14.0
pytest-timeout==2.3.1
requests==2.32.0
requests-toolbelt==1.0.0
torch==2.3.0+cu121
torch-model-archiver==0.11.0
torch-workflow-archiver==0.2.13
torchaudio==2.3.0+cu121
torchpippy==0.1.1
torchserve==0.11.0
torchtext==0.18.0
torchvision==0.18.0+cu121
transformers==4.41.1
wheel==0.42.0
torch==2.3.0+cu121
torchtext==0.18.0
torchvision==0.18.0+cu121
torchaudio==2.3.0+cu121
Java Version:
OS: Ubuntu 20.04.6 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: N/A
CMake version: version 3.16.3
Is CUDA available: Yes
CUDA runtime version: 12.1.105
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
Nvidia driver version: 535.54.03
cuDNN version: None
Environment:
library_path (LD_/DYLD_): /opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/opt/aws-ofi-nccl/lib:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib
Repro instructions
From the /serve/examples/large_models/Huggingface_pippy directory I run torch-model-archiver --model-name mistral_model --version 1.0 --handler pippy_handler.py -r requirements.txt --config-file model-config.yaml --archive-format tgz, with a model-config.yaml similar to the one in the quickstart.
Once the tar.gz file is created, I move it to model_store/ and start the server with torchserve --ncs --start --model-store model_store --models mistral=mistral_model.tar.gz.
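For completeness, the same request can also be sent from Python (just an illustration, not part of the original repro; the inference API accepts POST):
import requests

# Send the prompt file to the TorchServe inference endpoint.
with open("sample_text.txt", "rb") as f:
    resp = requests.post("http://localhost:8080/predictions/mistral", data=f)
print(resp.status_code, resp.text)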
Possible Solution
No response