TorchServe docker fails to run on existing mar file #1010
Comments
It's not clear from the logs what your custom handler is trying to do here. Adding the handler code would help us understand what's going on.
Also, were you able to run your model in a standalone TorchServe?
Handler file (text_handler.py):
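(The handler source was not preserved in this copy of the thread. A minimal sketch of what a text handler built on BaseHandler typically looks like; the class name, tokenization step, and output shape are all assumptions:)

```python
# Hypothetical sketch -- the original text_handler.py was not preserved.
# A typical custom text handler subclasses TorchServe's BaseHandler.
import torch
from ts.torch_handler.base_handler import BaseHandler

class TextHandler(BaseHandler):
    def preprocess(self, data):
        # Each request arrives as a dict with the payload under "data"
        # or "body"; decode bytes to plain strings.
        texts = []
        for row in data:
            text = row.get("data") or row.get("body")
            if isinstance(text, (bytes, bytearray)):
                text = text.decode("utf-8")
            texts.append(text)
        # Model-specific tokenization would happen here.
        return texts

    def inference(self, inputs):
        # Run the already-loaded model (self.model) without gradients.
        with torch.no_grad():
            return self.model(inputs)

    def postprocess(self, outputs):
        # TorchServe expects one response element per request.
        return outputs.tolist()
```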
Dockerfile:
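(The Dockerfile was also not preserved; a minimal sketch of the usual pattern over the official base image, with the mar file name taken from later in the thread:)

```dockerfile
# Hypothetical sketch -- the original Dockerfile was not preserved.
FROM pytorch/torchserve:latest-gpu

# Copy the archived model into the image's default model store.
COPY TranslationClassifier.mar /home/model-server/model-store/

# torchserve --start forks into the background, so keep the container's
# main process alive after launching it.
CMD torchserve --start --model-store /home/model-server/model-store \
        --models my_tc=TranslationClassifier.mar && tail -f /dev/null
```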
@dhanainme Just to clarify, all these issues only occur when trying to run TorchServe inside a docker image; there are no issues when running on a standalone Ubuntu system. The docker image is necessary to deploy this as a microservice, but TorchServe just fails without emitting any failure logs, so it's hard to debug what's going wrong.
@dhanainme I have tried everything possible to run this mar file inside a docker container using TorchServe; every single time it fails with absolutely no indication in the logs of the underlying issue. This is the mar file: https://drive.google.com/file/d/18tiD5gLvbRvq6P9kHjqlSNBGO55TbXVs/view?usp=sharing Can you try downloading and running it inside a docker container, and provide me with a Dockerfile that lets me deploy it to a Kubernetes cluster?
@ayushch3 It seems like an issue with loading the model file. I will try to reproduce this, but in the meantime, do you mind getting a shell into the container and loading the model manually?
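(For example, something like the following; the container name and in-container paths are assumptions:)

```bash
# Get an interactive shell in the running container.
docker exec -it <torchserve-container> /bin/bash

# Then, inside the container, try registering the model by hand:
torchserve --start --model-store /home/model-server/model-store \
    --models my_tc=TranslationClassifier.mar
```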
@maaquib The mar file runs correctly on a standalone Ubuntu box, but the same mar file fails to load when mounted into the TorchServe docker image. When I get a shell into the container, the model fails to load with the same logs as attached above. I am certain there is no issue with the mar file itself, since it runs locally. The issue is how TorchServe runs inside a docker container, since neither mounting the mar file nor building the image from scratch with CUDA works.
@maaquib The underlying issue is that the docker image can't handle any custom handler. I wrote the entire docker image from scratch, and it still failed with the following logs.
I still don't understand why it is complaining about a custom handler when I am running torch-model-archiver inside the docker container.
I have a similar issue. In my setup, the goal is to invoke a model without any pre/post-processing. I browsed the source and found that base_handler fits the need, so I archived the model against it, along the lines of the command below.
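(Assumed reconstruction, since the exact command was not preserved in this copy of the thread; "base_handler" here names TorchServe's built-in ts.torch_handler.base_handler module, and the file names are placeholders:)

```bash
torch-model-archiver --model-name my_model \
    --version 1.0 \
    --serialized-file model.pt \
    --handler base_handler \
    --export-path model_store
```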
I started the server and hit the error.
@maaquib Even mounting an existing mar file, which runs fine on my Ubuntu system, into the image provided by pytorch/torchserve fails to execute.
I see the following errors when running the docker image:
Is there an ETA or a workaround for this issue, so that a TorchServe model can be deployed in production? I have tried multiple ways to make it work in a docker image, but nothing seems to work. Surprisingly, the same mar file works as expected on the Ubuntu box.
@ayushch3 How is your progress on this issue? I get a similar "no module" error even when running the example cases for image classification or object detection, also in an nvidia image container. Do you have any idea about the root cause?
@ayushch3 Can you run "python ts_scripts/print_env_info.py" both locally and in the docker container, and compare the differences in the dependent packages?
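(One way to do the comparison, assuming the serve repo checkout is available in both environments; the container name is a placeholder:)

```bash
# Dump the environment info in both places, then diff the results.
python ts_scripts/print_env_info.py > local_env.txt
docker exec <container> python ts_scripts/print_env_info.py > docker_env.txt
diff local_env.txt docker_env.txt
```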
@ayushch3 Can you check if there was any other import error above the torch_handler import error in the logs?
This code first tries to load a custom handler before falling back to the default handlers.
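(A simplified illustration of that loading order, not the actual TorchServe source: the custom handler module shipped in the mar file is tried first, and only on failure does the loader fall back to the built-in handlers under ts.torch_handler. This is why an earlier import failure inside the custom handler, e.g. a missing package, can surface as a misleading torch_handler import error.)

```python
import importlib

def load_handler(handler_name):
    try:
        # First attempt: the custom handler module from the mar file.
        return importlib.import_module(handler_name)
    except ImportError:
        # Fallback: a default handler bundled with TorchServe,
        # e.g. ts.torch_handler.text_classifier.
        return importlib.import_module(f"ts.torch_handler.{handler_name}")
```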
@ayushch3 @kqhuynguyen Please add install_py_dep_per_model=true to config.properties if your model needs packages installed, and then copy or mount that config.properties into your docker container.
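(The relevant config line:)

```properties
# config.properties -- tells TorchServe to pip-install each model's
# bundled requirements when the model is loaded.
install_py_dep_per_model=true
```

With this set, TorchServe installs the requirements file bundled into the mar (added at archive time with torch-model-archiver's --requirements-file flag) at model load time.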
I'm experiencing the same issue as this thread. I have added the config change above, and then run my image with the command below to make sure all the volume mounts are working correctly. However, I'm still getting the handler module-not-found error...
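(Assumed shape of the run command, since the original was not preserved; it mounts the host model store and the config.properties that enables per-model dependency installs, and relies on the image's default "serve" entrypoint to start TorchServe:)

```bash
docker run --rm -it -p 8080:8080 -p 8081:8081 \
    -v $(pwd)/model_store:/home/model-server/model-store \
    -v $(pwd)/config.properties:/home/model-server/config.properties \
    pytorch/torchserve:latest-gpu
```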
Can you shed some light on what else I can try? Also, @lxning, are there examples of the "Allow model specific custom python packages" feature you linked? I would like to try to get that "seamless model serving" experience.
Any update on this? I'm still getting the same error. |
Context
I ran torch-model-archiver on a different machine to create a mar file with a custom handler for a transformer model, using a command along the lines of the one below:
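(Assumed reconstruction; the exact flags were not preserved in the issue, and the serialized-file and extra-files names are placeholders:)

```bash
torch-model-archiver --model-name TranslationClassifier \
    --version 1.0 \
    --serialized-file traced_model.pt \
    --handler text_handler.py \
    --extra-files "config.json,vocab.txt" \
    --export-path model_store
```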
It took about 20 minutes and the mar file was created correctly. I was able to verify locally that TorchServe indeed works on that system, using the following command:
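(Presumably the same command that appears under Steps to Reproduce below:)

```bash
torchserve --start --model-store model_store --models my_tc=TranslationClassifier.mar
```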
Expected Behavior
In order to run this on Kubernetes, I took the pre-existing pytorch/torchserve:latest-gpu image from Docker Hub, so that I could run it in a different environment by using the mar file directly, with a command like the one below:
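(Assumed reconstruction, since the exact command was not preserved. Note that torchserve --start forks into the background, so something like tail is needed to keep the container's main process alive:)

```bash
docker run --rm -it --gpus all -p 8080:8080 -p 8081:8081 \
    -v $(pwd)/model_store:/home/model-server/model-store \
    pytorch/torchserve:latest-gpu \
    bash -c "torchserve --start --model-store /home/model-server/model-store \
             --models my_tc=TranslationClassifier.mar && tail -f /dev/null"
```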
Current Behavior
The execution fails when running that docker container with the following error logs:
Steps to Reproduce
torchserve --start --model-store model_store --models my_tc=TranslationClassifier.mar
When run in a different environment, it should work out of the box and not fail to recognize the custom handler...