
How to support different models with different tensor_para_size? #67

Open · TopIdiot opened this issue Nov 4, 2022 · 29 comments

TopIdiot commented Nov 4, 2022

I have 4 GPUs and 3 models, called small, medium, and large. I want to deploy the small model on GPU 0, the medium model on GPU 1, and the large model on GPUs 2 and 3 with tensor_para_size=2, because the large model is too big to fit on a single GPU.

However, the instance_group can only be KIND_CPU, so I cannot control the placement that way.

Is there any way to handle this?
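For context, the placement constraint comes from the FasterTransformer backend's example config.pbtxt, which declares the instance group as KIND_CPU and sets the parallelism through a string parameter, roughly like this (a sketch with illustrative values, not this issue's actual file):

```
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
parameters {
  key: "tensor_para_size"
  value: {
    string_value: "2"
  }
}
```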

TopIdiot commented Nov 4, 2022

Besides, I tried 'mpirun -n 1 /opt/tritonserver/bin/tritonserver' three times with different CUDA_VISIBLE_DEVICES, server ports, and model repositories. However, that doesn't work; the processes were blocked when loading models.

byshiue commented Nov 4, 2022

You should launch three tritonservers: the first one uses CUDA_VISIBLE_DEVICES=0, the second one uses CUDA_VISIBLE_DEVICES=1, and the third one uses CUDA_VISIBLE_DEVICES=2,3. They may need different configurations and to be set up with different names.
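A minimal sketch of such a setup, assuming one model repository per server (the ports, paths, and model names below are illustrative, not taken from this thread):

```bash
# Three independent Triton servers, one per model, each pinned to its own GPU(s)
# via CUDA_VISIBLE_DEVICES. Each server gets its own ports and model repository.
CUDA_VISIBLE_DEVICES=0   mpirun -n 1 /opt/tritonserver/bin/tritonserver \
  --model-repository=/models/small  --http-port 8000 --grpc-port 8001 --metrics-port 8002 &
CUDA_VISIBLE_DEVICES=1   mpirun -n 1 /opt/tritonserver/bin/tritonserver \
  --model-repository=/models/medium --http-port 8010 --grpc-port 8011 --metrics-port 8012 &
CUDA_VISIBLE_DEVICES=2,3 mpirun -n 1 /opt/tritonserver/bin/tritonserver \
  --model-repository=/models/large  --http-port 8020 --grpc-port 8021 --metrics-port 8022 &
```

The large model's repository would carry tensor_para_size=2 in its config.pbtxt, while the other two keep tensor_para_size=1.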

TopIdiot commented Nov 4, 2022

> You should launch three tritonservers: the first one uses CUDA_VISIBLE_DEVICES=0, the second one uses CUDA_VISIBLE_DEVICES=1, and the third one uses CUDA_VISIBLE_DEVICES=2,3. They may need different configurations and to be set up with different names.

@byshiue I did so, but it still does not work. I use supervisord to run tritonserver; CUDA_VISIBLE_DEVICES is set in the program's environment section.
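For reference, a supervisord program section of the kind described here might look roughly like this (paths, ports, log locations, and the GPU index are assumptions, not the poster's actual config):

```ini
; One section per Triton server; the other models get analogous sections
; with their own GPUs, ports, and model repositories.
[program:triton_medium]
command=mpirun -n 1 /opt/tritonserver/bin/tritonserver --model-repository=/models/medium --http-port 8010 --grpc-port 8011 --metrics-port 8012
environment=CUDA_VISIBLE_DEVICES="1"
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/triton_medium.log
```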

Here is the medium model output.
The first time:
[screenshot]

After the process broke down, supervisord started it again.

The second time:
[screenshot]
And then it blocked.

However, nvidia-smi shows the model is ready:
[screenshot]

[screenshot]

byshiue commented Nov 4, 2022

I cannot see the results of the first time. Can you post them again?

TopIdiot commented Nov 4, 2022

@byshiue I am sorry, I placed "the second time" in the wrong area. Now it is OK.

TopIdiot commented Nov 4, 2022

@byshiue From the log, it seems like only one process can load the models, and the others get blocked. But the one that can load all of the models does not work either.

byshiue commented Nov 4, 2022

The error is "PTX compiled with an unsupported toolchain". You don't load any model successfully.
What docker version do you use?
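For what it's worth, that PTX error usually indicates a host driver that is older than the CUDA toolkit used to build the image, so the versions worth collecting are roughly these (standard commands, run on the host):

```bash
docker --version   # Docker engine version
nvidia-smi         # driver version and the highest CUDA version it supports
```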

TopIdiot commented Nov 4, 2022

@byshiue Docker version 20.10.21

TopIdiot commented Nov 4, 2022

@byshiue But when there is only one tritonserver, it works fine.

byshiue commented Nov 4, 2022

Can you post your results one by one? What happens when you launch the first one, and what happens with the second one?

From the graph you posted, the first launch fails.

And what docker image do you use?

TopIdiot commented Nov 4, 2022

Here is my supervisord config:
[screenshot]

In production, I use 3 models: medium (on GPU 2), large (on GPU 3), and xl (on GPUs 0 and 1).
The medium log:
First time:
[screenshot]
Second time:
[screenshot]

The large model log:
[screenshot]
[screenshot]

The xl model log:
[screenshot]

byshiue commented Nov 4, 2022

Sorry, can you refine your format? It is too chaotic to read now.

TopIdiot commented Nov 4, 2022

@byshiue Sorry, I have reformatted it.

byshiue commented Nov 4, 2022

What is the meaning of "second time" for the medium log? Do you re-launch it again, and the first time it crashes but the second time it works? Did you check that you have cleaned up all old processes?

What happens when you only launch one server at a time for these three models?

TopIdiot commented Nov 4, 2022

@byshiue Yes. After the medium model broke down the first time, supervisord restarted it automatically; the second time seemed fine at the beginning but then blocked.

I also tried this shell script:
[screenshot]
The processes were also blocked.

byshiue commented Nov 4, 2022

Can you try to start only one model each time for these three cases?

TopIdiot commented Nov 4, 2022

@byshiue Did you mean that I should start the three models one by one?

byshiue commented Nov 4, 2022

Yes.

TopIdiot commented Nov 4, 2022

@byshiue The first model works fine:
[screenshot]

but when I run the second one, it blocks:
[screenshot]

And there is a zombie (Z) process. I don't know if it matters.
[screenshot]

byshiue commented Nov 4, 2022

I mean launching only one process at a time. When you launch the second server, you should kill the first one.

TopIdiot commented Nov 4, 2022

@byshiue Under that condition, all models work fine.

TopIdiot commented Nov 4, 2022

@byshiue It seems that if /opt/tritonserver/backends/python/triton_python_backend_stub is still running, the new tritonserver blocks. If I kill it, the new tritonserver works fine.
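A quick way to check for and clear such a leftover stub before relaunching (a workaround sketch, not a root-cause fix):

```bash
# List any leftover Python-backend stub processes, then stop them.
pgrep -af triton_python_backend_stub
pkill -f triton_python_backend_stub
```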

byshiue commented Nov 4, 2022

Can you try adding verbose logging, like tritonserver --log-verbose 1 --model-repository=<your_model>?

TopIdiot commented Nov 4, 2022

@byshiue
[screenshot]

The second model blocked at this point.

byshiue commented Nov 4, 2022

Can you try to launch only the fastertransformer model, excluding the pre/post-processing?
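One way to do that, assuming the model names from the gptneox example layout (fastertransformer, preprocessing, postprocessing, ensemble), is Triton's explicit model control; alternatively, the pre/post-processing and ensemble directories can simply be removed from the model repository:

```bash
# Load only the fastertransformer model and skip the Python pre/post-processing models.
# Model names follow the gptneox example and may differ in your repository.
CUDA_VISIBLE_DEVICES=1 mpirun -n 1 /opt/tritonserver/bin/tritonserver \
  --model-repository=/models/medium \
  --model-control-mode=explicit \
  --load-model=fastertransformer
```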

TopIdiot commented Nov 4, 2022

@byshiue Now all the processes start, but I don't know why. The pre/post-processing code is based on https://github.com/triton-inference-server/fastertransformer_backend/tree/main/all_models/gptneox; the only thing I did was change the tokenizer to my own.

byshiue commented Nov 4, 2022

Can you launch the server with the original pre/post-processing?

TopIdiot commented Nov 4, 2022

@byshiue Yes, it works... but I don't know why. My only change was to use Hugging Face transformers.T5Tokenizer to replace the original tokenizer.
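For reference, the kind of change described above lives in the preprocessing model's model.py. A rough sketch, assuming the gptneox example's tensor names (QUERY in, INPUT_ID and REQUEST_INPUT_LEN out) and an illustrative tokenizer path; verify both against your own config.pbtxt:

```python
# model.py sketch for the Python-backend preprocessing model.
# Tensor names and the tokenizer path are assumptions, not the exact repo file.
import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import T5Tokenizer


class TritonPythonModel:
    def initialize(self, args):
        # Load the replacement tokenizer once per model instance.
        self.tokenizer = T5Tokenizer.from_pretrained("/workspace/my_tokenizer")

    def execute(self, requests):
        responses = []
        for request in requests:
            query = pb_utils.get_input_tensor_by_name(request, "QUERY").as_numpy()
            texts = [q[0].decode("utf-8") for q in query]

            # Tokenize each request, then pad to a rectangular batch.
            encoded = [self.tokenizer.encode(t) for t in texts]
            lengths = np.array([[len(e)] for e in encoded], dtype=np.uint32)
            pad_id = self.tokenizer.pad_token_id or 0
            max_len = max(len(e) for e in encoded)
            input_ids = np.array(
                [e + [pad_id] * (max_len - len(e)) for e in encoded], dtype=np.uint32
            )

            responses.append(pb_utils.InferenceResponse(output_tensors=[
                pb_utils.Tensor("INPUT_ID", input_ids),
                pb_utils.Tensor("REQUEST_INPUT_LEN", lengths),
            ]))
        return responses
```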

@calico-niko

@TopIdiot @byshiue Hi there. I have the same problem when I use multiple Triton servers to load different models on different GPUs. Any update on this issue? The tokenizer is Hugging Face's (AutoTokenizer), and the model is BLOOM. In my case all models are loaded onto the GPUs, but when I send a gRPC request, Triton and its log just get stuck and show nothing.
