Issues: triton-inference-server/fastertransformer_backend

Failed to run on H100 GPU with tensor para=8
#166 · opened Sep 15, 2023 by sfc-gh-zhwang · updated Jul 3, 2024

Memory usage is doubled when loading an fp16 model into bf16 [bug]
#164 · opened Sep 6, 2023 by skyser2003 · updated Mar 18, 2024

Does FasterTransformer support GPT-2 classification models such as GPT2ForSequenceClassification?
#171 · opened Oct 19, 2023 by cabbagetalk · updated Oct 19, 2023

All flan-t5 models fail for me [bug]
#114 · opened Apr 4, 2023 by PetroMaslov · updated Sep 27, 2023

No response is received during inference in decoupled mode [bug]
#169 · opened Sep 26, 2023 by amazingkmy · updated Sep 26, 2023

What is the use of preprocessing & postprocessing? Can I start FasterTransformer with only the BLOOM model? [bug]
#168 · opened Sep 22, 2023 by flyingjohn · updated Sep 22, 2023

The docs are not updated to match the source code
#167 · opened Sep 22, 2023 by trinhtuanvubk · updated Sep 22, 2023

How to deploy multiple models on a node with multiple GPUs [bug]
#165 · opened Sep 14, 2023 by jjjjohnson · updated Sep 14, 2023

Can I stop execution? (w/ decoupled mode) [bug]
#162 · opened Aug 21, 2023 by Yeom · updated Sep 12, 2023

Can I enable streaming on an ensemble model?
#155 · opened Jul 18, 2023 by flexwang · updated Aug 31, 2023

Throughput (requests per second / RPS) not increasing when scaling up from 1 GPU to 4 GPUs
#163 · opened Aug 22, 2023 by chunyat · updated Aug 22, 2023

Streaming throws a queue.get() error [bug]
#44 · opened Sep 13, 2022 by rtalaricw · updated Aug 16, 2023

Do I need to specify ARG SM=80 when building the image manually?
#161 · opened Aug 15, 2023 by sfc-gh-zhwang · updated Aug 15, 2023

Is is_return_log_probs required for a decoupled model?
#160 · opened Aug 9, 2023 by flexwang · updated Aug 9, 2023

How to terminate a gRPC streaming request immediately during tritonserver inference with the FasterTransformer backend?
#139 · opened Jun 1, 2023 by songkq · updated Jul 19, 2023

Multi-instance inference fails in (n-1)/n runs (where n is the number of GPUs/instances)
#63 · opened Oct 26, 2022 by timofeev1995 · updated Jul 10, 2023

Failing to build with Triton 23.04 [bug]
#150 · opened Jun 30, 2023 by bronzafa · updated Jul 3, 2023

huggingface_bert_convert.py can't convert some keys [bug]
#152 · opened Jul 3, 2023 by SeungjaeLim · updated Jul 3, 2023

Repo fails to build using Triton image 23.01 [bug]
#93 · opened Feb 13, 2023 by Chris113113 · updated Jul 2, 2023

Is DeBERTa supported in the FasterTransformer backend?
#148 · opened Jun 28, 2023 by sfc-gh-zhwang · updated Jun 29, 2023

FasterTransformer backend fails to build using the latest version of Triton Server [bug]
#140 · opened Jun 2, 2023 by mshuffett · updated Jun 19, 2023

How to support different models with different tensor_para_size?
#67 · opened Nov 4, 2022 by TopIdiot · updated Jun 15, 2023

Poll failed for model directory 'ensemble': output 'OUTPUT_0' for ensemble 'ensemble' is not written
#144 · opened Jun 13, 2023 by songkq · updated Jun 13, 2023

Why does max_batch_size need to be set to 1 in interactive mode?
#143 · opened Jun 12, 2023 by zhypku · updated Jun 12, 2023