Issues: triton-inference-server/fastertransformer_backend

Failed to run on H100 GPU with tensor para=8
#166 · opened Sep 15, 2023 by sfc-gh-zhwang · updated Jul 3, 2024

Memory usage is doubled when loading an fp16 model into bf16 [bug]
#164 · opened Sep 6, 2023 by skyser2003 · updated Mar 18, 2024

Does FasterTransformer support GPT-2 classification models such as GPT2ForSequenceClassification?
#171 · opened Oct 19, 2023 by cabbagetalk · updated Oct 19, 2023

All flan-t5 models fail for me [bug]
#114 · opened Apr 4, 2023 by PetroMaslov · updated Sep 27, 2023

No response is received during inference in decoupled mode [bug]
#169 · opened Sep 26, 2023 by amazingkmy · updated Sep 26, 2023

What is the use of preprocessing & postprocessing? Can I start FasterTransformer with only the BLOOM model? [bug]
#168 · opened Sep 22, 2023 by flyingjohn · updated Sep 22, 2023

The docs are not updated to match the source code
#167 · opened Sep 22, 2023 by trinhtuanvubk · updated Sep 22, 2023

How to deploy multiple models on a node with multiple GPUs [bug]
#165 · opened Sep 14, 2023 by jjjjohnson · updated Sep 14, 2023

Can I stop execution? (w/ decoupled mode) [bug]
#162 · opened Aug 21, 2023 by Yeom · updated Sep 12, 2023

Can I enable streaming on an ensemble model?
#155 · opened Jul 18, 2023 by flexwang · updated Aug 31, 2023

Throughput (requests per second / RPS) not increasing when scaling up from 1 GPU to 4 GPUs
#163 · opened Aug 22, 2023 by chunyat · updated Aug 22, 2023

Streaming throws a queue.get() error [bug]
#44 · opened Sep 13, 2022 by rtalaricw · updated Aug 16, 2023

Do I need to specify ARG SM=80 when building the image manually?
#161 · opened Aug 15, 2023 by sfc-gh-zhwang · updated Aug 15, 2023

Is is_return_log_probs required for a decoupled model?
#160 · opened Aug 9, 2023 by flexwang · updated Aug 9, 2023

How to terminate a gRPC streaming request immediately during tritonserver inference with the FasterTransformer backend?
#139 · opened Jun 1, 2023 by songkq · updated Jul 19, 2023

Multi-instance inference fails in (n-1)/n runs (where n is the number of GPUs/instances)
#63 · opened Oct 26, 2022 by timofeev1995 · updated Jul 10, 2023

Failing to build with Triton 23.04 [bug]
#150 · opened Jun 30, 2023 by bronzafa · updated Jul 3, 2023

huggingface_bert_convert.py can't convert some keys [bug]
#152 · opened Jul 3, 2023 by SeungjaeLim · updated Jul 3, 2023

Repo fails to build using Triton image 23.01 [bug]
#93 · opened Feb 13, 2023 by Chris113113 · updated Jul 2, 2023

Is DeBERTa supported in the FasterTransformer backend?
#148 · opened Jun 28, 2023 by sfc-gh-zhwang · updated Jun 29, 2023

FasterTransformer backend fails to build using the latest version of Triton Server [bug]
#140 · opened Jun 2, 2023 by mshuffett · updated Jun 19, 2023

How to support different models with different tensor_para_size?
#67 · opened Nov 4, 2022 by TopIdiot · updated Jun 15, 2023

Poll failed for model directory 'ensemble': output 'OUTPUT_0' for ensemble 'ensemble' is not written
#144 · opened Jun 13, 2023 by songkq · updated Jun 13, 2023

Why does max_batch_size need to be set to 1 in interactive mode?
#143 · opened Jun 12, 2023 by zhypku · updated Jun 12, 2023