
GPU batch decoding + Online request queueing mechanism #30

Open
greed2411 opened this issue Oct 22, 2021 · 1 comment
Labels
winter-of-code gdsc's woc

Comments

@greed2411
Member

Possibly along with a request queueing mechanism like ServiceStreamer for online serving.

@greed2411 greed2411 added the winter-of-code gdsc's woc label Oct 22, 2021
@pskrunner14 pskrunner14 changed the title GPU online/offline batch decoding GPU batch decoding + Online request queueing mechanism Nov 17, 2021
@pskrunner14
Contributor

pskrunner14 commented Nov 17, 2021

Task 1

Write an interface and implement GPU batch decoding for Kaldi ASR models in the kaldi-serve core C++ library.

The current partial version (on the gpu-decoder branch) is buggy (see the stale issue here); you may use it as a starting point or write one from scratch, it's up to you. The main idea is to be able to pass a custom async callback to the batch decoding pipeline, which receives the final result once the GPU compute task completes.

Relevant links:

  1. Batched Decoding binary
  2. Batched Threaded CUDA Pipeline - Source

Task 2

Implement an online request queueing mechanism, similar to ServiceStreamer, that uses the GPU batch decoding interface (Task 1) to reduce latency in the kaldi-serve gRPC server under high load.
