run end_to_end_test_llama.py error #134
Conversation
```python
try:
    result = client.infer(model_name, inputs)
```
I see the llama model is decoupled, so shouldn't the call be async_stream_infer instead of infer?
I corrected it to use start_stream(call_back()) and async_stream_infer(), but kept the old input type (HttpInferInput), and got a "TypeError: Not a cmessage" error from tritonclient/grpc/_utils.py.
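For reference, a minimal sketch of calling a decoupled model through the Triton gRPC streaming client; the server address, model name, tensor names, and dtypes below are placeholders and need to match the actual llama model configuration:

```python
import numpy as np
import tritonclient.grpc as grpcclient

def callback(result, error):
    # Invoked once per response streamed back by the decoupled model.
    if error is not None:
        print("inference error:", error)
    else:
        print(result.as_numpy("output_ids"))

client = grpcclient.InferenceServerClient("localhost:8001")

# Inputs must be gRPC InferInput objects; reusing tritonclient.http
# InferInput objects here is what triggers "TypeError: Not a cmessage",
# because the gRPC client expects protobuf messages.
input_ids = np.array([[1, 2, 3]], dtype=np.uint32)
inp = grpcclient.InferInput("input_ids", input_ids.shape, "UINT32")
inp.set_data_from_numpy(input_ids)

client.start_stream(callback=callback)
client.async_stream_infer("fastertransformer", [inp])  # placeholder model name
client.stop_stream()  # blocks until in-flight responses have been delivered
client.close()
```

The stream must be started before the first async_stream_infer call and stopped after the last one; each streamed response is delivered to the callback, which is how a decoupled model can return multiple responses for a single request.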
Add llama triton guide
fix the int8_mode and decoupled mode backend support
When I follow llama_guide.md to build this lib, this error occurs:

```bash
/workspace/build/fastertransformer_backend/src/libfastertransformer.cc: In member function 'std::shared_ptr<AbstractTransformerModel> triton::backend::fastertransformer_backend::ModelState::ModelFactory(triton::common::TritonJson::Value&, const string&)':
/workspace/build/fastertransformer_backend/src/libfastertransformer.cc:340:98: error: 'int8_mode' was not declared in this scope
  340 |     ft_model = std::make_shared<LlamaTritonModel<__nv_bfloat16>>(tp, pp, custom_ar, model_dir, int8_mode);
      |                                                                                                ^~~~~~~~~
[100%] Linking CXX executable ../../../../../bin/multi_gpu_gpt_interactive_example
[100%] Built target gptneox_example
[100%] Built target multi_gpu_gpt_triton_example
[100%] Built target llama_example
/workspace/build/fastertransformer_backend/src/libfastertransformer.cc:343:90: error: 'int8_mode' was not declared in this scope
  343 |     ft_model = std::make_shared<LlamaTritonModel<float>>(tp, pp, custom_ar, model_dir, int8_mode);
```

I think this variable should be fixed; after I removed it, the build succeeded.
Update libfastertransformer.cc
Running python3 tools/end_to_end_test_llama.py raises the error: [400] HTTP end point doesn't support models with decoupled transaction policy
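A quick way to confirm this is the cause is to inspect the deployed model's transaction policy; the sketch below assumes the gRPC endpoint is on localhost:8001 and uses "fastertransformer" as a placeholder model name. If the policy is decoupled, the HTTP client's blocking infer() used by the test script cannot be used, and the gRPC streaming API is required instead.

```python
import tritonclient.grpc as grpcclient

# Placeholder server address and model name; adjust to your deployment.
client = grpcclient.InferenceServerClient("localhost:8001")
config = client.get_model_config("fastertransformer").config

# True means the model streams responses and only the gRPC streaming
# API (start_stream / async_stream_infer) can talk to it.
print("decoupled:", config.model_transaction_policy.decoupled)
client.close()
```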