I am unable to get the llama example to work with tensor parallelism.
I have 2x NVIDIA L4 GPUs (NVIDIA-SMI 525.105.17, Driver Version 525.105.17, CUDA Version 12.0).
When running the script https://github.com/huggingface/optimum-nvidia/blob/main/examples/text-generation/llama.py with the following command:
python text-generation/llama.py /model/torch-weights /model/local-compiled --world-size=2 --tensor-parallelism=2 --dtype=bfloat16 --gpus-per-node=2 --max-batch-size=1 --max-prompt-length=3000 --max-new-tokens=1096 --max-beam-width=1
The engine files build and save successfully, but when attempting to load them in the TensorRTForCausalLM class I get the following error. It occurs on line 107: https://github.com/huggingface/optimum-nvidia/blob/main/examples/text-generation/llama.py#L107
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: mpiSize == tp * pp (/opt/optimum-nvidia/third-party/tensorrt-llm/cpp/tensorrt_llm/runtime/worldConfig.cpp:89)
1 0x7f6fe2659212 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x7c212) [0x7f6fe2659212]
2 0x7f6fe2675df9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x98df9) [0x7f6fe2675df9]
3 0x7f6fe275d847 tensorrt_llm::runtime::WorldConfig::mpi(int, std::optional, std::optional, std::optional<std::vector<int, std::allocator > >) + 103
4 0x7f6fe26a2057 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xc5057) [0x7f6fe26a2057]
5 0x7f6fe2691bc7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xb4bc7) [0x7f6fe2691bc7]
6 0x559ca74b2e0e python(+0x15fe0e) [0x559ca74b2e0e]
7 0x559ca74a95eb _PyObject_MakeTpCall + 603
8 0x559ca74a21f1 _PyEval_EvalFrameDefault + 27297
9 0x559ca758ce56 python(+0x239e56) [0x559ca758ce56]
10 0x559ca758ccf6 PyEval_EvalCode + 134
11 0x559ca75b77d8 python(+0x2647d8) [0x559ca75b77d8]
12 0x559ca75b10bb python(+0x25e0bb) [0x559ca75b10bb]
13 0x559ca740a4d0 python(+0xb74d0) [0x559ca740a4d0]
14 0x559ca740a012 _PyRun_InteractiveLoopObject + 195
15 0x559ca75b6678 _PyRun_AnyFileObject + 104
16 0x559ca73f45c8 PyRun_AnyFileExFlags + 79
17 0x559ca73e96e8 python(+0x966e8) [0x559ca73e96e8]
18 0x559ca757fcad Py_BytesMain + 45
19 0x7f7134164d90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f7134164d90]
20 0x7f7134164e40 __libc_start_main + 128
21 0x559ca757fba5 _start + 37
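For context, the failing assertion (mpiSize == tp * pp) compares the MPI world size seen by the runtime against tensor_parallelism * pipeline_parallelism, so a script started as a plain single Python process reports a world size of 1 while tp * pp is 2. A minimal diagnostic sketch to confirm what the runtime observes, assuming mpi4py is available in the container (it is not part of the example, and the script name is hypothetical):

# check_world_size.py -- hypothetical helper to print the world size TensorRT-LLM will see
from mpi4py import MPI  # assumption: mpi4py is installed alongside tensorrt_llm

world_size = MPI.COMM_WORLD.Get_size()
print(f"MPI world size: {world_size}")  # prints 1 under a plain `python ...` invocation
# WorldConfig.mpi asserts that this value equals tensor_parallelism * pipeline_parallelism (2 * 1 here).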
Further investigation of the function that throws the error in tensorrt_llm.bindings:
The same error is produced by the following script from within the huggingface/optimum-nvidia:latest docker container (image id: d08d1226a2ab):
import tensorrt_llm.bindings as ctrrt

gpus_per_node = 2
tensor_parallelism = 2
pipeline_parallelism = 1

ctrrt.WorldConfig.mpi(
    gpus_per_node,
    tensor_parallelism,
    pipeline_parallelism,
)
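Run as a single process, this snippet sees an MPI world size of 1, which cannot satisfy mpiSize == tp * pp for tp = 2, pp = 1. A sketch of an invocation that would at least satisfy the assertion, assuming OpenMPI and mpi4py are available in the container; whether the llama.py example is actually meant to be launched this way is exactly the question for the maintainers:

# repro_worldconfig.py -- hypothetical repro, launched with e.g. `mpirun -n 2 python repro_worldconfig.py`
import tensorrt_llm.bindings as ctrrt
from mpi4py import MPI  # assumption: mpi4py is available in the container

# With two MPI ranks, the world size matches tp * pp = 2 * 1, so the assertion should not fire.
assert MPI.COMM_WORLD.Get_size() == 2, "expected 2 MPI ranks for tp=2, pp=1"
ctrrt.WorldConfig.mpi(2, 2, 1)  # gpus_per_node, tensor_parallelism, pipeline_parallelism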