I am unable to get the llama example to work with tensor parallelism.
I have 2x NVIDIA L4 GPUs (NVIDIA-SMI 525.105.17, Driver Version 525.105.17, CUDA Version 12.0).
When running the script https://github.com/huggingface/optimum-nvidia/blob/main/examples/text-generation/llama.py with the following command:
python text-generation/llama.py /model/torch-weights /model/local-compiled --world-size=2 --tensor-parallelism=2 --dtype=bfloat16 --gpus-per-node=2 --max-batch-size=1 --max-prompt-length=3000 --max-new-tokens=1096 --max-beam-width=1
The engine files build and save successfully, but when attempting to load them in the TensorRTForCausalLM class I get the following error. It occurs on line 107: https://github.com/huggingface/optimum-nvidia/blob/main/examples/text-generation/llama.py#L107
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: mpiSize == tp * pp (/opt/optimum-nvidia/third-party/tensorrt-llm/cpp/tensorrt_llm/runtime/worldConfig.cpp:89)
1 0x7f6fe2659212 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x7c212) [0x7f6fe2659212]
2 0x7f6fe2675df9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x98df9) [0x7f6fe2675df9]
3 0x7f6fe275d847 tensorrt_llm::runtime::WorldConfig::mpi(int, std::optional, std::optional, std::optional<std::vector<int, std::allocator > >) + 103
4 0x7f6fe26a2057 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xc5057) [0x7f6fe26a2057]
5 0x7f6fe2691bc7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xb4bc7) [0x7f6fe2691bc7]
6 0x559ca74b2e0e python(+0x15fe0e) [0x559ca74b2e0e]
7 0x559ca74a95eb _PyObject_MakeTpCall + 603
8 0x559ca74a21f1 _PyEval_EvalFrameDefault + 27297
9 0x559ca758ce56 python(+0x239e56) [0x559ca758ce56]
10 0x559ca758ccf6 PyEval_EvalCode + 134
11 0x559ca75b77d8 python(+0x2647d8) [0x559ca75b77d8]
12 0x559ca75b10bb python(+0x25e0bb) [0x559ca75b10bb]
13 0x559ca740a4d0 python(+0xb74d0) [0x559ca740a4d0]
14 0x559ca740a012 _PyRun_InteractiveLoopObject + 195
15 0x559ca75b6678 _PyRun_AnyFileObject + 104
16 0x559ca73f45c8 PyRun_AnyFileExFlags + 79
17 0x559ca73e96e8 python(+0x966e8) [0x559ca73e96e8]
18 0x559ca757fcad Py_BytesMain + 45
19 0x7f7134164d90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f7134164d90]
20 0x7f7134164e40 __libc_start_main + 128
21 0x559ca757fba5 _start + 37
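For context, the failing assertion (mpiSize == tp * pp) compares the MPI world size seen by the runtime against tensor_parallelism * pipeline_parallelism, so a script started as a plain single Python process reports a world size of 1 while tp * pp is 2. A minimal diagnostic sketch to confirm what the runtime observes, assuming mpi4py is available in the container (it is not part of the example, and the script name is hypothetical):

# check_world_size.py -- hypothetical helper to print the world size TensorRT-LLM will see
from mpi4py import MPI  # assumption: mpi4py is installed alongside tensorrt_llm

world_size = MPI.COMM_WORLD.Get_size()
print(f"MPI world size: {world_size}")  # prints 1 under a plain `python ...` invocation
# WorldConfig.mpi asserts that this value equals tensor_parallelism * pipeline_parallelism (2 * 1 here).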
Further investigation of the function that throws the error in tensorrt_llm.bindings:
The same error is produced by the following script from within the huggingface/optimum-nvidia:latest docker container (image id: d08d1226a2ab):
import tensorrt_llm.bindings as ctrrt

gpus_per_node = 2
tensor_parallelism = 2
pipeline_parallelism = 1

ctrrt.WorldConfig.mpi(
    gpus_per_node,
    tensor_parallelism,
    pipeline_parallelism,
)
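Run as a single process, this snippet sees an MPI world size of 1, which cannot satisfy mpiSize == tp * pp for tp = 2, pp = 1. A sketch of an invocation that would at least satisfy the assertion, assuming OpenMPI and mpi4py are available in the container; whether the llama.py example is actually meant to be launched this way is exactly the question for the maintainers:

# repro_worldconfig.py -- hypothetical repro, launched with e.g. `mpirun -n 2 python repro_worldconfig.py`
import tensorrt_llm.bindings as ctrrt
from mpi4py import MPI  # assumption: mpi4py is available in the container

# With two MPI ranks, the world size matches tp * pp = 2 * 1, so the assertion should not fire.
assert MPI.COMM_WORLD.Get_size() == 2, "expected 2 MPI ranks for tp=2, pp=1"
ctrrt.WorldConfig.mpi(2, 2, 1)  # gpus_per_node, tensor_parallelism, pipeline_parallelism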