
Error when Running LLAMA with tensor parallelism = 2 #68

Open
TheCodeWrangler opened this issue Jan 23, 2024 · 1 comment
Comments


TheCodeWrangler commented Jan 23, 2024

I am unable to get the llama example to work with tensor parallelism.

I have 2x NVIDIA L4 GPUs:
NVIDIA-SMI 525.105.17, Driver Version: 525.105.17, CUDA Version: 12.0

When running the script
https://github.com/huggingface/optimum-nvidia/blob/main/examples/text-generation/llama.py

python text-generation/llama.py /model/torch-weights /model/local-compiled --world-size=2 --tensor-parallelism=2 --dtype=bfloat16 --gpus-per-node=2 --max-batch-size=1 --max-prompt-length=3000 --max-new-tokens=1096 --max-beam-width=1

The engine files build and save successfully, but when attempting to load them in the TensorRTForCausalLM class, I get the following error.

The error occurs on line 107 of the script:
https://github.com/huggingface/optimum-nvidia/blob/main/examples/text-generation/llama.py#L107

RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: mpiSize == tp * pp (/opt/optimum-nvidia/third-party/tensorrt-llm/cpp/tensorrt_llm/runtime/worldConfig.cpp:89)
1 0x7f6fe2659212 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x7c212) [0x7f6fe2659212]
2 0x7f6fe2675df9 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0x98df9) [0x7f6fe2675df9]
3 0x7f6fe275d847 tensorrt_llm::runtime::WorldConfig::mpi(int, std::optional, std::optional, std::optional<std::vector<int, std::allocator > >) + 103
4 0x7f6fe26a2057 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xc5057) [0x7f6fe26a2057]
5 0x7f6fe2691bc7 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xb4bc7) [0x7f6fe2691bc7]
6 0x559ca74b2e0e python(+0x15fe0e) [0x559ca74b2e0e]
7 0x559ca74a95eb _PyObject_MakeTpCall + 603
8 0x559ca74a21f1 _PyEval_EvalFrameDefault + 27297
9 0x559ca758ce56 python(+0x239e56) [0x559ca758ce56]
10 0x559ca758ccf6 PyEval_EvalCode + 134
11 0x559ca75b77d8 python(+0x2647d8) [0x559ca75b77d8]
12 0x559ca75b10bb python(+0x25e0bb) [0x559ca75b10bb]
13 0x559ca740a4d0 python(+0xb74d0) [0x559ca740a4d0]
14 0x559ca740a012 _PyRun_InteractiveLoopObject + 195
15 0x559ca75b6678 _PyRun_AnyFileObject + 104
16 0x559ca73f45c8 PyRun_AnyFileExFlags + 79
17 0x559ca73e96e8 python(+0x966e8) [0x559ca73e96e8]
18 0x559ca757fcad Py_BytesMain + 45
19 0x7f7134164d90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f7134164d90]
20 0x7f7134164e40 __libc_start_main + 128
21 0x559ca757fba5 _start + 37
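
For reference, the assertion in worldConfig.cpp checks that the MPI world size equals tensor_parallelism * pipeline_parallelism. Below is a minimal sketch of that invariant; it assumes mpi4py is available in the container (it is not used by the example script itself) and only queries the world size:

# Sketch of the invariant asserted in worldConfig.cpp: mpiSize == tp * pp.
# Assumes mpi4py is installed; it is used here only to read the MPI world size.
from mpi4py import MPI

tensor_parallelism = 2
pipeline_parallelism = 1

# Launched directly (python llama.py ...) there is a single rank, so the
# world size is 1 and the check 1 == 2 * 1 fails, matching the error above.
world_size = MPI.COMM_WORLD.Get_size()
assert world_size == tensor_parallelism * pipeline_parallelism, (
    f"mpiSize ({world_size}) != tp * pp "
    f"({tensor_parallelism * pipeline_parallelism})"
)

Launched under MPI with two ranks (e.g. mpirun -n 2), Get_size() would return 2 and the check would hold.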


TheCodeWrangler commented Jan 24, 2024

Further investigation of the function in tensorrt_llm.bindings that throws the error:

The same error is generated by the following script from within the huggingface/optimum-nvidia:latest Docker container (image id: d08d1226a2ab):

import tensorrt_llm.bindings as ctrrt

gpus_per_node = 2
tensor_parallelism = 2
pipeline_parallelism = 1

# Raises: RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: mpiSize == tp * pp
ctrrt.WorldConfig.mpi(
    gpus_per_node,
    tensor_parallelism,
    pipeline_parallelism,
)
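
For comparison, a sketch based only on the assertion message (not verified in the container): with tensor_parallelism = 1 and pipeline_parallelism = 1 the same call should not trip the check, since a single non-mpirun process has an MPI world size of 1 and 1 == 1 * 1.

import tensorrt_llm.bindings as ctrrt

# tp = pp = 1: mpiSize == tp * pp holds for a single-process launch
# (1 == 1 * 1), so no assertion is expected here.
ctrrt.WorldConfig.mpi(2, 1, 1)

# tp = 2: the same single process fails the check (1 != 2 * 1),
# reproducing the RuntimeError from the traceback above.
ctrrt.WorldConfig.mpi(2, 2, 1)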

@mfuntowicz mfuntowicz added the bug Something isn't working label Feb 8, 2024
@mfuntowicz mfuntowicz self-assigned this Feb 8, 2024