Test different input sequence lengths for Llama #1070

pmarkovicTT · 2025-01-20T16:23:04Z

Add a test to make sure Llama compiles and runs the forward pass with different input sequence lengths, since we will have inputs of various lengths during training.

Close #1071

forge/test/mlir/llama/test_llama_inference.py

nvukobratTT · 2025-01-20T17:11:52Z

forge/test/mlir/llama/test_llama_inference.py

+    ],
+)
+@pytest.mark.parametrize("seq_len", [1, 2, 4, 7, 8, 16, 28, 32, 63, 64, 99, 117, 128, 256, 341, 512, 1024, 1790, 2048])
+@pytest.mark.skip(reason="No need to run in CI as it takes a long time to run.")


My recommendation is to choose which of these will be part of the training focus, instead of skipping it entirely.

E.g. if we're going to focus on training 2048 seq len model, let's fully compile and run as part of push CI that variant alone.

That's right - understanding which sequence length is relevant for Llama finetuning is one of the training team's tasks.
Once we establish which set of seq lengths is needed, we will continue with PCC tests and run as part of CI.

Agree as well, will update seq_len parameters with required ones for training once we choose them (we will run some experiments separately).

This is updated to use only the sequence lengths we care about. Additionally, I set up the test to use only one hidden layer to speed it up (I also ran the full-model test locally to make sure it passes).
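
For readers following the thread, here is a minimal sketch of what fixing the input to a chosen sequence length amounts to. It is plain Python, not the tokenizer's implementation: `pad_or_truncate`, the pad ID of 0, the right-side padding, and the sample token IDs are all assumptions made for illustration.

```python
def pad_or_truncate(token_ids, seq_len, pad_id=0):
    """Return exactly seq_len token IDs: cut long inputs, right-pad short ones."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    clipped = list(token_ids[:seq_len])  # truncation step
    return clipped + [pad_id] * (seq_len - len(clipped))  # padding step


# Hypothetical token IDs standing in for a tokenized prompt.
prompt_ids = [1, 660, 29901, 1724, 338]

# The three sequence lengths kept in the test.
for seq_len in (128, 512, 2048):
    fixed = pad_or_truncate(prompt_ids, seq_len)
    assert len(fixed) == seq_len
```

Whatever the prompt length, the model then always sees an input of the parametrized length, which is what lets one prompt exercise several compiled shapes.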

nvukobratTT · 2025-01-20T17:13:25Z

forge/test/mlir/llama/test_llama_inference.py

+    input_ids = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
+
+    # Compile the model and run fwd pass
+    compiled_model = forge.compile(framework_model, input_ids)


Do we want to test out bwd compile/run as well?

One general question: is there a clean way to test the backward part of a graph in isolation? For example, our compile should return a compiled context that contains information about each compiled component (e.g. fwd, bwd, loss, etc.).

Therefore, is there a clean way to just call the bwd part of the graph with random inputs, without a need to run the forward part, and initialize the loss and optimizer part of the training workflow?

Note: this is not a requirement for this PR, just a general question that can be useful here as well. That is, can we have granular tests that target specific functionality (only the bwd part of the model) rather than the whole workflow? I see this as especially useful for the bwd generality push in the future. cc @vladimirjovanovicTT

I think this is a must-have functionality as part of our training generality/BFS effort.
Let's discuss the implementation details offline.

github-actions · 2025-01-20T17:43:52Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	823 ran	490 passed	333 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-01-20T17:48:19Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	665 ran	434 passed	231 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-01-21T16:09:01Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	665 ran	437 passed	228 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-01-21T16:15:23Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	823 ran	492 passed	331 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-02-04T17:42:52Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	510 ran	451 passed	59 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-02-04T17:43:01Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	510 ran	451 passed	59 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-02-04T17:44:13Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	568 ran	489 passed	79 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-02-04T17:47:29Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	568 ran	489 passed	79 skipped	0 failed

Test	Result
No test annotations available

nvukobratTT · 2025-02-07T14:34:52Z

forge/test/mlir/llama/test_llama_inference.py

+    "model_path",
+    [
+        "openlm-research/open_llama_3b",
+        pytest.param("meta-llama/Llama-3.2-1B", marks=pytest.mark.xfail(reason="Unsupported Op: repeat_interleave")),


Support for repeat_interleave has been added, so feel free to test out Llama 3.2 1B on latest main :))

nvukobratTT · 2025-02-07T14:37:31Z

forge/test/mlir/llama/test_llama_inference.py

+
+    prompt = "Q: What is the largest animal?\nA:"
+    input_ids = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
+    input_ids = input_ids.to(torch.int32)


Why is this required? What is the default type for input IDs?

Do we expect that embedding input will always be int-based? If yes, maybe we should have a pass that will encompass this.

The default type is int64, and we need to cast it because of issue #952.

Yep, embedding inputs are int-based (indices in the vocabulary), but I am not sure what you mean about another pass.
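
On the cast itself: since the inputs are vocabulary indices, narrowing from int64 to int32 should lose nothing as long as every index is non-negative and below 2**31. A small sanity check along those lines is sketched below; `fits_in_int32` is a hypothetical helper, and the vocabulary sizes are assumed values used only for illustration.

```python
INT32_MAX = 2**31 - 1


def fits_in_int32(token_ids):
    """True if every ID is a non-negative value representable as int32."""
    return all(0 <= token_id <= INT32_MAX for token_id in token_ids)


# Assumed vocabulary sizes, for illustration only.
ASSUMED_VOCAB_SIZES = {"open_llama_3b": 32_000, "Llama-3.2-1B": 128_256}

for name, vocab_size in ASSUMED_VOCAB_SIZES.items():
    # The largest valid embedding index is vocab_size - 1.
    assert fits_in_int32([0, vocab_size - 1]), name

# A value past the int32 range would not survive the cast.
assert not fits_in_int32([2**31])
```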

nvukobratTT · 2025-02-07T14:38:33Z

forge/test/mlir/llama/utils/utils.py

@@ -16,6 +16,7 @@ def load_model(model_path="openlm-research/open_llama_3b", **kwargs):
    config.use_cache = kwargs.get("use_cache", False)
    config.output_attentions = kwargs.get("output_attentions", False)
    config.output_hidden_states = kwargs.get("output_hidden_states", False)
+    config.num_hidden_layers = kwargs.get("num_hidden_layers", 26)


Was this intentional?

Any specific reasons for updating original number of hidden layers?

Yep, that's per our discussion in the last sync. Running llama with all layers takes quite some time and since this is not e2e/demo test, I thought it makes sense to speed it up by using a single layer.
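
One thing worth noting about the pattern in the hunk above: `kwargs.get("num_hidden_layers", 26)` writes 26 into the config whenever the caller passes nothing, regardless of what the config held before. The sketch below reproduces that behavior on a stand-in config object and shows one possible variant that falls back to the existing value. `apply_overrides`, `apply_overrides_keep_default`, and the 16-layer config are hypothetical names and values, not code from this PR.

```python
from types import SimpleNamespace


def apply_overrides(config, **kwargs):
    """Mirror the kwargs.get pattern from the hunk on a stand-in config."""
    config.use_cache = kwargs.get("use_cache", False)
    # Hard-coded default: replaces whatever the config held before.
    config.num_hidden_layers = kwargs.get("num_hidden_layers", 26)
    return config


def apply_overrides_keep_default(config, **kwargs):
    """Variant: fall back to the value the config already carries."""
    config.use_cache = kwargs.get("use_cache", False)
    config.num_hidden_layers = kwargs.get("num_hidden_layers", config.num_hidden_layers)
    return config


# Hypothetical config for a model that ships with 16 layers.
hard_coded = apply_overrides(SimpleNamespace(num_hidden_layers=16))
single_layer = apply_overrides(SimpleNamespace(num_hidden_layers=16), num_hidden_layers=1)
preserved = apply_overrides_keep_default(SimpleNamespace(num_hidden_layers=16))

assert hard_coded.num_hidden_layers == 26   # default overrides the config
assert single_layer.num_hidden_layers == 1  # explicit kwarg wins
assert preserved.num_hidden_layers == 16    # variant keeps the config value
```

Both forms give the single-layer speedup when `num_hidden_layers=1` is passed; they differ only in what happens when the argument is omitted.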

nvukobratTT · 2025-02-07T14:44:56Z

forge/test/mlir/llama/test_llama_inference.py

+        pytest.param("meta-llama/Llama-3.2-1B", marks=pytest.mark.xfail(reason="Unsupported Op: repeat_interleave")),
+    ],
+)
+@pytest.mark.parametrize("seq_len", [128, 512, 2048])


Any thoughts on testing on lower precisions? E.g. bfloat16?

In full precision, Open Llama will require 12GB, while Llama 3.2 will require 4GB. That said, we should either:

Test out lower precision DF (Open Llama will barely fit n150 for inference, definitely not for training)

Focus only on Llama 3.2 for training. In this case as well, we'll need to run in half-precision for training in order to fit on n150 (depending on which optimizer we use during fine-tuning; full training will probably be a stretch)

Yep you are completely right. That's something we plan to do with llama backward/training tests, and eventually incorporate this test there as well. We are currently investigating memory footprint of llama models on GPU to find optimal setup for our devices. Our plan is to add tests based on findings.
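
The 12GB and 4GB figures quoted above are consistent with a simple weights-only estimate: parameter count times bytes per parameter. The sketch below assumes nominal parameter counts of 3e9 and 1e9 (rounded from the model names) and ignores activations, gradients, and optimizer state, so it is a lower bound rather than a measured footprint.

```python
# Bytes needed to store one parameter in each data format.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

# Nominal parameter counts (assumption: rounded from the model names).
NOMINAL_PARAMS = {"open_llama_3b": 3e9, "Llama-3.2-1B": 1e9}


def weight_gb(num_params, dtype):
    """Weights-only memory in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, num_params in NOMINAL_PARAMS.items():
    fp32 = weight_gb(num_params, "float32")
    bf16 = weight_gb(num_params, "bfloat16")
    print(f"{name}: float32 ~{fp32:.0f} GB, bfloat16 ~{bf16:.0f} GB")
```

Under these assumptions, half precision halves the weight footprint, which is the motivation for the bfloat16 suggestion; training adds gradients and optimizer state on top of that.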

nvukobratTT · 2025-02-24T09:03:04Z

@pmarkovicTT is this one ready for review and potential merge? If not, can we move it to draft?

github-actions · 2025-02-25T13:42:48Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	670 ran	536 passed	134 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-02-25T13:45:20Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	611 ran	482 passed	129 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-02-25T13:45:55Z

	Tests	Passed ☑️	Skipped ⚠️	Failed ❌️
TT-Forge-FE Tests	611 ran	481 passed	129 skipped	1 failed

Test	Result
TT-Forge-FE Tests
pytest
test_dla.test_dla_pytorch[dla34]	❌ failure

github-actions · 2025-02-25T13:48:25Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	670 ran	536 passed	134 skipped	0 failed

Test	Result
No test annotations available

pmarkovicTT · 2025-02-27T12:22:22Z

@nvukobratTT PR is ready for review/merge.

Summary of previous conversations:

We selected the input sequence lengths that matter to us (128, 512, 2048).
The test uses only 1 hidden layer so that it runs faster, as discussed in one of the previous syncs.
Embedding inputs have to be integers, and they are explicitly cast to int32 because of issue #952 (Invalid Runtime inputs to embedding).
Training/testing in lower precision is something we will do and incorporate into the future training tests we add.

github-actions · 2025-02-27T13:52:38Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	616 ran	484 passed	132 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-02-27T13:56:44Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	675 ran	538 passed	137 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-02-27T14:02:07Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	616 ran	484 passed	132 skipped	0 failed

Test	Result
No test annotations available

github-actions · 2025-02-27T14:22:33Z

	Tests	Passed ✅	Skipped ⚠️	Failed
TT-Forge-FE Tests	675 ran	538 passed	137 skipped	0 failed

Test	Result
No test annotations available

pmarkovicTT requested review from nvukobratTT, pilkicTT and dgolubovicTT as code owners January 20, 2025 16:23

pmarkovicTT requested a review from vladimirjovanovicTT January 20, 2025 16:23

pmarkovicTT self-assigned this Jan 20, 2025

nvukobratTT reviewed Jan 20, 2025

pmarkovicTT force-pushed the pmarkovic/test-input-seq-lengths-llama branch from 5b45490 to 710afb4 Compare February 4, 2025 16:47

pmarkovicTT requested a review from nvukobratTT February 4, 2025 16:52

nvukobratTT reviewed Feb 7, 2025

pmarkovicTT added 3 commits February 25, 2025 13:00

Add test for different input seq lens

d925ef0

Add padding and truncation

114d19e

Add verify check

f081375

pmarkovicTT force-pushed the pmarkovic/test-input-seq-lengths-llama branch from 710afb4 to f081375 Compare February 25, 2025 13:01

clean code

2ab4f57

pmarkovicTT requested a review from nvukobratTT February 27, 2025 12:22
