
Test different input sequence lengths for Llama #1070

Open · pmarkovicTT wants to merge 4 commits into main from pmarkovic/test-input-seq-lengths-llama
Conversation

pmarkovicTT
Contributor

pmarkovicTT commented Jan 20, 2025

Add a test to make sure Llama compiles and runs a fwd pass with different input sequence lengths, as we will have inputs of various lengths during training.

Closes #1071

@pytest.mark.parametrize("seq_len", [1, 2, 4, 7, 8, 16, 28, 32, 63, 64, 99, 117, 128, 256, 341, 512, 1024, 1790, 2048])
@pytest.mark.skip(reason="No need to run in CI as it takes a long time to run.")
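The parametrized list above mixes tile-aligned and unaligned lengths. A quick sketch of that split (the relevance of 32-alignment is an assumption here, based on Tenstorrent's 32x32 tiling):

```python
# Partition the tested sequence lengths by 32-alignment; unaligned lengths
# presumably exercise the padding paths in the compiler.
SEQ_LENS = [1, 2, 4, 7, 8, 16, 28, 32, 63, 64, 99, 117, 128, 256, 341, 512, 1024, 1790, 2048]

aligned = [s for s in SEQ_LENS if s % 32 == 0]    # tile-aligned lengths
unaligned = [s for s in SEQ_LENS if s % 32 != 0]  # lengths needing padding to a tile boundary

print(aligned)
print(unaligned)
```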
Contributor

My recommendation is to choose which of these will be part of the training focus, instead of skipping it entirely.

E.g. if we're going to focus on training a 2048 seq len model, let's fully compile and run that variant alone as part of push CI.

Contributor

That's right - understanding which sequence length is relevant for Llama finetuning is one of the training team's tasks.
Once we establish which set of seq lengths is needed, we will continue with PCC tests and run as part of CI.

Contributor Author

Agree as well, will update seq_len parameters with required ones for training once we choose them (we will run some experiments separately).

Contributor Author

This is updated to use only the dim sizes we care about. Additionally, I set up only one hidden layer for the test to speed it up (I also ran the full model test locally to make sure it passes).

input_ids = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids

# Compile the model and run fwd pass
compiled_model = forge.compile(framework_model, input_ids)
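The padding="max_length" option above pads every prompt to the tokenizer's configured max length, which is what makes each seq_len variant produce a fixed-shape input. A minimal pure-Python sketch of that behavior (a hypothetical helper, not the Hugging Face API; real tokenizers may also pad on the left depending on padding_side):

```python
def pad_to_max_length(token_ids, max_length, pad_id=0):
    """Pad (or truncate) a list of token ids to exactly max_length,
    mimicking tokenizer(..., padding="max_length", truncation=True)."""
    ids = token_ids[:max_length]                      # truncation=True
    return ids + [pad_id] * (max_length - len(ids))   # right-pad with pad_id
```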
Contributor

Do we want to test out bwd compile/run as well?

One general question: is there a clean way to test the backward part of a graph in isolation? For example, our compile should return a compiled context that contains information about each compiled component (e.g. fwd, bwd, loss, etc.).

Therefore, is there a clean way to call just the bwd part of the graph with random inputs, without needing to run the forward part or initialize the loss and optimizer parts of the training workflow?

Note: this is not a requirement for this PR, just a general question that can be useful here as well. I.e. can we have granular tests that target specific functionality rather than the whole workflow (only the bwd part of the model)? I see this as especially useful for the bwd generality push in the future. cc @vladimirjovanovicTT
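As a generic illustration of the idea (this is not the forge API, just plain Python): an isolated backward function can be validated against a numerical gradient of the forward, with no loss or optimizer involved.

```python
def f(x):
    return x * x + 3.0 * x  # forward: f(x) = x^2 + 3x

def f_bwd(x, grad_out=1.0):
    return (2.0 * x + 3.0) * grad_out  # hand-written backward for f

def check_bwd(x, eps=1e-6, tol=1e-4):
    # Central-difference gradient of the forward, compared against the
    # isolated backward; no training workflow needed.
    numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
    return abs(f_bwd(x) - numeric) < tol
```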

Contributor

I think this is a must-have functionality as part of our training generality/BFS effort.
Let's discuss the implementation details offline.

TT-Forge-FE Tests: 823 ran, 490 passed, 333 skipped, 0 failed
TT-Forge-FE Tests: 665 ran, 434 passed, 231 skipped, 0 failed
TT-Forge-FE Tests: 665 ran, 437 passed, 228 skipped, 0 failed
TT-Forge-FE Tests: 823 ran, 492 passed, 331 skipped, 0 failed

pmarkovicTT force-pushed the pmarkovic/test-input-seq-lengths-llama branch from 5b45490 to 710afb4 on February 4, 2025 16:47
github-actions bot commented Feb 4, 2025:
TT-Forge-FE Tests: 510 ran, 451 passed, 59 skipped, 0 failed

github-actions bot commented Feb 4, 2025:
TT-Forge-FE Tests: 568 ran, 489 passed, 79 skipped, 0 failed

"model_path",
[
"openlm-research/open_llama_3b",
pytest.param("meta-llama/Llama-3.2-1B", marks=pytest.mark.xfail(reason="Unsupported Op: repeat_interleave")),
Contributor

Support for repeat_interleave has been added, so feel free to test Llama 3.2 1B on the latest main :))


prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
input_ids = input_ids.to(torch.int32)
Contributor

Why is this required? What is the default type for input IDs?

Do we expect that the embedding input will always be int-based? If yes, maybe we should have a compiler pass that handles this.

Contributor Author

The default type is int64, and we need to cast it due to the following issue: #952

Yep, embedding inputs are int-based (indices into the vocabulary), but I am not sure what you mean by another pass.
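The int64-to-int32 cast is lossless here because vocabulary indices are far below the int32 range. A quick sanity check (the vocab sizes below are assumptions, not taken from this PR: 32000 for Open Llama, 128256 for Llama 3.2):

```python
INT32_MAX = 2**31 - 1  # largest token index representable after the cast

# Assumed vocab sizes; valid token ids are indices in [0, vocab_size).
vocab_sizes = {
    "openlm-research/open_llama_3b": 32000,
    "meta-llama/Llama-3.2-1B": 128256,
}

# Every valid token id fits comfortably in int32.
all_fit = all(v - 1 <= INT32_MAX for v in vocab_sizes.values())
print(all_fit)  # True
```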

@@ -16,6 +16,7 @@ def load_model(model_path="openlm-research/open_llama_3b", **kwargs):
config.use_cache = kwargs.get("use_cache", False)
config.output_attentions = kwargs.get("output_attentions", False)
config.output_hidden_states = kwargs.get("output_hidden_states", False)
config.num_hidden_layers = kwargs.get("num_hidden_layers", 26)
Contributor

Was this intentional?

Any specific reason for updating the original number of hidden layers?

Contributor Author

Yep, that's per our discussion in the last sync. Running Llama with all layers takes quite some time, and since this is not an e2e/demo test, I thought it makes sense to speed it up by using a single layer.
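The kwargs.get pattern in the diff is what makes this cheap to do from the test side: defaults stay at the full-model values unless a kwarg overrides them. A minimal sketch of that pattern (the config dict here is illustrative, not the real load_model):

```python
def make_config(**kwargs):
    # Mirrors the load_model kwargs.get pattern: each field falls back to
    # the full-model default (26 hidden layers for Open Llama 3B, per the
    # diff above) unless explicitly overridden by the caller.
    return {
        "use_cache": kwargs.get("use_cache", False),
        "num_hidden_layers": kwargs.get("num_hidden_layers", 26),
    }

full = make_config()                     # full model: 26 layers
fast = make_config(num_hidden_layers=1)  # CI-friendly: single layer
```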

pytest.param("meta-llama/Llama-3.2-1B", marks=pytest.mark.xfail(reason="Unsupported Op: repeat_interleave")),
],
)
@pytest.mark.parametrize("seq_len", [128, 512, 2048])
Contributor

Any thoughts on testing lower precisions, e.g. bfloat16?

In full precision, Open Llama (3B) will require ~12 GB, while Llama 3.2 1B needs ~4 GB. That said, we should either:

  • Test a lower-precision data format (Open Llama will barely fit on n150 for inference, definitely not for training)
  • Focus only on Llama 3.2 for training. In this case as well, we'll need to run in half precision in order to fit training on n150 (depending on which optimizer we use during fine-tuning; full training will probably be a stretch)

Contributor Author

Yep, you are completely right. That's something we plan to do with the Llama backward/training tests, and eventually incorporate this test there as well. We are currently investigating the memory footprint of Llama models on GPU to find the optimal setup for our devices. Our plan is to add tests based on those findings.

@nvukobratTT
Contributor

@pmarkovicTT is this one ready for review and potential merge? If not, can we move it to draft?

pmarkovicTT force-pushed the pmarkovic/test-input-seq-lengths-llama branch from 710afb4 to f081375 on February 25, 2025 13:01
TT-Forge-FE Tests: 670 ran, 536 passed, 134 skipped, 0 failed
TT-Forge-FE Tests: 611 ran, 482 passed, 129 skipped, 0 failed
TT-Forge-FE Tests: 611 ran, 481 passed, 129 skipped, 1 failed (test_dla.test_dla_pytorch[dla34])
TT-Forge-FE Tests: 670 ran, 536 passed, 134 skipped, 0 failed

@pmarkovicTT
Contributor Author

@nvukobratTT PR is ready for review/merge.

Summary of previous conversations:

  • We selected the input sequence sizes that matter to us (128, 512, 2048)
  • Added the test with only 1 layer, as discussed in one of the previous syncs, to make it run faster
  • Embedding inputs have to be int, and they are explicitly cast to int32 due to the following issue: Invalid Runtime inputs to embedding #952
  • When it comes to training/testing in lower precision, that's something we will do and incorporate in future training tests we add

TT-Forge-FE Tests: 616 ran, 484 passed, 132 skipped, 0 failed
TT-Forge-FE Tests: 675 ran, 538 passed, 137 skipped, 0 failed

Development

Successfully merging this pull request may close these issues.

Different input sequence length
3 participants