Fix mpt model generation #1696
Conversation
@mengniwang95 I am getting a different error with your fix >>> python run_generation.py --model_name_or_path mosaicml/mpt-7b-chat --use_hpu_graphs --use_kv_cache --bf16 --batch_size=1 --trust_remote_code
Traceback (most recent call last):
File "/root/optimum-habana/examples/text-generation/run_generation.py", line 785, in <module>
main()
File "/root/optimum-habana/examples/text-generation/run_generation.py", line 545, in main
generate(None, args.reduce_recompile)
File "/root/optimum-habana/examples/text-generation/run_generation.py", line 516, in generate
outputs = model.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 1468, in generate
result = self._sample(
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 2440, in _sample
model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/modeling_mpt.py", line 489, in prepare_inputs_for_generation
raise NotImplementedError('MPT does not support generation with right padding.')
NotImplementedError: MPT does not support generation with right padding.
@atakaha can you review this PR too?
@mengniwang95, I observed the same error that @yafshar reported with Docker 1.20.0.
Hi @atakaha @imangohari1, I found that in this script, static_shapes in generation_config is True with that command, and optimum-habana pads input_ids on the right with 0 according to the max_new_tokens parameter. Since MPT does not support generation with right padding, it raises this error.
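To make the padding issue concrete, here is a small sketch (a hypothetical helper, not the actual optimum-habana code) of what static-shape padding does: input_ids are padded up to a fixed length so compiled HPU graphs can be reused, and the side on which the pad tokens land is what MPT's custom modeling code objects to.

```python
# Hypothetical illustration of static-shape padding; names and the exact
# padding policy are assumptions, not the real optimum-habana internals.

def pad_to_static_shape(input_ids, target_len, pad_token_id, side="right"):
    """Pad each sequence in a batch of token-id lists to target_len."""
    padded = []
    for seq in input_ids:
        pad = [pad_token_id] * max(0, target_len - len(seq))
        # MPT's remote modeling code rejects right padding during generation
        # (the NotImplementedError above), so left padding is required.
        padded.append(seq + pad if side == "right" else pad + seq)
    return padded

print(pad_to_static_shape([[101, 2054, 2003]], 6, 0, side="left"))
# [[0, 0, 0, 101, 2054, 2003]]
```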
I dug a bit more. It sounds like generation works when you run the command without --trust_remote_code: >>> python run_generation.py --model_name_or_path mosaicml/mpt-7b-chat --use_hpu_graphs --use_kv_cache --bf16 --batch_size=1
01/21/2025 22:02:55 - INFO - __main__ - Time to first token = 10.8794219995616ms
01/21/2025 22:02:56 - INFO - __main__ - Time to first token = 11.164702002133708ms
Input/outputs:
input 1: ('DeepSpeed is a machine learning framework',)
output 1.1: ('DeepSpeed is a machine learning framework for building predictive models. It is designed to be flexible and scalable, and can be used for a wide range of applications, including fraud detection, recommendation systems, and natural language processing.\n\nDeepSpeed is built on top of the TensorFlow library, which is an open-source software library for dataflow and differentiable programming across a range of tasks. It provides a high-level API for building and training deep learning models, and includes a range of tools for data preprocessing, model evaluation,',)
Stats:
-----------------------------------------------------------------------------------
Input tokens
Throughput (including tokenization) = 142.45309764177765 tokens/second
Memory allocated = 12.87 GB
Max memory allocated = 12.89 GB
Total memory available = 94.62 GB
Graph compilation duration = 4.376771092000126 seconds
-----------------------------------------------------------------------------------
I am going to approve these changes, but we should check the
LGTM!
Hi @regisss, this PR is ready for your final review. Could you please take a look?
The code quality check failed; please run
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM!
Fixes # (issue)
python run_generation.py --model_name_or_path mosaicml/mpt-7b-chat --use_hpu_graphs --use_kv_cache --bf16 --batch_size=1
File "/optimum-habana/examples/text-generation/run_generation.py", line 773, in <module>
main()
File "/optimum-habana/examples/text-generation/run_generation.py", line 533, in main
generate(None, args.reduce_recompile)
File "/optimum-habana/examples/text-generation/run_generation.py", line 504, in generate
outputs = model.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 997, in generate
generation_config, model_kwargs = self._prepare_generation_config(generation_config, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 734, in _prepare_generation_config
model_kwargs = generation_config.update(**kwargs) # All unused kwargs must be model kwargs
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py", line 1282, in update
self.validate()
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py", line 578, in validate
if self.pad_token_id is not None and self.pad_token_id < 0:
TypeError: '<' not supported between instances of 'list' and 'int'
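The TypeError at the bottom of this traceback can be reproduced in isolation: the MPT checkpoint stores pad_token_id as a list, while transformers' GenerationConfig.validate() compares it to an int. The sketch below shows the failure mode and a defensive unwrap of single-element lists (a hypothetical helper for illustration, not necessarily the fix this PR takes).

```python
# Hypothetical helper illustrating the TypeError above; not the actual
# change made in this PR.

def normalize_pad_token_id(pad_token_id):
    """Unwrap a single-element list so checks like `pad_token_id < 0` work."""
    if isinstance(pad_token_id, list) and len(pad_token_id) == 1:
        return pad_token_id[0]
    return pad_token_id

try:
    [0] < 0  # mirrors `self.pad_token_id < 0` with a list-valued config entry
except TypeError as exc:
    print(exc)  # '<' not supported between instances of 'list' and 'int'

print(normalize_pad_token_id([0]) < 0)  # False: the comparison now succeeds
```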