
Fix mpt model generation #1696

Merged
merged 2 commits into from Jan 24, 2025
Conversation

@mengniwang95 commented Jan 14, 2025 (Contributor)

Fixes # (issue)

python run_generation.py --model_name_or_path mosaicml/mpt-7b-chat --use_hpu_graphs --use_kv_cache --bf16 --batch_size=1

Traceback (most recent call last):
  File "/optimum-habana/examples/text-generation/run_generation.py", line 773, in <module>
    main()
  File "/optimum-habana/examples/text-generation/run_generation.py", line 533, in main
    generate(None, args.reduce_recompile)
  File "/optimum-habana/examples/text-generation/run_generation.py", line 504, in generate
    outputs = model.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 997, in generate
    generation_config, model_kwargs = self._prepare_generation_config(generation_config, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 734, in _prepare_generation_config
    model_kwargs = generation_config.update(**kwargs)  # All unused kwargs must be model kwargs
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py", line 1282, in update
    self.validate()
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py", line 578, in validate
    if self.pad_token_id is not None and self.pad_token_id < 0:
TypeError: '<' not supported between instances of 'list' and 'int'
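The traceback shows that validation fails because pad_token_id reaches GenerationConfig.validate() as a list rather than an int. A minimal sketch of the failure mode and one possible normalization (illustrative only, not necessarily the PR's exact fix):

```python
# MPT's generation config can end up with pad_token_id stored as a list,
# while transformers' GenerationConfig.validate() compares it to an int.
pad_token_id = [0]  # made-up value: a list where an int is expected

try:
    if pad_token_id is not None and pad_token_id < 0:
        pass
except TypeError as exc:
    print(exc)  # '<' not supported between instances of 'list' and 'int'

# One way to normalize before validation runs (a sketch under the
# assumption that the first element is the intended pad token):
if isinstance(pad_token_id, (list, tuple)):
    pad_token_id = pad_token_id[0]
assert isinstance(pad_token_id, int)
```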

@mengniwang95 mengniwang95 requested a review from regisss as a code owner January 14, 2025 08:33
@yafshar commented Jan 14, 2025 (Contributor)

@mengniwang95 I am getting a different error with your fix

>>> python run_generation.py --model_name_or_path mosaicml/mpt-7b-chat --use_hpu_graphs --use_kv_cache --bf16 --batch_size=1 --trust_remote_code

Traceback (most recent call last):
  File "/root/optimum-habana/examples/text-generation/run_generation.py", line 785, in <module>
    main()
  File "/root/optimum-habana/examples/text-generation/run_generation.py", line 545, in main
    generate(None, args.reduce_recompile)
  File "/root/optimum-habana/examples/text-generation/run_generation.py", line 516, in generate
    outputs = model.generate(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 1468, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 2440, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/modeling_mpt.py", line 489, in prepare_inputs_for_generation
    raise NotImplementedError('MPT does not support generation with right padding.')
NotImplementedError: MPT does not support generation with right padding.

@imangohari1 commented (Contributor)

@atakaha can you review this PR too?

@atakaha commented Jan 17, 2025 (Contributor)

@mengniwang95, I observed the same error that @yafshar reported with Docker 1.20.0.

@mengniwang95 commented (Contributor, Author)

Hi @atakaha @imangohari1, I found that in this script, static_shapes in the generation_config is True with this command, and optimum-habana pads the input_ids with 0 on the right according to the max_new_tokens parameter. Since MPT does not support generation with right padding, it raises this error.
Do you have any suggestions?
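The static-shape behavior described above can be sketched as follows. This is an illustration of right padding, not optimum-habana's actual code; the function name and pad value of 0 are assumptions taken from the comment:

```python
import torch

# With static shapes enabled, input_ids are right-padded up to
# input_len + max_new_tokens so the tensor shape stays fixed across
# decode steps (which is what HPU graph capture requires).
def right_pad_to_static_shape(
    input_ids: torch.Tensor, max_new_tokens: int, pad_value: int = 0
) -> torch.Tensor:
    batch, seq_len = input_ids.shape
    target_len = seq_len + max_new_tokens
    padded = torch.full((batch, target_len), pad_value, dtype=input_ids.dtype)
    padded[:, :seq_len] = input_ids  # original tokens on the left, pad on the right
    return padded

ids = torch.tensor([[101, 2023, 2003]])  # made-up token ids
padded = right_pad_to_static_shape(ids, max_new_tokens=4)
print(padded)  # tensor([[ 101, 2023, 2003,    0,    0,    0,    0]])
```

MPT's remote modeling code rejects exactly this layout, since its prepare_inputs_for_generation assumes left padding, hence the NotImplementedError.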

@yafshar commented Jan 21, 2025 (Contributor)

> Hi @atakaha @imangohari1, I found that in this script, static_shapes in the generation_config is True with this command, and optimum-habana pads the input_ids with 0 on the right according to the max_new_tokens parameter. Since MPT does not support generation with right padding, it raises this error. Do you have any suggestions?

I dug a bit more. It sounds like when you use --trust_remote_code, it overrides the Gaudi model and uses the HF remote model instead; that is why this error happens. If you do not use that flag, your patch works:
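The dispatch difference described above can be illustrated with a toy function (this is not the real transformers internals; the class-name strings are illustrative):

```python
# With trust_remote_code=True, transformers loads the Hub repo's own
# modeling_mpt.py, bypassing optimum-habana's Gaudi-patched MPT class;
# the remote class is the one that rejects right padding.
def resolve_model_class(has_remote_code: bool, trust_remote_code: bool) -> str:
    if has_remote_code and trust_remote_code:
        return "remote modeling_mpt.MptForCausalLM"  # raises on right padding
    return "patched GaudiMptForCausalLM"  # handles static-shape padding

print(resolve_model_class(True, True))   # remote modeling_mpt.MptForCausalLM
print(resolve_model_class(True, False))  # patched GaudiMptForCausalLM
```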

>>> python run_generation.py --model_name_or_path mosaicml/mpt-7b-chat --use_hpu_graphs --use_kv_cache --bf16 --batch_size=1

01/21/2025 22:02:55 - INFO - __main__ - Time to first token = 10.8794219995616ms
01/21/2025 22:02:56 - INFO - __main__ - Time to first token = 11.164702002133708ms

Input/outputs:
input 1: ('DeepSpeed is a machine learning framework',)
output 1.1: ('DeepSpeed is a machine learning framework for building predictive models. It is designed to be flexible and scalable, and can be used for a wide range of applications, including fraud detection, recommendation systems, and natural language processing.\n\nDeepSpeed is built on top of the TensorFlow library, which is an open-source software library for dataflow and differentiable programming across a range of tasks. It provides a high-level API for building and training deep learning models, and includes a range of tools for data preprocessing, model evaluation,',)


Stats:
-----------------------------------------------------------------------------------
Input tokens
Throughput (including tokenization) = 142.45309764177765 tokens/second
Memory allocated                    = 12.87 GB
Max memory allocated                = 12.89 GB
Total memory available              = 94.62 GB
Graph compilation duration          = 4.376771092000126 seconds
-----------------------------------------------------------------------------------

@yafshar commented Jan 21, 2025 (Contributor)

I am going to approve these changes, but we should check --trust_remote_code separately. @mengniwang95, please update the README in this PR to remove this option. Thanks

@yafshar left a comment (Contributor)

LGTM!

Hi @regisss, this PR is ready for your final review. Could you please take a look?

@libinta libinta added the run-test Run CI for PRs from external contributors label Jan 23, 2025

The code quality check failed, please run make style.

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss left a comment (Collaborator)

LGTM!

@regisss regisss merged commit 1e9bd35 into huggingface:main Jan 24, 2025
4 checks passed