Fix mpt model generation #1696
Conversation
@mengniwang95 I am getting a different error with your fix >>> python run_generation.py --model_name_or_path mosaicml/mpt-7b-chat --use_hpu_graphs --use_kv_cache --bf16 --batch_size=1 --trust_remote_code
Traceback (most recent call last):
File "/root/optimum-habana/examples/text-generation/run_generation.py", line 785, in <module>
main()
File "/root/optimum-habana/examples/text-generation/run_generation.py", line 545, in main
generate(None, args.reduce_recompile)
File "/root/optimum-habana/examples/text-generation/run_generation.py", line 516, in generate
outputs = model.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 1468, in generate
result = self._sample(
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 2440, in _sample
model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/mosaicml/mpt-7b-chat/1fe2374291e730f7c58ceb1bf49960082371b551/modeling_mpt.py", line 489, in prepare_inputs_for_generation
raise NotImplementedError('MPT does not support generation with right padding.')
NotImplementedError: MPT does not support generation with right padding.
@atakaha can you review this PR too?
@mengniwang95, I observed the same error that @yafshar reported with Docker 1.20.0.
Hi @atakaha @imangohari1, I found that in this script, static_shapes in generation_config is True with that command, and optimum-habana pads input_ids on the right with 0 according to the max_new_tokens parameter. Since MPT does not support generation with right padding, it raises this error.
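To make the padding issue concrete, here is a small sketch (a hypothetical helper, not the actual optimum-habana code) of what static-shape padding does: input_ids are padded up to a fixed length so compiled HPU graphs can be reused, and the side on which the pad tokens land is what MPT's custom modeling code objects to.

```python
# Hypothetical illustration of static-shape padding; names and the exact
# padding policy are assumptions, not the real optimum-habana internals.

def pad_to_static_shape(input_ids, target_len, pad_token_id, side="right"):
    """Pad each sequence in a batch of token-id lists to target_len."""
    padded = []
    for seq in input_ids:
        pad = [pad_token_id] * max(0, target_len - len(seq))
        # MPT's remote modeling code rejects right padding during generation
        # (the NotImplementedError above), so left padding is required.
        padded.append(seq + pad if side == "right" else pad + seq)
    return padded

print(pad_to_static_shape([[101, 2054, 2003]], 6, 0, side="left"))
# [[0, 0, 0, 101, 2054, 2003]]
```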
I dug a bit more. It sounds like generation works when you run the command without --trust_remote_code: >>> python run_generation.py --model_name_or_path mosaicml/mpt-7b-chat --use_hpu_graphs --use_kv_cache --bf16 --batch_size=1
01/21/2025 22:02:55 - INFO - __main__ - Time to first token = 10.8794219995616ms
01/21/2025 22:02:56 - INFO - __main__ - Time to first token = 11.164702002133708ms
Input/outputs:
input 1: ('DeepSpeed is a machine learning framework',)
output 1.1: ('DeepSpeed is a machine learning framework for building predictive models. It is designed to be flexible and scalable, and can be used for a wide range of applications, including fraud detection, recommendation systems, and natural language processing.\n\nDeepSpeed is built on top of the TensorFlow library, which is an open-source software library for dataflow and differentiable programming across a range of tasks. It provides a high-level API for building and training deep learning models, and includes a range of tools for data preprocessing, model evaluation,',)
Stats:
-----------------------------------------------------------------------------------
Input tokens
Throughput (including tokenization) = 142.45309764177765 tokens/second
Memory allocated = 12.87 GB
Max memory allocated = 12.89 GB
Total memory available = 94.62 GB
Graph compilation duration = 4.376771092000126 seconds
-----------------------------------------------------------------------------------
I am going to approve these changes, but we should check the
LGTM!
Hi @regisss, this PR is ready for your final review. Could you please take a look?
The code quality check failed; please run
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM!
Fixes # (issue)
python run_generation.py --model_name_or_path mosaicml/mpt-7b-chat --use_hpu_graphs --use_kv_cache --bf16 --batch_size=1
File "/optimum-habana/examples/text-generation/run_generation.py", line 773, in <module>
main()
File "/optimum-habana/examples/text-generation/run_generation.py", line 533, in main
generate(None, args.reduce_recompile)
File "/optimum-habana/examples/text-generation/run_generation.py", line 504, in generate
outputs = model.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 997, in generate
generation_config, model_kwargs = self._prepare_generation_config(generation_config, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/generation/utils.py", line 734, in _prepare_generation_config
model_kwargs = generation_config.update(**kwargs) # All unused kwargs must be model kwargs
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py", line 1282, in update
self.validate()
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py", line 578, in validate
if self.pad_token_id is not None and self.pad_token_id < 0:
TypeError: '<' not supported between instances of 'list' and 'int'
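The TypeError at the bottom of this traceback can be reproduced in isolation: the MPT checkpoint stores pad_token_id as a list, while transformers' GenerationConfig.validate() compares it to an int. The sketch below shows the failure mode and a defensive unwrap of single-element lists (a hypothetical helper for illustration, not necessarily the fix this PR takes).

```python
# Hypothetical helper illustrating the TypeError above; not the actual
# change made in this PR.

def normalize_pad_token_id(pad_token_id):
    """Unwrap a single-element list so checks like `pad_token_id < 0` work."""
    if isinstance(pad_token_id, list) and len(pad_token_id) == 1:
        return pad_token_id[0]
    return pad_token_id

try:
    [0] < 0  # mirrors `self.pad_token_id < 0` with a list-valued config entry
except TypeError as exc:
    print(exc)  # '<' not supported between instances of 'list' and 'int'

print(normalize_pad_token_id([0]) < 0)  # False: the comparison now succeeds
```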