
feat(transformers): api(image processor/feature extractor/automodel/pipelines) #802

Open · wants to merge 24 commits into base: master
Conversation

@wcrzlh (Contributor) commented Dec 25, 2024

What does this PR do?

This PR (based on #748) contains the following features:

  • image processor
  • feature extractor
  • automodel
  • pipelines
    • FillMaskPipeline
    • TextGenerationPipeline

The image processor and feature extractor will serve as part of VLLM. They can be tested based on PR #749 and the test scripts.
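As a rough, framework-free sketch of what an image processor does (assuming the usual rescale-then-normalize steps; the actual sizes, means, and stds are model-specific and this helper is purely illustrative, not part of the PR):

```python
# Toy sketch of the rescale + normalize steps an image processor typically
# applies (resize omitted; real processors operate on arrays, not nested lists).
def preprocess(pixels, mean=0.5, std=0.5):
    """pixels: 2D list of uint8 values in [0, 255]."""
    return [[((p / 255.0) - mean) / std for p in row] for row in pixels]

out = preprocess([[0, 255], [128, 64]])
print(out[0])  # [-1.0, 1.0]
```

The real processors additionally handle resizing, channel ordering, and batching, and return tensors rather than lists.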

AutoModel can be used to load models that exist in mindone.transformers, or it can be integrated into the pipelines API:

from mindone.transformers import AutoModel

model = AutoModel.from_pretrained("google-bert/bert-base-uncased")
print(model)

The pipelines part can be tested with the following code:

from mindone.transformers.pipelines import pipeline

generator = pipeline(model="google-bert/bert-base-uncased")
outputs = generator("This is a simple [MASK]")
print(outputs)

Expected output:

[{'score': 0.041297122836112976, 'token': 3291, 'token_str': 'problem', 'sequence': 'this is a simple problem.'},
 {'score': 0.03821507468819618, 'token': 8522, 'token_str': 'equation', 'sequence': 'this is a simple equation'},
 {'score': 0.029827609658241272, 'token': 3160, 'token_str': 'question', 'sequence': 'this is a simple question'},
 {'score': 0.027154073119163513, 'token': 7709, 'token_str': 'procedure', 'sequence': 'this is a simple procedure'},
 {'score': 0.025617485865950584, 'token': 7577, 'token_str': 'trick', 'sequence': 'this is a simple trick'}]
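For downstream use, the top candidate can be pulled out of output in this format. A minimal sketch, using sample data shaped like the expected output above (scores abbreviated):

```python
# Fill-mask pipelines return a list of candidate dicts sorted by score.
outputs = [
    {"score": 0.0413, "token": 3291, "token_str": "problem",
     "sequence": "this is a simple problem"},
    {"score": 0.0382, "token": 8522, "token_str": "equation",
     "sequence": "this is a simple equation"},
]

# max() is defensive even if the list is already sorted by score.
best = max(outputs, key=lambda o: o["score"])
print(best["token_str"])  # problem
```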

The GPT-2 model can be tested via the text generation pipeline:

from mindone.transformers.pipelines import pipeline

generator = pipeline(model="openai-community/gpt2")
outputs = generator("I can't believe you did such a ", do_sample=False)
print(outputs)

Expected output:

[{"generated_text": "I can't believe you did such a icky thing to me. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I'm so sorry. I"}]
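With do_sample=False the pipeline decodes greedily, always picking the highest-probability next token, which is why the output above is deterministic and repetitive. A framework-free sketch of the idea (the toy probabilities are made up for illustration):

```python
# Greedy decoding sketch: at each generation step, pick the argmax token.
def greedy_decode(step_probs):
    """step_probs: list of {token: probability} dicts, one per step."""
    return [max(p, key=p.get) for p in step_probs]

steps = [{"icky": 0.4, "nice": 0.3}, {"thing": 0.9, "time": 0.1}]
print(greedy_decode(steps))  # ['icky', 'thing']
```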

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type/MS version) and performance in the doc? (better record it for data loading, model inference, or training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@wcrzlh wcrzlh requested a review from vigo999 as a code owner December 25, 2024 04:49
@zhanghuiyao (Collaborator) commented:

Please resolve the conflicts first.


def torch_int(x):
"""
Casts an input to a torch int64 tensor if we are in a tracing context, otherwise to a Python int.
Collaborator: Replace `torch` with `mindspore` globally.


logger = logging.get_logger(__name__)

MODEL_MAPPING_NAMES = OrderedDict(
Collaborator: Some supported models in mindone/transformers/models are missing, like llama and gemma2. Please check and add them.

("pegasus_x", "PegasusXForConditionalGeneration"),
("plbart", "PLBartForConditionalGeneration"),
("prophetnet", "ProphetNetForConditionalGeneration"),
("qwen2_audio", "Qwen2AudioForConditionalGeneration"),
Collaborator: qwen2_audio is not implemented.

]
)

MODEL_FOR_QUESTION_ANSWERING_MAPPING_NAMES = OrderedDict(
Collaborator: Either comment out the models that are not supported, or simply assert and state that they will be supported in the future.


    # Downcast (if necessary) back to V's dtype (if in mixed-precision) -- No-Op if otherwise
    if attn_weights.dtype != ms.float32:
        raise RuntimeError("Error with upcasting, attn_weights does not have dtype torch.float32")
Collaborator: torch.float32 -> ms.float32 in the error message.

scale_factor /= float(self.layer_idx + 1)

# Upcast (turn off autocast) and reorder (Scale K by 1 / root(dk))
with ms.amp.autocast(query.device.type, enabled=False):
Collaborator: MindSpore 2.4.1 doesn't support amp.autocast or tensor.device. Where does this come from?


class GPT2SdpaAttention(GPT2Attention):
"""
GPT2 attention module using torch.nn.functional.scaled_dot_product_attention. This module inherits from
Collaborator: Update the docstring for MindSpore.

Base class for outputs of models predicting if two sentences are consecutive or not.

Args:
loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided):
Collaborator: Same docstring problem here.

@CaitinZhao (Collaborator) commented:

Please add tests for the newly added models and check the precision error against torch.

Labels: None yet
Projects: None yet
4 participants