Fix Tokenizer for several added special tokens #1659

Merged

Conversation

pavel-esir
Contributor

pavel-esir commented Jan 30, 2025

Flux (black-forest-labs/FLUX.1-dev) adds special tokens at the end as well, and ov_genai.Tokenizer was not handling such cases correctly.

Ticket: CVS-157356

pavel-esir added the "bug" label (Something isn't working) Jan 30, 2025
pavel-esir added this to the 2025.1 milestone Jan 30, 2025
github-actions bot added the "category: tokenizers" label (Tokenizer class or submodule update) Jan 30, 2025
pavel-esir force-pushed the fix_for_several_added_tokens branch from bed040d to d6c7419 January 30, 2025 19:02
std::shared_ptr<ov::Node> combine_seg_node;
for (auto node: model->get_ordered_ops()) {
    if (strcmp(node->get_type_info().name, "CombineSegments") == 0) {
        combine_seg_node = node;
    }
}
if (!combine_seg_node || combine_seg_node->input_value(1).get_element_type() != ov::element::i32) {
Contributor Author

The input type is always i32; this is ensured in validate_and_infer_types().
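
For readers following the snippet above, here is a minimal standalone sketch of the same search pattern. It does not use OpenVINO: `FakeNode` and `find_last_node_of_type` are hypothetical stand-ins introduced only for illustration, with `FakeNode::type_name` playing the role of `node->get_type_info().name`. Because the loop never breaks, the last matching node in the ordered list is the one that is kept.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in for ov::Node; only the type name is modelled here.
struct FakeNode {
    const char* type_name;  // plays the role of get_type_info().name
    int id;                 // illustrative identifier, not part of OpenVINO
};

// Returns the last node whose type name equals `type_name`,
// or an empty pointer when no node matches.
std::shared_ptr<FakeNode> find_last_node_of_type(
        const std::vector<std::shared_ptr<FakeNode>>& ordered_ops,
        const char* type_name) {
    assert(type_name != nullptr);
    std::shared_ptr<FakeNode> found;
    for (const auto& node : ordered_ops) {
        // No break: a later match overwrites an earlier one,
        // mirroring the loop in the snippet under review.
        if (std::strcmp(node->type_name, type_name) == 0) {
            found = node;
        }
    }
    return found;
}
```

The result can be empty, so a caller has to test it before dereferencing, which is what the `!combine_seg_node` half of the condition in the reviewed snippet does.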

@ilya-lavrenov
Contributor

CVS-157356

is the ticket number valid?

@pavel-esir
Contributor Author

> CVS-157356
>
> is the ticket number valid?

There was no ticket specifically for this bug, but the bug was found during work on the ticket.

@ilya-lavrenov
Contributor

Please, fix:
{1EDF583C-10AA-4100-BA5D-7D8DA888FAFB}

@pavel-esir
Contributor Author

> Please, fix:
> {1EDF583C-10AA-4100-BA5D-7D8DA888FAFB}

Done.

pavel-esir disabled auto-merge February 4, 2025 20:12
pavel-esir enabled auto-merge February 4, 2025 20:12
github-actions bot added the "category: sampling" label (Sampling / Decoding algorithms) Feb 5, 2025
pavel-esir force-pushed the fix_for_several_added_tokens branch from 253f8b4 to e96c619 February 5, 2025 08:13
pavel-esir added this pull request to the merge queue Feb 7, 2025
Merged via the queue into openvinotoolkit:master with commit 06a95e4 Feb 7, 2025
62 checks passed
pavel-esir deleted the fix_for_several_added_tokens branch February 7, 2025 09:58
Labels: bug (Something isn't working), category: sampling (Sampling / Decoding algorithms), category: tokenizers (Tokenizer class or submodule update), no-match-files
3 participants