Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Endless loop in findPartialMatches, azure plugin code #610

Closed
jweingarten-rv opened this issue Jan 13, 2025 · 6 comments
Closed

Endless loop in findPartialMatches, azure plugin code #610

jweingarten-rv opened this issue Jan 13, 2025 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@jweingarten-rv
Copy link

Version

    "@nut-tree/nl-matcher": "^3.0.2",
    "@nut-tree/nut-js": "^4.3.0",
    "@nut-tree/plugin-azure": "^2.1.0",

Short overview
While upgrading plugin-azure from version 1 to 2.1.0 we are running into an endless loop that boils down to a bug in findPartialMatches,js in @nut-tree\plugin-azure\dist\ocr\stringQueryProcessing.function.js.

node version:
v20.11.0

OS type and version:
Windows / Mac

Detailed error description
Full code sample to reproduce

While upgrading plugin-azure from version 1 to 2.1.0 we are running into an endless loop that boils down to a bug in findPartialMatches in @nut-tree\plugin-azure\dist\ocr\stringQueryProcessing.function.js.

During execution the function findPartialMatches is called with the following parameters:

line.text: Edit Actions: Clear All
line.words: ['Edit', 'Actions:', 'Clear', 'All']
config: {partialMatch: true}
searchWords: ['edit', 'action']
pm: []

Note that the line has Edit Actions: and we are searching for edit actions, no colon in the search.

When it loops through the code it ends up in

 else {
   idx += selectedWords.length;
  selectedWords = [];
}

That will reset idx back to 0 and you are in an endless loop.

Here is a fix, not pretty and most likely not how you will fix it, that I am currently using so that I can stay on version 2.1.0:

function findPartialMatches(line, config, searchWords, pm) {
    let selectedWords = [];
    nut_js_1.providerRegistry.getLogProvider().debug(`Searching through ${line.words.length} words`);
    for (let idx = 0; idx < line.words.length;) {
        let cnt = -1;
        const currentWord = line.words[idx];
        const currentWordText = config?.caseSensitive ? currentWord.text : currentWord.text.toLocaleLowerCase();
        if (currentWordText === searchWords[0]) {
            selectedWords = [currentWord];
            for (let i = 1; i < searchWords.length; ++i) {
                const nextWord = line.words[idx + i];
                const nextWordText = config?.caseSensitive ? nextWord.text : nextWord.text.toLocaleLowerCase();
                if (nextWordText !== searchWords[i]) {
                    cnt = idx + i;
                    selectedWords = [];
                    break;
                }
                else {
                    selectedWords.push(nextWord);
                }
            }
            if (selectedWords.length === searchWords.length) {
                idx += selectedWords.length;
                const wordsBoundingBox = (0, determineWordBoundingBox_function_1.determineWordBoundingBoxFunction)(selectedWords);
                if (wordsBoundingBox != null) {
                    pm.push({
                        ...line,
                        words: selectedWords,
                        boundingBox: (0, mapBoundingBoxToRegion_function_1.mapRegionToBoundingBox)(wordsBoundingBox)
                    });
                    nut_js_1.providerRegistry.getLogProvider().debug(`Found match ${selectedWords.map(word => word.text).join(" ")} at ${JSON.stringify(wordsBoundingBox)}`);
                }
            }
            else {
                idx = cnt !== -1 ? cnt : selectedWords.length
                selectedWords = [];
            }
        }
        else {
            ++idx;
        }
    }
}

Additional content

Please provide any (mandatory) additional data to reproduce the error (Dockerfiles etc.)

@s1hofmann
Copy link
Member

Hi @jweingarten-rv 👋

Thank you for this report and the detailed analysis!
I can confirm this bug and I'm currently evaluating possible ways to address this.

What's your expected outcome, given your input data?

@jweingarten-rv
Copy link
Author

jweingarten-rv commented Jan 14, 2025

Hi @s1hofmann,

well I actually was thinking about that as well a bit. To me the partialMatch setting is a bit unclear. Especially when I see that Azure comes back with an array of strings. Should in this case it pass evenhough there is a : in the text? I honestly am not sure.

The person that wrote the test, that uncovered the issue would most likely expect that Edit Actions would partially match Edit Actions: Clear All.

But does that imply that in case of a textLine search we just make sure that indexOf is >=0 anywhere in the line found? Someone could also expect that if you search for Edit Clear in a string of value Edit Actions: Clear All it should come back as found.
Not sure if there is a good solution without breaking existing use cases.

One option could be to add more options. In conjunction with partialMatch additional settings like ignoreNoneAlphanumericCharacters

In my specific use case, I can go either way and make the test pass. Specifically its selecting a menu and I just added the : to the search string to make it pass.

@s1hofmann s1hofmann self-assigned this Jan 21, 2025
@s1hofmann s1hofmann added the bug Something isn't working label Jan 21, 2025
@s1hofmann
Copy link
Member

Hi @jweingarten-rv 👋

I have now fixed your reported bug in both @nut-tree/plugin-azure and @nut-tree/plugin-ocr.
Thank you for reporting it!

After careful consideration I decided against any additional parameters to make search more flexible.
If more flexibility is required, it's possible to pass a regex as search input, which is why I don't see immediate requirement for additional parameters for string based searches.

Snapshot releases are already available, stable releases will follow!

Best regards

Simon

@jweingarten-rv
Copy link
Author

Thanks @s1hofmann.
Didn't know about the regex. Nice.
Once its in stable I'll update and re-test.
Thanks!

@s1hofmann
Copy link
Member

@jweingarten-rv @nut-tree/[email protected] has been released!

@jweingarten-rv
Copy link
Author

Fix confirmed. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants