Improve performance of analysis of all vocabulary terms #327

Open
ledsoft opened this issue Feb 4, 2025 · 1 comment
Labels
performance Performance issue

Comments

@ledsoft
Contributor

ledsoft commented Feb 4, 2025

When a term's label or definition changes, it triggers text analysis of all terms in the same vocabulary. This can be quite time-consuming when the vocabulary is large. Since each term is analyzed separately, we should consider batching the analysis, for example by joining the definitions into a single string that can be analyzed in one go and then separating the parts again in post-processing.
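A minimal sketch of the plain-string variant, assuming (hypothetically) that the analysis service reports each match as character offsets `[start, end)` into the joined text. The separator, the `Segment` type and the functions `join_definitions` and `locate_match` are illustrative names, not TermIt API. Definitions are concatenated while recording where each one starts and ends; a match is then mapped back to its term by binary search over the start offsets, and a match that straddles two definitions is discarded.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical separator placed between definitions in the joined text.
SEPARATOR = "\n\n"


@dataclass(frozen=True)
class Segment:
    term_iri: str
    start: int  # inclusive offset of the definition in the joined text
    end: int    # exclusive offset


def join_definitions(definitions: dict[str, str]) -> tuple[str, list[Segment]]:
    """Concatenate definitions and record each one's span in the result."""
    segments: list[Segment] = []
    offset = 0
    for term_iri, text in definitions.items():
        if segments:
            offset += len(SEPARATOR)  # skip the separator preceding this definition
        segments.append(Segment(term_iri, offset, offset + len(text)))
        offset += len(text)
    joined = SEPARATOR.join(definitions.values())
    return joined, segments


def locate_match(segments: list[Segment], start: int, end: int):
    """Map a match [start, end) in the joined text back to its term.

    Returns (term_iri, local_start, local_end), or None if the match
    does not lie entirely within a single definition.
    """
    starts = [s.start for s in segments]
    i = bisect_right(starts, start) - 1  # last segment starting at or before `start`
    if i < 0:
        return None
    seg = segments[i]
    if end > seg.end:  # runs past this definition (into the separator or the next term)
        return None
    return seg.term_iri, start - seg.start, end - seg.start
```

Recording offsets during the join means no marker has to survive the analysis, at the cost of relying on the service's offsets being exact.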

@ledsoft added the performance (Performance issue) label Feb 4, 2025
@ledsoft
Contributor Author

ledsoft commented Feb 6, 2025

Given the way text analysis works, my idea is that TermIt would take the definitions of all terms in a vocabulary and join them into a single HTML document containing one element per term, with the element's id set to the term's IRI. After analysis, the document would be split along these elements and the results processed as usual. This would mean just one call to the text analysis service, and the results could be processed in a single transaction.
Example structure of the text analysis input document:

<html>
  <body>
    <div id="http://example.com/termOne">
      Term one definition text
    </div>
    <div id="http://example.com/termTwo">
      Term two definition
    </div>
  </body>
</html>
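A minimal sketch of this round trip using Python's standard `xml.etree.ElementTree`. It assumes, hypothetically, that definitions are plain text and that the analysis service returns well-formed XHTML that keeps the `<div id="...">` wrappers intact and adds annotations as nested elements. `build_batch_document` and `split_batch_document` are illustrative names, not existing TermIt functions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_batch_document(definitions: dict[str, str]) -> str:
    """Wrap each definition in a <div> whose id is the term IRI."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for term_iri, definition in definitions.items():
        div = ET.SubElement(body, "div", {"id": term_iri})
        div.text = definition  # &, < and > are escaped on serialization
    return ET.tostring(html, encoding="unicode")


def split_batch_document(analyzed: str) -> dict[str, str]:
    """Split the analyzed document into per-term inner markup keyed by IRI."""
    root = ET.fromstring(analyzed)  # assumes well-formed XHTML
    body = root.find("body")
    result: dict[str, str] = {}
    if body is None:
        return result
    for div in body.findall("div"):  # only the top-level per-term wrappers
        term_iri = div.get("id")
        if term_iri is None:
            continue
        # Leading text (re-escaped) followed by each child element;
        # ET.tostring(child) also emits the text that follows the child (its tail).
        result[term_iri] = escape(div.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in div
        )
    return result
```

Because each fragment comes back keyed by the term IRI, the output of the single analysis call can be processed for all terms together. If the service may return markup that is not well-formed XML, an HTML-tolerant parser would be needed instead of `ET.fromstring`.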
