Improve performance of analysis of all vocabulary terms #327

ledsoft · 2025-02-04T10:47:08Z

When a term's label or definition changes, it triggers text analysis of all terms in the same vocabulary. This can be quite time-consuming when the vocabulary is large. Since each term is currently analyzed separately, we should consider batching the analysis, for example by joining the definitions into a single string that is analyzed in one go and then separating the parts again in post-processing.

ledsoft · 2025-02-06T09:10:18Z

Given the way text analysis works, my idea is that TermIt would take the definitions of all terms in a vocabulary and join them into a single HTML document containing one element per term, with the element's id corresponding to the term. After analysis, the document would be split based on these elements and the results processed as usual. This would mean just one call to the text analysis service, and the results could be processed in a single transaction.
Example text analysis input document structure:

<html>
<body>
<div id="http://example.com/termOne">
Term one definition text
</div>
<div id="http://example.com/termTwo">
Term two definition
</div>
</body>
</html>
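
A minimal sketch of the join and split steps, to make the idea concrete. This is not existing TermIt code: the class `BatchDefinitionDocument` and its methods are hypothetical names, definitions are inserted as-is, and the split assumes the analysis service returns the per-term `<div>` wrappers unchanged and that definitions contain no nested `<div>` elements. A real implementation would more likely use a proper HTML parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class BatchDefinitionDocument {

    // Matches one per-term wrapper. Assumes the wrapper content has no nested <div>.
    private static final Pattern TERM_DIV =
            Pattern.compile("<div id=\"([^\"]*)\">(.*?)</div>", Pattern.DOTALL);

    private BatchDefinitionDocument() {
    }

    // Joins the definitions into one HTML document, one <div> per term IRI.
    static String join(Map<String, String> definitionsByTermIri) {
        StringBuilder sb = new StringBuilder("<html>\n<body>\n");
        for (Map.Entry<String, String> e : definitionsByTermIri.entrySet()) {
            sb.append("<div id=\"").append(escapeAttribute(e.getKey())).append("\">\n")
              .append(e.getValue()).append('\n')
              .append("</div>\n");
        }
        return sb.append("</body>\n</html>\n").toString();
    }

    // Splits the (analyzed) document back into per-term content, keyed by term IRI.
    static Map<String, String> split(String analyzedDocument) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = TERM_DIV.matcher(analyzedDocument);
        while (m.find()) {
            result.put(unescapeAttribute(m.group(1)), m.group(2).trim());
        }
        return result;
    }

    // Escapes the two characters that would break a double-quoted attribute value.
    private static String escapeAttribute(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;");
    }

    // Reverses escapeAttribute; the order is the inverse of the escaping order.
    private static String unescapeAttribute(String value) {
        return value.replace("&quot;", "\"").replace("&amp;", "&");
    }
}
```

With something like this, the whole vocabulary would need a single `join`, one call to the text analysis service, and a single `split` whose result can be processed term by term in one transaction.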

ledsoft added the performance label Feb 4, 2025

ledsoft assigned lukaskabc Feb 6, 2025
