0.5.0

jbaiter released this on 07 Oct 16:08

No breaking changes this time around, but a few essential bugfixes, more stability and a new feature.

API changes:

  • Snippets are now sorted by descending score (relevancy). Previously, the order was non-deterministic, which
    broke the use case of dynamically fetching additional snippets.
  • Add a new boolean hl.ocr.alignSpans parameter to align text and image spans. This new option (disabled by
    default) ensures that the spans in the text and on the image match, i.e. it forces the <em>...</em> tags in the
    highlighted text to correspond to actual OCR word boundaries.
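As a rough illustration of how a client might opt into the new option, here is a minimal Python sketch that builds the request parameters. It assumes a core at the hypothetical URL http://localhost:8983/solr/ocr, an OCR field named ocr_text, and that the plugin reads the field list from hl.ocr.fl and the snippet count from the standard hl.snippets parameter; build_highlight_params is a hypothetical helper, not part of the plugin.

```python
from urllib.parse import urlencode


def build_highlight_params(query, ocr_field, snippets, align_spans=False):
    """Build Solr request parameters for OCR highlighting (hypothetical helper)."""
    params = {
        "q": query,
        "hl": "true",
        "hl.ocr.fl": ocr_field,        # assumed name of the OCR field(s) to highlight
        "hl.snippets": str(snippets),  # assumed to control the number of snippets
    }
    if align_spans:
        # hl.ocr.alignSpans is disabled by default, so only send it when wanted
        params["hl.ocr.alignSpans"] = "true"
    return params


# Hypothetical core URL and field name, for illustration only
params = build_highlight_params("ocr_text:berlin", "ocr_text", 10, align_spans=True)
url = "http://localhost:8983/solr/ocr/select?" + urlencode(params)
```

Because snippets are now ordered by descending score, a client that first requests hl.snippets=10 and later hl.snippets=20 for the same query should receive the first ten snippets again as a prefix of the larger response, which is what makes "load more" style fetching workable.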

Bugfixes:

  • Fix regular highlighting in distributed setups. Regular, non-OCR highlighting was broken in previous versions due
    to a faulty check in the shard response collection phase when users requested only regular highlighting and no OCR
    fields.
  • Highlight spans are now always consistent with the spans marked in the text. Due to a bug, the number of spans
    could sometimes differ between the two.
  • Fix de-hyphenation in ALTO region texts. Previously, only the complete snippet text was de-hyphenated, not
    the individual regions.
  • Fix post-match content detection in ALTO. A bug in this part of the code resulted in crashes when highlighting
    certain ALTO documents.
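To illustrate what de-hyphenating each region individually means, here is a minimal Python sketch that produces one text per ALTO TextBlock. It relies on the ALTO convention of marking hyphenated word parts with SUBS_TYPE="HypPart1"/"HypPart2" and the full word in SUBS_CONTENT; it is not the plugin's own (Java) implementation, and block_texts is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local(tag):
    # Strip the "{namespace}" prefix so the sketch works across ALTO versions
    return tag.rsplit("}", 1)[-1]


def block_texts(alto_xml):
    """Return de-hyphenated text for each TextBlock (region) in an ALTO document."""
    root = ET.fromstring(alto_xml)
    texts = []
    for block in root.iter():
        if local(block.tag) != "TextBlock":
            continue
        words = []
        for el in block.iter():
            if local(el.tag) != "String":
                continue  # HYP and SP elements contribute no word text
            kind = el.get("SUBS_TYPE")
            subs = el.get("SUBS_CONTENT")
            if kind == "HypPart1" and subs:
                words.append(subs)  # emit the full word once, at its first part
            elif kind == "HypPart2" and subs:
                continue  # already covered by the HypPart1 SUBS_CONTENT
            else:
                words.append(el.get("CONTENT", ""))
        texts.append(" ".join(words))
    return texts


sample = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#"><Layout><Page><PrintSpace>
<TextBlock ID="b1"><TextLine><String CONTENT="Die"/><SP/>
<String CONTENT="Zei" SUBS_TYPE="HypPart1" SUBS_CONTENT="Zeitung"/><HYP CONTENT="-"/></TextLine>
<TextLine><String CONTENT="tung" SUBS_TYPE="HypPart2" SUBS_CONTENT="Zeitung"/><SP/>
<String CONTENT="heute"/></TextLine></TextBlock>
<TextBlock ID="b2"><TextLine><String CONTENT="Berlin"/></TextLine></TextBlock>
</PrintSpace></Page></Layout></alto>"""

print(block_texts(sample))  # ['Die Zeitung heute', 'Berlin']
```

A word hyphenated across two regions is attributed to the region containing its first part in this sketch; real documents may need a different policy.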