Release 0.5.0 · dbmdz/solr-ocrhighlighting

No breaking changes this time around, but a few essential bugfixes, more stability and a new feature.

API changes:

Snippets are now sorted by descending score (relevancy). Previously the order was non-deterministic, which
broke the use case of dynamically fetching more snippets.
Add a new boolean hl.ocr.alignSpans parameter to align text and image spans. This new option (disabled by
default) ensures that the spans in text and image match, i.e. it forces the <em>...</em> in the highlighted text
to correspond to actual OCR word boundaries.
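
As a rough illustration of how the new option could be enabled from a client, here is a minimal sketch that builds a Solr select URL. The base URL, the core name, the `ocr_text` field name and the helper `build_highlight_query` are hypothetical. `hl.ocr.alignSpans` is the parameter described above; `hl`, `hl.ocr.fl` and `hl.snippets` are assumed to be the highlighting parameters your setup uses.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, query, ocr_field, align_spans=True, snippets=10):
    """Build a select URL that requests OCR highlighting (hypothetical helper)."""
    params = {
        "q": query,
        "hl": "true",
        # Assumed name of the parameter listing the OCR fields to highlight.
        "hl.ocr.fl": ocr_field,
        # Snippets come back sorted by descending score, so asking for a
        # larger number later should extend the list rather than reshuffle it.
        "hl.snippets": snippets,
        # New in this release, disabled by default.
        "hl.ocr.alignSpans": "true" if align_spans else "false",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical host, core and field names.
url = build_highlight_query(
    "http://localhost:8983/solr/ocr/select", "ocr_text:highlighting", "ocr_text"
)
print(url)
```

Because the option is off by default, existing clients keep their current behavior unless they pass the parameter explicitly.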

Bugfixes:

Fix regular highlighting in distributed setups. Regular, non-OCR highlighting was broken in previous versions
when users requested only regular highlighting and no OCR fields, due to a bad check in the shard response
collection phase.
Highlight spans are now always consistent with the spans designated in text. Due to a bug, the number of
spans sometimes differed between the two.
Fix de-hyphenation in ALTO region texts. Previously only the complete snippet text would be de-hyphenated, but not
the individual regions.
Fix post-match content detection in ALTO. A bug in this part of the code resulted in crashes when highlighting
certain ALTO documents.
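
To make the de-hyphenation fix more concrete, the following is a simplified sketch of what de-hyphenating a single ALTO region can look like. It is not the plugin's implementation. It assumes the ALTO file marks hyphenated words with the `SUBS_TYPE` (`HypPart1`/`HypPart2`) and `SUBS_CONTENT` attributes on `String` elements, and it uses a namespace-free snippet for brevity; the function name `dehyphenate_region` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free ALTO fragment (real files declare an ALTO namespace).
ALTO_REGION = """
<TextBlock>
  <TextLine>
    <String CONTENT="OCR"/>
    <SP/>
    <String CONTENT="high" SUBS_TYPE="HypPart1" SUBS_CONTENT="highlighting"/>
    <HYP CONTENT="-"/>
  </TextLine>
  <TextLine>
    <String CONTENT="lighting" SUBS_TYPE="HypPart2" SUBS_CONTENT="highlighting"/>
    <SP/>
    <String CONTENT="works"/>
  </TextLine>
</TextBlock>
"""


def dehyphenate_region(region_xml):
    """Return the region text with hyphenated words joined (hypothetical helper)."""
    words = []
    for elem in ET.fromstring(region_xml.strip()).iter("String"):
        subs_type = elem.get("SUBS_TYPE")
        if subs_type == "HypPart1":
            # First half of a hyphenated word: emit the full word once.
            words.append(elem.get("SUBS_CONTENT"))
        elif subs_type == "HypPart2":
            # Second half: already covered by the first half.
            continue
        else:
            words.append(elem.get("CONTENT"))
    return " ".join(words)


print(dehyphenate_region(ALTO_REGION))
```

Applying the same joining step per region, rather than only to the complete snippet text, is the behavior the fix above describes.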
