- Support OpenSearch 2.18.0
3.3.0 - 2024-11-13
- `allow_empty_morpheme` is added to the `sudachi_tokenizer` settings (#151)
  - This allows morphemes to have an empty span (bool, default `false`)
- SPI changed to implement #149
  - New methods are added to `MorphemeAttribute`
- Offset correction of `SudachiSplitFilter` now works properly with char filters (#149)
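
As a rough illustration of the `allow_empty_morpheme` setting, index settings might look like the sketch below. It assumes the usual Elasticsearch index-settings layout and is written as a Python dict so it can be serialized to JSON. The tokenizer name `my_sudachi` is hypothetical; only the `sudachi_tokenizer` type and the `allow_empty_morpheme` flag come from the entries above.

```python
import json

# Hypothetical index settings; "my_sudachi" is an illustrative name only.
settings = {
    "settings": {
        "index": {
            "analysis": {
                "tokenizer": {
                    "my_sudachi": {
                        "type": "sudachi_tokenizer",
                        # bool, default false according to the changelog entry
                        "allow_empty_morpheme": False,
                    }
                }
            }
        }
    }
}

# Serialize to the JSON body that would be sent when creating an index.
body = json.dumps(settings, indent=2)
print(body)
```
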
- Support latest Elasticsearch / OpenSearch (#144)
  - es: 8.14.3, 8.15.2, 7.17.24
  - os: 2.15.0, 2.16.0, 2.17.1
- Use `lazyTokenizeSentences` for the analysis (#137)
  - This fixes the problem of input chunking (#131).
- Fix OOM error with a huge document (#132)
  - The plugin now handles huge documents by splitting them into relatively small (1M char) chunks.
  - Analysis may be broken around the edges of chunks (open issue, see #131)
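
To see why fixed-size chunking can affect analysis near chunk edges, consider the minimal sketch below. It is not the plugin's implementation; the function name `chunk_text` and the tiny chunk size are assumptions chosen purely for illustration.

```python
from typing import Iterator, Tuple


def chunk_text(text: str, size: int) -> Iterator[Tuple[int, str]]:
    """Yield (start_offset, chunk) pairs of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(text), size):
        yield start, text[start:start + size]


# A word that straddles a boundary ends up in two chunks, so a tokenizer
# that sees each chunk separately cannot treat it as one unit.
chunks = list(chunk_text("東京都に住む", 2))
print(chunks)
```

Here the three characters of 東京都 are divided between the first and second chunk, which is the kind of edge effect the entry above warns about.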
- Add tutorial to use Sudachi synonym dictionary (#65)
- Update documentation, including the tutorial (#125, #126)
- Explain with morpheme attribute (#121)
- Synonym filter and Sudachi filters can be used in any order (#122)
- Update deprecated code (#125)
- `MorphemeConsumerAttribute` is removed (#127)
  - This changes the SPI interface. To migrate, remove any `MorphemeConsumerAttribute`-related code.
  - Also see #123 and #124.
- Support Elasticsearch up to 8.13.4 and OpenSearch up to 2.14.0 (#114, #118)
  - Integration tests (`:integration`) for es:8.9.0+ are moved to GitHub Actions.
- Fix dictionary caching problem (#112)
- Support OpenSearch 2.6.0+ in addition to Elasticsearch
- The analysis-sudachi plugin can now be extended by other plugins. Loading Sudachi plugins from extending plugins is supported as well.
- Plugin is now implemented in Kotlin
- Added a new property `additional_settings` to write Sudachi settings directly in config
- Added support for specifying Elasticsearch version at build time
- Fix duplicated tokens for OOVs with `sudachi_split` filter's extended mode
- Upgrade Sudachi to 0.4.3
- Fix overrun with surrogate pairs
- Upgrade Sudachi to 0.4.2
- Fix buffer overrun with character normalization
- New mode `split_mode` was added
- New filter `sudachi_split` was added instead of `mode`
- `mode` was deprecated
- Upgrade Sudachi morphological analyzer to 0.4.1
- Words containing periods are no longer split
- Fix a bug causing wrong offsets with `icu_normalizer`
- Upgrade Sudachi morphological analyzer to 0.3.1
- Upgrade Sudachi morphological analyzer to 0.3.0
- Minor bug fix
- Upgrade Sudachi morphological analyzer to 0.2.0
- Import Sudachi from maven central repository
- Minor bug fix
- Upgrading Sudachi morphological analyzer to 0.2.0-SNAPSHOT
- New filter `sudachi_normalizedform` was added; see sudachi_normalizedform
- Default normalization behavior was changed; neither the baseform filter nor the normalizedform filter is applied
- `sudachi_readingform` filter was changed with new romaji mappings based on MS-IME
- Part-of-speech forward matching is available on `stoptags`; see sudachi_part_of_speech
- first release