Skip to content

Commit

Permalink
update readme
Browse files Browse the repository at this point in the history
  • Loading branch information
mh-northlander committed Nov 6, 2024
1 parent d59f84b commit 6e45504
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,8 @@ The `sudachi_tokenizer` tokenizer tokenizes input texts using Sudachi.
- A: The shortest units equivalent to the UniDic short unit
- Ex) 選挙,管理,委員,会
- discard\_punctuation: Select to discard punctuation or not. (bool, default: true)
- allow\_empty\_morpheme: Allow output morpheme to have an empty span. (bool, default: false)
- This happens when an input text contains a composite character (e.g. ㍿) and it is split into morphemes. If false (default), all split morphemes will contain the span of the character. If true, only the first morpheme will contain the span and the span of other morphemes can be empty.
- settings\_path: Sudachi setting file path. The path may be absolute or relative; relative paths are resolved with respect to es\_config. (string, default: null)
- resources\_path: Sudachi dictionary path. The path may be absolute or relative; relative paths are resolved with respect to es\_config. (string, default: null)
- additional_settings: Describes a configuration JSON string for Sudachi. This JSON string will be merged into the default configuration. If this property is set, `settings_path` will be overridden.
Expand Down

0 comments on commit 6e45504

Please sign in to comment.