Chinese Tokenizers Collection

A collection of simple wrappers around several Chinese tokenizers (word segmenters).

Free software: MIT license
Documentation: https://tokenziers-collection.readthedocs.io.

Features

TODO

Usage

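from tokenizers_collection.config import tokenizer_registry

# Iterate over every registered tokenizer as a (name, tokenizer) pair
for name, tokenizer in tokenizer_registry:
    print("Tokenizer: {}".format(name))
    # Segment the text in input_file.txt and write the result to output_file.txt
    tokenizer('input_file.txt', 'output_file.txt')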
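The loop above treats `tokenizer_registry` as an iterable of `(name, tokenizer)` pairs, where each tokenizer is a callable that takes an input path and an output path. The following is a minimal, self-contained sketch of that pattern, not the package's actual implementation: `registry`, `make_file_tokenizer`, and `char_segment` are hypothetical names, `char_segment` is a trivial character-by-character baseline standing in for a real segmenter, and the file formats (one sentence per input line, space-separated tokens on output) are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical signature: a tokenizer reads input_path and writes output_path.
FileTokenizer = Callable[[str, str], None]


def make_file_tokenizer(segment: Callable[[str], List[str]]) -> FileTokenizer:
    """Wrap a per-line segmenter as a file-to-file tokenizer.

    Assumed format: one sentence per input line; on output, tokens are
    joined by single spaces, one segmented sentence per line.
    """

    def tokenize_file(input_path: str, output_path: str) -> None:
        lines = Path(input_path).read_text(encoding="utf-8").splitlines()
        segmented = [" ".join(segment(line)) for line in lines]
        Path(output_path).write_text("\n".join(segmented) + "\n", encoding="utf-8")

    return tokenize_file


def char_segment(line: str) -> List[str]:
    # Hypothetical baseline: every non-whitespace character becomes a token.
    return [ch for ch in line if not ch.isspace()]


# Hypothetical registry: a list of (name, tokenizer) pairs, so that
# `for name, tokenizer in registry` unpacks exactly as in the usage example.
registry: List[Tuple[str, FileTokenizer]] = [
    ("char", make_file_tokenizer(char_segment)),
]


def demo() -> str:
    """Run every registered tokenizer on a small temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input_file.txt"
        dst = Path(tmp) / "output_file.txt"
        src.write_text("我爱北京天安门\n", encoding="utf-8")
        for name, tokenizer in registry:
            print("Tokenizer: {}".format(name))
            tokenizer(str(src), str(dst))
        return dst.read_text(encoding="utf-8")
```

Calling `demo()` prints `Tokenizer: char` and returns `"我 爱 北 京 天 安 门\n"`. Keeping a uniform file-in, file-out signature is what lets one loop drive every backend regardless of its native API.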

Installation

pip install tokenizers_collection
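The wrapped tokenizers come from separate third-party libraries, so it can help to check which of them are importable in the current environment. This is a minimal sketch using the standard `importlib.util.find_spec`; the helper name `available_backends` is hypothetical, and the module list only includes the two backends this README mentions (`pynlpir` and `pyltp`), so adjust it to the backends you actually use.

```python
import importlib.util
from typing import Dict, Iterable


def available_backends(module_names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found.

    find_spec locates the module without importing it, so a slow or
    broken backend is not executed by this check.
    """
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    for name, ok in available_backends(["pynlpir", "pyltp"]).items():
        print("{:10s} {}".format(name, "installed" if ok else "missing"))
```

Only pass top-level names: for a dotted name whose parent package is missing, `find_spec` raises `ModuleNotFoundError` instead of returning `None`.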

Updating license files and downloading models

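Some of the tokenizers need an updated license file (for example, pynlpir) or downloaded model files (for example, pyltp), so an extra step is required after installation. All of these operations are wrapped in a single function; just run the following command: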

python -m tokenizers_collection.helper
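Because the download step can fail on a slow or unstable connection, you may want to invoke the helper from a script and retry automatically. This is a minimal sketch using the standard `subprocess` module; `run_with_retries` and `HELPER_CMD` are hypothetical names, and it assumes the helper exits with a non-zero status when it fails, which this README does not confirm.

```python
import subprocess
import sys
import time
from typing import Sequence

# The same command as above, run with the current Python interpreter.
HELPER_CMD = [sys.executable, "-m", "tokenizers_collection.helper"]


def run_with_retries(cmd: Sequence[str], attempts: int = 3, delay: float = 5.0) -> bool:
    """Run cmd, retrying after a non-zero exit code; return True on success."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(list(cmd))
        if result.returncode == 0:
            return True
        print(
            "Attempt {} failed with exit code {}".format(attempt, result.returncode),
            file=sys.stderr,
        )
        if attempt < attempts:
            time.sleep(delay)  # wait before retrying the download
    return False
```

For example, `run_with_retries(HELPER_CMD, attempts=5)` tries the helper up to five times and returns `False` if every attempt fails.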

Notes:

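If you see Error: unable to fetch newest license., the cause may be an SSL problem with Python 3. See "pynlpir update error" or "How to make Python use CA certificates from Mac OS TrustStore?" for ways to fix it.
The model files are large (600+ MB), so the download can take a long time depending on your network conditions. If an error occurs, try running the command again.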
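When diagnosing the SSL error above, a useful first step is to see where this Python's `ssl` module looks for CA certificates. This minimal sketch uses the standard `ssl.get_default_verify_paths()`; the wrapper name `describe_ca_paths` is hypothetical. If both `cafile` and `capath` come back as `None`, Python may not be finding any CA bundle, which is consistent with the license-update failure; on macOS builds from python.org, running the bundled `Install Certificates.command` is one common remedy.

```python
import ssl
from typing import Dict, Optional


def describe_ca_paths() -> Dict[str, Optional[str]]:
    """Report where the ssl module looks for CA certificates.

    cafile/capath are None when the corresponding file or directory
    does not exist; the openssl_* entries show OpenSSL's compiled-in
    defaults and the environment variable that can override them.
    """
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,
        "capath": paths.capath,
        "openssl_cafile_env": paths.openssl_cafile_env,
        "openssl_cafile": paths.openssl_cafile,
    }


if __name__ == "__main__":
    for key, value in describe_ca_paths().items():
        print("{}: {}".format(key, value))
```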

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Source repository: howl-anderson/tokenizers_collection