Dependencies: Upgrade spaCy to 3.7.1; Utils: Fix downloading of Stanza models
BLKSerene committed Oct 2, 2023
1 parent 436a0c2 commit f6e5a71
Showing 35 changed files with 198 additions and 204 deletions.
2 changes: 1 addition & 1 deletion ACKNOWLEDGMENTS.md
@@ -49,7 +49,7 @@ As Wordless stands on the shoulders of giants, I hereby extend my sincere gratit
24|[Sacremoses](https://github.com/alvations/sacremoses) |0.0.53|Liling Tan|[MIT](https://github.com/alvations/sacremoses/blob/master/LICENSE)
25|[SciPy](https://scipy.org/scipylib/) |1.11.3|SciPy Developers|[BSD-3-Clause](https://github.com/scipy/scipy/blob/main/LICENSE.txt)
26|[simplemma](https://github.com/adbar/simplemma) |0.9.1 |Adrien Barbaresi|[MIT](https://github.com/adbar/simplemma/blob/main/LICENSE)
-27|[spaCy](https://spacy.io/) |3.6.1 |Matthew Honnibal, Ines Montani, Sofie Van Landeghem,<br>Adriane Boyd, Paul O'Leary McCann|[MIT](https://github.com/explosion/spaCy/blob/master/LICENSE)
+27|[spaCy](https://spacy.io/) |3.7.1 |Matthew Honnibal, Ines Montani, Sofie Van Landeghem,<br>Adriane Boyd, Paul O'Leary McCann|[MIT](https://github.com/explosion/spaCy/blob/master/LICENSE)
28|[spacy-pkuseg](https://github.com/explosion/spacy-pkuseg) |0.0.32|Ruixuan Luo (罗睿轩), Jingjing Xu (许晶晶),<br>Xuancheng Ren (任宣丞), Yi Zhang (张艺),<br>Zhiyuan Zhang (张之远), Bingzhen Wei (位冰镇),<br>Xu Sun (孙栩)<br>Adriane Boyd, Ines Montani|[MIT](https://github.com/explosion/spacy-pkuseg/blob/master/LICENSE)
29|[Stanza](https://github.com/stanfordnlp/stanza) |1.5.1 |Peng Qi (齐鹏), Yuhao Zhang (张宇浩),<br>Yuhui Zhang (张钰晖), Jason Bolton,<br>Tim Dozat, John Bauer|[Apache-2.0](https://github.com/stanfordnlp/stanza/blob/main/LICENSE)
30|[SudachiPy](https://github.com/WorksApplications/sudachi.rs) |0.6.7 |Works Applications Co., Ltd.|[Apache-2.0](https://github.com/WorksApplications/sudachi.rs/blob/develop/LICENSE)
2 changes: 1 addition & 1 deletion ACKNOWLEDGMENTS_zho_cn.md
@@ -49,7 +49,7 @@
24|[Sacremoses](https://github.com/alvations/sacremoses) |0.0.53|Liling Tan|[MIT](https://github.com/alvations/sacremoses/blob/master/LICENSE)
25|[SciPy](https://scipy.org/scipylib/) |1.11.3|SciPy 开发人员|[BSD-3-Clause](https://github.com/scipy/scipy/blob/main/LICENSE.txt)
26|[simplemma](https://github.com/adbar/simplemma) |0.9.1 |Adrien Barbaresi|[MIT](https://github.com/adbar/simplemma/blob/main/LICENSE)
-27|[spaCy](https://spacy.io/) |3.6.1 |Matthew Honnibal, Ines Montani, Sofie Van Landeghem,<br>Adriane Boyd, Paul O'Leary McCann|[MIT](https://github.com/explosion/spaCy/blob/master/LICENSE)
+27|[spaCy](https://spacy.io/) |3.7.1 |Matthew Honnibal, Ines Montani, Sofie Van Landeghem,<br>Adriane Boyd, Paul O'Leary McCann|[MIT](https://github.com/explosion/spaCy/blob/master/LICENSE)
28|[spacy-pkuseg](https://github.com/explosion/spacy-pkuseg) |0.0.32|罗睿轩, 许晶晶, 任宣丞, 张艺, 张之远, 位冰镇, 孙栩<br>Adriane Boyd, Ines Montani|[MIT](https://github.com/explosion/spacy-pkuseg/blob/master/LICENSE)
29|[Stanza](https://github.com/stanfordnlp/stanza) |1.5.1 |齐鹏, 张宇浩, 张钰晖,<br>Jason Bolton, Tim Dozat, John Bauer|[Apache-2.0](https://github.com/stanfordnlp/stanza/blob/main/LICENSE)
30|[SudachiPy](https://github.com/WorksApplications/sudachi.rs) |0.6.7 |Works Applications Co., Ltd.|[Apache-2.0](https://github.com/WorksApplications/sudachi.rs/blob/develop/LICENSE)
2 changes: 1 addition & 1 deletion ACKNOWLEDGMENTS_zho_tw.md
@@ -49,7 +49,7 @@
24|[Sacremoses](https://github.com/alvations/sacremoses) |0.0.53|Liling Tan|[MIT](https://github.com/alvations/sacremoses/blob/master/LICENSE)
25|[SciPy](https://scipy.org/scipylib/) |1.11.3|SciPy 開發人員|[BSD-3-Clause](https://github.com/scipy/scipy/blob/main/LICENSE.txt)
26|[simplemma](https://github.com/adbar/simplemma) |0.9.1 |Adrien Barbaresi|[MIT](https://github.com/adbar/simplemma/blob/main/LICENSE)
-27|[spaCy](https://spacy.io/) |3.6.1 |Matthew Honnibal, Ines Montani, Sofie Van Landeghem,<br>Adriane Boyd, Paul O'Leary McCann|[MIT](https://github.com/explosion/spaCy/blob/master/LICENSE)
+27|[spaCy](https://spacy.io/) |3.7.1 |Matthew Honnibal, Ines Montani, Sofie Van Landeghem,<br>Adriane Boyd, Paul O'Leary McCann|[MIT](https://github.com/explosion/spaCy/blob/master/LICENSE)
28|[spacy-pkuseg](https://github.com/explosion/spacy-pkuseg) |0.0.32|羅睿軒, 許晶晶, 任宣丞, 張藝, 張之遠, 位冰鎮, 孫栩<br>Adriane Boyd, Ines Montani|[MIT](https://github.com/explosion/spacy-pkuseg/blob/master/LICENSE)
29|[Stanza](https://github.com/stanfordnlp/stanza) |1.5.1 |齊鵬, 張宇浩, 張鈺暉,<br>Jason Bolton, Tim Dozat, John Bauer|[Apache-2.0](https://github.com/stanfordnlp/stanza/blob/main/LICENSE)
30|[SudachiPy](https://github.com/WorksApplications/sudachi.rs) |0.6.7 |Works Applications Co., Ltd.|[Apache-2.0](https://github.com/WorksApplications/sudachi.rs/blob/develop/LICENSE)
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,13 @@

<div align="center"><h1>📄 Changelog</h1></div>

+## [3.4.1](https://github.com/BLKSerene/Wordless/releases/tag/3.4.1) - ??/??/2023
+### 📌 Bugfixes
+- Utils: Fix downloading of Stanza models
+
+### ⏫ Dependency Changes
+- Dependencies: Upgrade spaCy to 3.7.1
+
## [3.4.0](https://github.com/BLKSerene/Wordless/releases/tag/3.4.0) - 09/30/2023
### 🎉 New Features
- Settings: Add Settings - Measures - Lexical Diversity
5 changes: 3 additions & 2 deletions requirements/requirements_dev.txt
@@ -38,7 +38,8 @@ pymorphy3-dicts-ru
pymorphy3-dicts-uk

## spaCy
-spacy[lookups,transformers]
+spacy
+spacy-lookups-data
spacy-pkuseg

## SudachiPy
@@ -69,4 +70,4 @@ wordcloud
torch

# spaCy models
-en_core_web_trf-3.6.1-py3-none-any.whl
+en_core_web_trf-3.7.2-py3-none-any.whl
2 changes: 1 addition & 1 deletion requirements/requirements_tests.txt
@@ -42,7 +42,7 @@ pymorphy3-dicts-ru == 2.4.417150.4580142
pymorphy3-dicts-uk == 2.4.1.1.1663094765

## spaCy
-spacy == 3.6.1
+spacy == 3.7.1
spacy-lookups-data == 1.0.5
spacy-pkuseg == 0.0.32

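The spaCy pin in `requirements_tests.txt` and the model wheel in `requirements_dev.txt` move together in this commit (spaCy 3.7.1, `en_core_web_trf` 3.7.2). A minimal sketch of a consistency check follows, assuming that a model wheel should share spaCy's major.minor version; the diff is consistent with that rule but does not state it. The helper names `parse_pin`, `wheel_version`, and `same_minor` are hypothetical and not part of Wordless.

```python
import re


def parse_pin(line):
    """Split a 'name == version' requirement line into (name, version)."""
    name, _, version = line.partition('==')
    return name.strip(), version.strip()


def wheel_version(filename):
    """Extract the version from a wheel filename such as 'name-1.2.3-py3-none-any.whl'."""
    match = re.match(r'^[A-Za-z0-9_]+-(\d+(?:\.\d+)*)-', filename)
    if match is None:
        raise ValueError(f'Not a recognizable wheel filename: {filename}')
    return match.group(1)


def same_minor(version_a, version_b):
    """Return True when both versions share the same major.minor prefix (assumed rule)."""
    return version_a.split('.')[:2] == version_b.split('.')[:2]


# Values taken from the two requirements files in this commit
_, spacy_version = parse_pin('spacy == 3.7.1')
model_version = wheel_version('en_core_web_trf-3.7.2-py3-none-any.whl')
print(same_minor(spacy_version, model_version))  # True
```

With the pre-commit wheel (`en_core_web_trf-3.6.1-py3-none-any.whl`) and the new pin, the same check would return `False`, which is the kind of mismatch such a check is meant to catch.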
10 changes: 5 additions & 5 deletions tests/tests_file_area/test_file_area_file_types.py

Large diffs are not rendered by default.

14 changes: 7 additions & 7 deletions tests/tests_nlp/test_lemmatization.py
@@ -141,9 +141,9 @@ def test_lemmatize(lang, lemmatizer):
assert lemmas == ['Forrþrihht', 'anan', 'see', 'timen', 'comm', 'þatt', 'eure', 'Drihhtin', 'wollde', 'been', 'borenn', 'in', 'þiss', 'middellærd', 'forr', 'all', 'mannkinne', 'neden', 'hem', 'chæs', 'him', 'sonne', 'kinnessmenn', 'all', 'swillke', 'summ', 'hem', 'wollde', 'and', 'whær', 'hem', 'wollde', 'borenn', 'been', 'hem', 'chæs', 'all', 'att', 'his', 'willen', '.']
elif lang.startswith('eng_'):
if lemmatizer == 'nltk_wordnet':
-assert lemmas == ['English', 'be', 'a', 'West', 'Germanic', 'language', 'in', 'the', 'Indo-European', 'language', 'family', 'that', 'originate', 'in', 'early', 'medieval', 'England.', '[', '3', ']', '[', '4', ']', '[', '5', ']']
+assert lemmas == ['English', 'be', 'a', 'West', 'Germanic', 'language', 'in', 'the', 'Indo-European', 'language', 'family', '.']
elif lemmatizer == 'simplemma_eng':
-assert lemmas == ['English', 'be', 'a', 'west', 'germanic', 'language', 'in', 'the', 'Indo-European', 'language', 'family', 'that', 'originate', 'in', 'early', 'medieval', 'England.', '[', '3', ']', '[', '4', ']', '[', '5', ']']
+assert lemmas == ['English', 'be', 'a', 'west', 'germanic', 'language', 'in', 'the', 'Indo-European', 'language', 'family', '.']
else:
tests_lang_util_skipped = True
elif lang == 'est':
@@ -188,9 +188,9 @@ def test_lemmatize(lang, lemmatizer):
else:
tests_lang_util_skipped = True
elif lang == 'ita':
-assert lemmas == ["L'italiano", '(', '[', 'itaˈljaːno', ']', '[', 'nota', '1', ']', 'ascoltare', '[', '?', '·info', ']', ')', 'essere', 'uno', 'lingua', 'romanza', 'parlato', 'principalmente', 'in', 'Italia', '.']
+assert lemmas == ["L'italiano", '(', '[', 'itaˈljaːno', ']', '[', 'nota', '1', ']', 'ascoltaⓘ', ')', 'essere', 'uno', 'lingua', 'romanza', 'parlato', 'principalmente', 'in', 'Italia', '.']
elif lang == 'jpn':
-assert lemmas == ['日本語', '(', 'にほん', 'ご', '、', 'にっぽん', 'ご', '[', '注釈', '2', ']', '、', '英語', ':', 'Japanese', 'language', ')', 'は', '、', '日本', '国', '内', 'や', '、', 'かつて', 'の', '日本', '領', 'だ', 'た', '国', '、', 'そして', '国外', '移民', 'や', '移住者', 'を', '含む', '日本人', '同士', 'の', '間', 'で', '使用', 'する', 'れる', 'て', 'いる', '言語', '。']
+assert lemmas == ['日本語', '(', 'にほん', 'ご', '、', 'にっぽん', 'ご', '[', '注釈', '2', ']', ')', 'は', '、', '日本', '国', '内', 'や', '、', 'かつて', 'の', '日本', '領', 'だ', 'た', '国', '、', 'そして', '国外', '移民', 'や', '移住者', 'を', '含む', '日本人', '同士', 'の', '間', 'で', '使用', 'する', 'れる', 'て', 'いる', '言語', '。']
elif lang == 'kor':
assert lemmas == ['한국어', '(', '韓國語', ')', '는', '대한민+국과', '조선민주주의인민공화국+의', '공용어이다', '.']
elif lang == 'lat':
@@ -213,7 +213,7 @@ def test_lemmatize(lang, lemmatizer):
elif lang == 'glv':
assert lemmas == ['She', 'Gaelg', '(', 'graït', ':', '/gɪlg/', ')', 'çhengey', 'Gaelagh', 'Mannin', '.']
elif lang == 'nob':
-assert lemmas == ['bokmål', 'være', 'enn', 'varietet', 'av', 'norsk', 'skriftspråk', '.']
+assert lemmas == ['bokmål', 'være', 'enn', 'av', 'to', 'offisiell', 'målform', 'av', 'norsk', 'skriftspråk', ',', 'hvorav', 'den', 'annen', 'være', 'nynorsk', '.']
elif lang == 'nno':
assert lemmas == ['nynorsk', ',', 'føra', '1929', 'offisiell', 'kall', 'landsmål', ',', 'vera', 'sidan', 'jamstillingsvedtaket', 'av', '12', '.', 'mai', '1885', 'ein', 'av', 'den', 'to', 'offisiell', 'målformene', 'av', 'norsk', ';', 'den', 'annan', 'forme', 'vera', 'bokmål', '.']
elif lang == 'fas':
@@ -231,9 +231,9 @@ def test_lemmatize(lang, lemmatizer):
assert lemmas == ['limbă', 'român', 'fi', 'el', 'limbă', 'indo-european', 'din', 'grup', 'italic', 'și', 'din', 'subgrupul', 'oriental', 'al', 'limbă', 'romanice', '.']
elif lang == 'rus':
if lemmatizer == 'simplemma_rus':
-assert lemmas == ['Ру́сский', 'язы́к', '(', '[', 'ˈruskʲɪi̯', 'jɪˈzɨk', ']', 'информация', 'о', 'файл', 'слушать', ')', '[', '~', '3', ']', '[', '⇨', ']', '—', 'язык', 'восточнославянский', 'группа', 'славянский', 'ветвь', 'индоевропейский', 'языковый', 'семья', ',', 'национальный', 'язык', 'русский', 'народ', '.']
+assert lemmas == ['Ру́сский', 'язы́к', '(', 'МФА', ':', '[', 'ˈruskʲɪi̯', 'jɪˈzɨk', ']', '', ')', '[', '~', '3', ']', '[', '⇨', ']', '—', 'язык', 'восточнославянский', 'группа', 'славянский', 'ветвь', 'индоевропейский', 'языковый', 'семья', ',', 'национальный', 'язык', 'русский', 'народ', '.']
elif lemmatizer == 'pymorphy3_morphological_analyzer':
-assert lemmas == ['ру́сский', 'язы́к', '(', '[', 'ˈruskʲɪi̯', 'jɪˈzɨk', ']', 'информация', 'о', 'файл', 'слушать', ')', '[', '~', '3', ']', '[', '⇨', ']', '—', 'язык', 'восточнославянский', 'группа', 'славянский', 'ветвь', 'индоевропейский', 'языковой', 'семья', ',', 'национальный', 'язык', 'русский', 'народ', '.']
+assert lemmas == ['ру́сский', 'язы́к', '(', 'мфа', ':', '[', 'ˈruskʲɪi̯', 'jɪˈzɨk', ']', '', ')', '[', '~', '3', ']', '[', '⇨', ']', '—', 'язык', 'восточнославянский', 'группа', 'славянский', 'ветвь', 'индоевропейский', 'языковой', 'семья', ',', 'национальный', 'язык', 'русский', 'народ', '.']
else:
tests_lang_util_skipped = True
elif lang == 'sme':
14 changes: 7 additions & 7 deletions tests/tests_nlp/test_matching.py
@@ -222,29 +222,29 @@ def test_match_tokens():
assert wl_matching.match_tokens(
main,
search_terms = ['take'],
-tokens = ['take', 'takes', 'taked', 'taken', 'taking', 'test'],
+tokens = ['take', 'takes', 'took', 'taken', 'taking', 'test'],
lang = 'eng_us',
tagged = False,
settings = init_search_settings(match_whole_words = True)
) == set(['take'])

assert wl_matching.match_tokens(
main,
-search_terms = ['taked'],
-tokens = ['take', 'takes', 'taked', 'taken', 'taking', 'test'],
+search_terms = ['takes'],
+tokens = ['take', 'takes', 'took', 'taken', 'taking', 'test'],
lang = 'eng_us',
tagged = False,
settings = init_search_settings(match_inflected_forms = True)
-) == set(['take', 'takes', 'taked', 'taken', 'taking'])
+) == set(['take', 'takes', 'taken'])

assert wl_matching.match_tokens(
main,
-search_terms = ['take[dn]'],
-tokens = ['take', 'takes', 'taked', 'taken', 'taking', 'test'],
+search_terms = ['take[sn]'],
+tokens = ['take', 'takes', 'took', 'taken', 'taking', 'test'],
lang = 'eng_us',
tagged = False,
settings = init_search_settings(use_regex = True)
-) == set(['taked', 'taken'])
+) == set(['takes', 'taken'])

assert wl_matching.match_tokens(
main,

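The updated `use_regex = True` case in `test_match_tokens` expects the search term `take[sn]` to select `takes` and `taken` from the token list. The sketch below reproduces that expectation with Python's standard `re` module only. It is not Wordless's `wl_matching.match_tokens`; the helper `match_tokens_regex` is hypothetical, and the use of `fullmatch` is an assumption (for this token list, `re.search` would select the same tokens).

```python
import re


def match_tokens_regex(search_terms, tokens):
    """Return the set of tokens fully matched by any of the regex search terms."""
    matched = set()
    for term in search_terms:
        pattern = re.compile(term)
        # fullmatch is an assumption: the whole token must match the pattern
        matched |= {token for token in tokens if pattern.fullmatch(token)}
    return matched


# Token list and search term copied from the updated test case
tokens = ['take', 'takes', 'took', 'taken', 'taking', 'test']
print(sorted(match_tokens_regex(['take[sn]'], tokens)))  # ['taken', 'takes']
```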