Commit
Menu: Allow editing of tagset mapping of spaCy's Catalan, Danish, French, Greek (Modern), Macedonian, Norwegian (Bokmål), Portuguese, Russian, Spanish, and Ukrainian part-of-speech taggers
Showing 28 changed files with 517 additions and 93 deletions.
@@ -0,0 +1,40 @@
# ----------------------------------------------------------------------
# Wordless: Tagsets - Universal POS tags - Catalan
# Copyright (C) 2018-2024 Ye Lei (叶磊)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# ----------------------------------------------------------------------

# Reference: https://universaldependencies.org/ca/pos/
tagset_mapping = [
    ['ADJ', 'ADJ', 'Adjective', 'gran, vell, verd, incomprensible\nprimer, segon, tercer'],
    ['ADP', 'ADP', 'Adposition', '[English] in, to, during'],
    ['ADV', 'ADV', 'Adverb', 'molt, bé, exactament, demà, dalt, baix\nInterrogative or exclamative adverbs: on, quan, com, per què\nDemonstrative adverbs: aquí, allí, ara, després\nTotality adverbs: sempre\nNegative adverbs: mai'],
    ['AUX', 'AUX', 'Auxiliary', 'Tense auxiliaries: [English] has (done), is (doing), will (do)\nPassive auxiliaries: [English] was (done), got (done)\nModal auxiliaries: [English] should (do), must (do)\nVerbal copulas: [English] (He) is (a teacher.)\nAgreement auxiliaries: [K’iche’] la (2nd person singular formal), alaq (2nd person plural formal)'],
    ['CONJ', 'CONJ', 'Coordinating/subordinating conjunction', 'See CCONJ and SCONJ'],
    ['CCONJ', 'CCONJ', 'Coordinating conjunction', '[English] and, or, but'],
    ['SCONJ', 'SCONJ', 'Subordinating conjunction', '[English] (I believe) that (he will come.), if, while'],
    ['DET', 'DET', 'Determiner', 'Articles (a closed class indicating definiteness, specificity or givenness): [English] a, an, the\nPossessive determiners (which modify a nominal; note that some languages use PRON for similar words): [Czech] můj, tvůj, jeho, její, náš, váš, jejich\nDemonstrative determiners: [English] (I saw) this (car yesterday.)\nInterrogative determiners: [English] Which (car do you like?)\nRelative determiners: [English] (I wonder) which (car you like.)\nQuantity determiners (quantifiers):\n\tIndefinite: [English] any\n\tUniversal: [English] all\n\tNegative: [English] (We have) no (cars available.)'],
    ['INTJ', 'INTJ', 'Interjection', 'psst, ai, bravo, hola, Sí(, perque…), No(, no ho crec.)'],
    ['NOUN', 'NOUN', 'Noun', 'noia, gat, arbre, aire, bellesa'],
    ['PROPN', 'PROPN', 'Proper noun', '[English] Mary, John, London, NATO, HBO, [email protected], http://universaldependencies.org/, 1-800-COMPANY'],
    ['NUM', 'NUM', 'Numeral', '0, 1, 2, 3, 4, 5, 2014, 1000000, 3.14159265359\n11/11/1918, 11:00\n[English] one, two, three, seventy-seven\nk (abbreviation for thousand), m (abbreviation for million)\nI, II, III, IV, V, MMXIV'],
    ['PART', 'PART', 'Particle', 'Possessive marker: [English] ’s\nNegation particle: [English] not; [German] nicht\nQuestion particle: [Japanese] か/ka (adding this particle to the end of a clause turns the clause into a question); [Turkish] mu\nSentence modality: [Czech] ať, kéž, nechť'],
    ['PRON', 'PRON', 'Pronoun', 'Personal pronouns: [English] I, you, he, she, it, we, they\nReflexive pronouns: [English] myself, yourself, himself, herself, itself, ourselves, yourselves, themselves\nInterrogative pronouns: who, What (do you think?)\nRelative pronouns (unlike SCONJ relativizers, relative pronouns play a nominal role in the relative clause): [English] (a cat) who (eats fish), that, which, (I wonder) what (you think.)\nIndefinite pronouns: [English] somebody, something, anybody, anything\nTotal pronouns: [English] everybody, everything\nNegative pronouns: [English] nobody, nothing\nPossessive pronouns (which usually stand alone as a nominal): [English] mine, yours, his, hers, its, ours, theirs\nAttributive possessive pronouns (in some languages; others use DET for similar words): [English] my, your'],
    ['VERB', 'VERB', 'Verb', '[English] run, eat\n[English] runs, ate\n[English] running, eating'],

    ['PUNCT', 'PUNCT', 'Punctuation', 'Period: .\nComma: ,\nParentheses: ()'],
    ['SYM', 'SYM', 'Symbol', '$, %, §, ©\n+, −, ×, ÷, =, <, >\n:), ♥‿♥, 😝'],
    ['X', 'X', 'Other', '[English] (And then he just) xfgh pdl jklw']
]
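Each row of `tagset_mapping` appears to hold four strings: the tag emitted by the tagger, the universal POS tag it maps to, a description, and example words. That column reading is inferred from the rows in this file, not stated by the commit. Under that assumption, a minimal sketch of turning such a list into a lookup table could look like this; `build_lookup` and `to_universal` are hypothetical helper names, not Wordless functions:

```python
# A shortened stand-in for a tagset mapping; the four-column layout
# (tag, universal tag, description, examples) is an assumption.
tagset_mapping = [
    ['ADJ', 'ADJ', 'Adjective', 'gran, vell, verd'],
    ['CONJ', 'CONJ', 'Coordinating/subordinating conjunction', 'See CCONJ and SCONJ'],
    ['X', 'X', 'Other', ''],
]


def build_lookup(mapping):
    """Map each tagger tag to its (universal tag, description) pair."""
    return {
        tag: (universal, description)
        for tag, universal, description, _examples in mapping
    }


def to_universal(tag, lookup, default='X'):
    """Return the universal tag for `tag`, or `default` if it is unmapped."""
    return lookup.get(tag, (default, ''))[0]


lookup = build_lookup(tagset_mapping)
print(to_universal('ADJ', lookup))
```

Because the first two columns are separate, an edited mapping can point a tagger tag at a different universal tag without touching the description or examples.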
@@ -0,0 +1,40 @@
# ----------------------------------------------------------------------
# Wordless: Tagsets - Universal POS tags - Danish
# Copyright (C) 2018-2024 Ye Lei (叶磊)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# ----------------------------------------------------------------------

# Reference: https://universaldependencies.org/da/pos/
tagset_mapping = [
    ['ADJ', 'ADJ', 'Adjective', 'gammel/gammelt/gamle, grøn/grønt/grønne, ufatlig/ufatligt/ufatlige'],
    ['ADP', 'ADP', 'Adposition', 'i, på, gennem'],
    ['ADV', 'ADV', 'Adverb', 'meget (vigtigt), væk, (jeg spiser) ikke (rejer), pludselig'],
    ['AUX', 'AUX', 'Auxiliary', 'Tense auxiliary: har (købt)\nModal auxiliary: kunne (tænke)\nPassive auxiliary: blev (fundet)\nCopula: var (grøn), er (en løsning)'],
    ['CONJ', 'CONJ', 'Coordinating/subordinating conjunction', 'See CCONJ and SCONJ'],
    ['CCONJ', 'CCONJ', 'Coordinating conjunction', 'og, eller, men'],
    ['SCONJ', 'SCONJ', 'Subordinating conjunction', 'da, hvis, (konstatere) at (manden har søgt hjælp)'],
    ['DET', 'DET', 'Determiner', 'Articles: en, et, den, det, de\nPossessive determiners: min (bil), deres (holdninger), dit (job)\nNegative determiners: (han har) ingen (empati)'],
    ['INTJ', 'INTJ', 'Interjection', 'Hmm!, Åh!, Hej!'],
    ['NOUN', 'NOUN', 'Noun', 'pige, kat, træ, luft, skønhed'],
    ['PROPN', 'PROPN', 'Proper noun', 'Anna, Otto\nSkåne, USA\nTexaco, Pirelli'],
    ['NUM', 'NUM', 'Numeral', '0, 1, 2, 3, 2014, 1 000 000, 3.000,15, 3,14159265359\net, to, tre, nitten\nI, II, III, IV, V, MMXIV'],
    ['PART', 'PART', 'Particle', '(det er muligt) at (ændre det)'],
    ['PRON', 'PRON', 'Pronoun', 'Personal (subject) pronouns: jeg, du, han, hun, det/den, vi, I, de\nPlaceholder personal pronoun: man (kan gå)\nPersonal (object)/reflexive pronouns: mig, dig, ham, hende, sig, os, hinanden\nDemonstrative pronouns: dette (er et svært spørgsmål)\nPossessive pronouns: vores\nInterrogative pronouns: hvad\nRelative pronouns: hvis\nIndefinite pronouns: nogen, noget\nTotality pronouns: alting\nNegative pronouns: ingen (af os)'],
    ['VERB', 'VERB', 'Verb', 'at vise, jeg viser, han viste\nat flyve, vi flyver, de fløj'],

    ['PUNCT', 'PUNCT', 'Punctuation', 'Period: .\nComma: ,\nParentheses: ()'],
    ['SYM', 'SYM', 'Symbol', '§'],
    ['X', 'X', 'Other', 'musik(- og billedprogrammer)']
]
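Since the commit makes these mappings user-editable, a sanity check on an edited list may be useful. The following is only a sketch under the assumption that every row must contain exactly four non-empty strings and that the first column is unique; `validate_tagset_mapping` is a hypothetical name, and Wordless may enforce different rules or none at all:

```python
from collections import Counter


def validate_tagset_mapping(mapping):
    """Return a list of human-readable problems found in `mapping`.

    Assumed rules (not taken from the commit): four non-empty string
    fields per row, and no tag repeated in the first column.
    """
    problems = []

    for i, row in enumerate(mapping):
        if len(row) != 4:
            problems.append(f'row {i}: expected 4 fields, got {len(row)}')
        elif not all(isinstance(field, str) and field for field in row):
            problems.append(f'row {i}: empty or non-string field')

    # Count first-column tags, skipping rows that are completely empty.
    counts = Counter(row[0] for row in mapping if row)
    problems.extend(f'duplicate tag: {tag}' for tag, n in counts.items() if n > 1)

    return problems


edited = [
    ['SYM', 'SYM', 'Symbol', '§'],
    ['X', 'X', 'Other'],             # description or examples missing
    ['SYM', 'SYM', 'Symbol', '%'],   # tag repeated
]

for problem in validate_tagset_mapping(edited):
    print(problem)
```

An empty result means the list satisfies the assumed rules; it says nothing about whether the tags match what the spaCy model actually emits.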