diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..95f2be6
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,3 @@
+*.tar
+*.csv
+*.logs
diff --git a/README.md b/README.md
index e88a1fe..3926ae9 100755
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@
 - \[DocLing\]: Gleßgen, Martin Dietrich (dir.), et al., _Les plus anciens documents linguistiques de la France_, 2016, [http://www.rose.uzh.ch/docling/](http://www.rose.uzh.ch/docling/), 3e édition.
 - \[Geste\]: Camps, Jean-Baptiste (dir.), _Geste: un corpus de chansons de geste_, 2016-… (v02), École nationale des chartes, Paris, 2019, [http://doi.org/10.5281/zenodo.2630574](http://doi.org/10.5281/zenodo.2630574), textes du domaine public, développements CC-BY-SA.
 - \[Lancelot\]: Ing, Lucence, _Disparitions lexicales en diachronie: traitements automatiques sur le Lancelot en prose_, thèse de doct. en préparation, dir. F. Duval, codir. J.B. Camps, École nationale des chartes, Université PSL, Paris.
-- \[WauchierSConf\] Pinche, Ariane, _Édition nativement numérique du recueil hagiographique ‘Li Seint Confessor’ de Wauchier de Denain d’après le manuscrit fr. 412 de la Bibliothèque nationale de France_, thèse de doctorat dir. C. pierreville et B. Bureau, Université de Lyon, Lyon, 2021.
+- \[WauchierSConf\]: Pinche, Ariane, _Édition nativement numérique du recueil hagiographique ‘Li Seint Confessor’ de Wauchier de Denain d’après le manuscrit fr. 412 de la Bibliothèque nationale de France_, thèse de doctorat dir. C. Pierreville et B. Bureau, Université de Lyon, Lyon, 2021.
 The \[Varia\] are composed of short excerpts, taken from the work of students at the École des chartes, annotated in 2020, as part of the evaluation of the course _initiation à la philologie romane: introduction au moyen français_, given by Lucence Ing and Jean-Baptiste Camps (thematic dossier on the plague and medicine, during the first lockdown of 2020 of the COVID19 pandemic)
@@ -25,3 +25,165 @@ From the ed. by Nicaise, Edouard (1890) p. 167 ff
 - Poésies de Gilles li Muisis, published for the first time, according to the manuscript of Lord Ashburnham by baron Kervyn de Lettenhove, Louvain, 1882, https://archive.org/details/posiesdegilles01lemuuoft/page/78/mode/2up,
+## Statistics (2023-04-26)
+
+
+### Token, Lemma and POS counts
+
+| Category   | Different   | Total     | Values with 1 occurrence only   |
+|------------|-------------|-----------|---------------------------------|
+| Forms      | 47,661      | 1,183,960 | 23,851                          |
+| Lemma      | 11,295      | 1,183,960 | 3,852                           |
+| POS        | 66          | 1,183,960 | 6                               |
+
+### Morphology counts
+
+The *Non-x values* column counts the tokens to which the category actually applies: a verb, for example, is annotated DEGRE=x, because verbs cannot carry a degree.
+
+| Category   | Different   | Total   | Non-x values   |
+|------------|-------------|---------|----------------|
+| Mode       | 6           | 478,657 | 60,740         |
+| Temps      | 5           | 478,657 | 57,367         |
+| Personne   | 5           | 478,657 | 106,566        |
+| Nombre     | 3           | 478,657 | 290,326        |
+| Genre      | 4           | 478,657 | 226,996        |
+| Cas        | 4           | 478,657 | 229,586        |
+| Degre      | 5           | 478,657 | 42,949         |
+
+### POS
+
+| Value         | Count   |
+|---------------|---------|
+| NOMcom        | 160,410 |
+| VERcjg        | 156,630 |
+| PROper        | 96,533  |
+| PRE           | 91,586  |
+| PONfbl        | 79,784  |
+| ADVgen        | 79,578  |
+| CONcoo        | 66,658  |
+| DETdef        | 57,655  |
+| PONfrt        | 42,489  |
+| CONsub        | 40,120  |
+| VERppe        | 35,647  |
+| ADJqua        | 31,675  |
+| VERinf        | 28,218  |
+| NOMpro        | 27,872  |
+| ADVneg        | 25,947  |
+| PROrel        | 25,542  |
+| DETpos        | 22,367  |
+| PROadv        | 15,003  |
+| PRE.DETdef    | 14,836  |
+| PROdem        | 14,327  |
+| PROind        | 11,661  |
+| DETind        | 10,985  |
+| PONpga        | 7,707   |
+| DETndf        | 7,076   |
+| DETdem        | 6,057   |
+| PONpdr        | 4,842   |
+| DETcar        | 3,229   |
+| VERppa        | 2,784   |
+| ADJind        | 2,575   |
+| PROimp        | 2,036   |
+| PROcar        | 1,855   |
+| ADJcar        | 1,277   |
+| ADJpos        | 1,049   |
+| PROint        | 1,014   |
+| PONpxx        | 1,012   |
+| ADVneg.PROper | 952     |
+| PROpos        | 669     |
+| ADJord        | 636     |
+| ADVsub        | 592     |
+| INJ           | 549     |
+| ADVint        | 506     |
+| DETrel        | 448     |
+| PROord        | 327     |
+| PROper.PROper | 311     |
+| ADVgen.PROper | 271     |
+| DETint        | 225     |
+| PRE.PROdem    | 151     |
+| DETcom        | 52      |
+| PRE.PROper    | 47      |
+| PROrel.PROper | 46      |
+| RED           | 34      |
+| ETR           | 33      |
+| CONsub.PROper | 18      |
+| ADVgen.CONsub | 16      |
+| PRE.DETcom    | 12      |
+| DETord        | 8       |
+| ADJqua.NOMcom | 7       |
+| PRE.PROrel    | 4       |
+| ADVing        | 2       |
+| ADVneg.PROadv | 2       |
+| PROint.PROper | 1       |
+| CONsubs       | 1       |
+| ADVgen.PROadv | 1       |
+| NomPro        | 1       |
+| PRE.DETrel    | 1       |
+| CONsub.DETdef | 1       |
+
+### Mode
+
+| Value     | Count   |
+|-----------|---------|
+| MODE=x    | 417,917 |
+| MODE=ind  | 51,951  |
+| MODE=sub  | 5,416   |
+| MODE=imp  | 2,061   |
+| MODE=con  | 1,311   |
+| MODE=cond | 1       |
+
+### Temps
+
+| Value     | Count   |
+|-----------|---------|
+| TEMPS=x   | 421,290 |
+| TEMPS=pst | 29,150  |
+| TEMPS=psp | 14,882  |
+| TEMPS=ipf | 9,012   |
+| TEMPS=fut | 4,323   |
+
+### Personne
+
+| Value   | Count   |
+|---------|---------|
+| PERS.=x | 372,091 |
+| PERS.=3 | 76,497  |
+| PERS.=1 | 18,377  |
+| PERS.=2 | 11,455  |
+| PERS.=0 | 237     |
+
+### Nombre
+
+| Value   | Count   |
+|---------|---------|
+| NOMB.=s | 218,952 |
+| NOMB.=x | 188,331 |
+| NOMB.=p | 71,374  |
+
+### Genre
+
+| Value   | Count   |
+|---------|---------|
+| GENRE=x | 251,661 |
+| GENRE=m | 155,955 |
+| GENRE=f | 63,962  |
+| GENRE=n | 7,079   |
+
+### Cas
+
+| Value   | Count   |
+|---------|---------|
+| CAS=x   | 249,071 |
+| CAS=r   | 145,693 |
+| CAS=n   | 75,652  |
+| CAS=i   | 8,241   |
+
+### Degre
+
+| Value   | Count   |
+|---------|---------|
+| DEGRE=x | 435,708 |
+| DEGRE=- | 24,947  |
+| DEGRE=p | 16,622  |
+| DEGRE=c | 910     |
+| DEGRE=s | 470     |
diff --git a/tooling/.gitignore b/tooling/.gitignore
new file mode 100644
index 0000000..f0191b2
--- /dev/null
+++ b/tooling/.gitignore
@@ -0,0 +1,3 @@
+env
+output-*
+*memory.csv
\ No newline at end of file
diff --git a/tooling/00-install.sh b/tooling/00-install.sh
new file mode 100644
index 0000000..90bf90c
--- /dev/null
+++ b/tooling/00-install.sh
@@ -0,0 +1,2 @@
+virtualenv env -p python3
+env/bin/pip install -r requirements.txt
\ No newline at end of file
diff --git a/tooling/01-build.sh b/tooling/01-build.sh
new file mode 100644
index 0000000..31b65fd
--- /dev/null
+++ b/tooling/01-build.sh
@@ -0,0 +1,5 @@
+rm -r output-*
+env/bin/protogenie build config-lemma-pos.xml --output output-lemma-pos -t .98 -d .02 -e 0 --verbose
+env/bin/protogenie concat config-lemma-pos.xml output-lemma-pos
+env/bin/protogenie build config-morph.xml --output output-morph -t .98 -d .02 -e 0 --verbose
+env/bin/protogenie concat config-morph.xml output-morph
\ No newline at end of file
diff --git a/tooling/02-build-test.sh b/tooling/02-build-test.sh
new file mode 100644
index 0000000..80bedfd
--- /dev/null
+++ b/tooling/02-build-test.sh
@@ -0,0 +1,2 @@
+rm -r output-test
+env/bin/protogenie build config-test.xml --output output-test -n --verbose
\ No newline at end of file
diff --git a/tooling/config-lemma-pos.xml b/tooling/config-lemma-pos.xml
new file mode 100644
index 0000000..c4f6357
--- /dev/null
+++ b/tooling/config-lemma-pos.xml
@@ -0,0 +1,44 @@
+ + + +
+ form + lemma + POS +
+
+ + +
+ token + lemma + POS +
+
+ + + + + + lemma + token + + + + + lemma + token + + + + + lemma + token + + + + + +
diff --git a/tooling/config-morph.xml b/tooling/config-morph.xml new file mode 100644 index 0000000..0406fe3 --- /dev/null +++ b/tooling/config-morph.xml @@ -0,0 +1,60 @@ + + + + +
+ form + lemma + POS + morph +
+
+ + +
+ token + lemma + POS + MODE + TEMPS + PERS + NOMB + GENRE + CAS + DEGRE + SPEC +
+
+ + + + + + lemma + token + + + + + lemma + token + + + + + lemma + token + + + + + + + + + + + + +
\ No newline at end of file diff --git a/tooling/config-test.xml b/tooling/config-test.xml new file mode 100644 index 0000000..ca1be12 --- /dev/null +++ b/tooling/config-test.xml @@ -0,0 +1,68 @@ + + + + +
+ form + lemma + POS + morph +
+
+ + +
+ token + lemma + POS + MODE + TEMPS + PERS + NOMB + GENRE + CAS + DEGRE + SPEC +
+
+ + + + + + lemma + token + + + + + lemma + token + + + + + lemma + token + + + + + + + + + + + + + + + +
+ + + diff --git a/tooling/corpora/with-morph.xml b/tooling/corpora/with-morph.xml new file mode 100644 index 0000000..00f6240 --- /dev/null +++ b/tooling/corpora/with-morph.xml @@ -0,0 +1,13 @@ + + + + +
+ + + +
+ + \ No newline at end of file diff --git a/tooling/corpora/without-morph.xml b/tooling/corpora/without-morph.xml new file mode 100644 index 0000000..41ad1eb --- /dev/null +++ b/tooling/corpora/without-morph.xml @@ -0,0 +1,9 @@ + + + + +
+ +
diff --git a/tooling/get-stats.py b/tooling/get-stats.py
new file mode 100644
index 0000000..8594703
--- /dev/null
+++ b/tooling/get-stats.py
@@ -0,0 +1,94 @@
+# Create stats for README.md
+import glob
+import tabulate
+from typing import Iterable, Dict
+
+print("\n### Token, Lemma and POS counts\n")
+
+
+def read_csv(filepath: str) -> Iterable[Dict[str, str]]:
+    with open(filepath, encoding="utf-8") as f:
+        for idx, line in enumerate(f.readlines()):
+            if idx == 0:
+                header = line.strip().split("\t")
+                continue
+            elif not line.strip():
+                continue
+            line = dict(zip(header, line.strip().split("\t")))
+            yield line
+    return None
+
+
+from collections import Counter, defaultdict
+
+stats = defaultdict(Counter)
+
+for file in glob.glob("output-lemma-pos/*.tsv"):
+    for line in read_csv(file):
+        if not line:
+            continue
+        for key, value in line.items():
+            stats[key][value] += 1
+
+count_table = [
+    ["Category", "Different", "Total", "Values with 1 occurrence only"]
+]
+
+for key, label in (("token", "Forms"), ("lemma", "Lemma"), ("POS", "POS")):
+    count_table.append([label, f"{len(stats[key]):,}", f"{sum(stats[key].values()):,}", f"{list(stats[key].values()).count(1):,}"])
+
+count_table = tabulate.tabulate(count_table, headers="firstrow", tablefmt="github")
+
+print(count_table)
+
+# Morphology
+
+stats2 = defaultdict(Counter)
+
+for file in glob.glob("output-morph/*.tsv"):
+    for line in read_csv(file):
+        if not line:
+            continue
+        for key, value in line.items():
+            stats2[key][value] += 1
+
+morph_table = [
+    ["Category", "Different", "Total", "Non-x values"]
+]
+
+
+for key, label in (
+    ('MODE', 'Mode'),
+    ('TEMPS', 'Temps'),
+    ('PERS', 'Personne'),
+    ('NOMB', 'Nombre'),
+    ('GENRE', 'Genre'),
+    ('CAS', 'Cas'),
+    ('DEGRE', 'Degre'),
+    #('SPEC', 'Spec?')
+):
+    morph_table.append([label, f"{len(stats2[key]):,}", f"{sum(stats2[key].values()):,}", f"{sum(stats2[key].values())-(stats2[key][key+'=x']+stats2[key][key+'.=x']):,}"])
+
+morph_table = tabulate.tabulate(morph_table, headers="firstrow", tablefmt="github")
+
+print("\n### Morphology counts\n")
+print("The *Non-x values* column counts the tokens to which the category actually applies: a verb, for example, is annotated DEGRE=x, because verbs cannot carry a degree.\n")
+
+print(morph_table)
+
+for key, label in [
+    ("POS", "POS"),
+    ('MODE', 'Mode'),
+    ('TEMPS', 'Temps'),
+    ('PERS', 'Personne'),
+    ('NOMB', 'Nombre'),
+    ('GENRE', 'Genre'),
+    ('CAS', 'Cas'),
+    ('DEGRE', 'Degre'),]:
+    print(f"\n### {label}\n")
+
+    s = stats2
+    if key == "POS":
+        s = stats
+
+    print(tabulate.tabulate([(x, f"{y:,}") for x, y in s[key].most_common()], headers=["Value", "Count"], tablefmt="github"))
\ No newline at end of file
diff --git a/tooling/papie-configs/lemma.json b/tooling/papie-configs/lemma.json new file mode 100644 index 0000000..3be9a64 --- /dev/null +++ b/tooling/papie-configs/lemma.json @@ -0,0 +1,100 @@ +{ + "modelname":"Fro-Lemma", + "modelpath":"./models/", + "run_test":false, + "dev_path":"output-lemma-pos/dev.tsv", + "input_path":"output-lemma-pos/train.tsv", + "test_path":"output-lemma-pos/test.tsv", + "header":true, + "sep":"\t", + "breakline_ref":"POS", + "breakline_data":"NONE", + "char_max_size":500, + "word_max_size":20000, + "max_sent_len":35, + "max_sents":1000000, + "char_lower": false, + "char_min_freq":1, + "word_min_freq":1, + "char_eos":true, + "char_bos":true, + "tasks":[ + { + "name":"lemma", + "target":true, + "context":"sentence", + "level":"char", + "decoder":"attentional", + "settings":{ + "bos":true, + "eos":true, + "lower":false, + "target":"lemma" + }, + "layer":-1, + "schedule": { + "evaluation": "precision" + } + } + ], + "task_defaults":{ + "level":"token", + "layer":-1, + "decoder":"linear", + "context":"sentence" + }, + "threshold":0.0002, + "min_weight":0.2, + "include_lm":true,//Just to see if this is the issue + "lm_shared_softmax":true, + "lm_schedule":{ + "patience":2, + "factor":0.5, + "weight":0.2, + "mode":"min" + }, + "batch_size":100, // Was 200 + "epochs":100, +
"word_dropout":0, + + "clip_norm":5, + "linear_layers":1, + "hidden_size":150, + "num_layers":1, + "cell":"LSTM", + "wemb_dim":0, + "merge_type":"concat", + "cemb_dim":300, + "cemb_type":"rnn", + "cemb_layers":2, + "custom_cemb_cell":false, + "checks_per_epoch":1, + "report_freq":200, + "verbose":true, + "device":"cuda", + "buffer_size":10000, + "minimize_pad":false, + "shuffle":true, + "pretrain_embeddings":false, + "load_pretrained_embeddings":"", + "load_pretrained_encoder":"", + "freeze_embeddings":false, + "scorer":"general", + + // Optimizer + "optimizer": "Ranger", + + "cache_dataset": true, + "dropout":0.32, + "lr":0.004901105542864395, + "lr_patience":2, + "patience":5, + "factor":0.6, + "noise_strategies": { + "uppercase": { + "apply": true, + "ratio": 0.10, + "params": {} + } + } +} diff --git a/tooling/papie-configs/morph-cas.json b/tooling/papie-configs/morph-cas.json new file mode 100644 index 0000000..b249154 --- /dev/null +++ b/tooling/papie-configs/morph-cas.json @@ -0,0 +1,100 @@ +{ + "modelname":"Fro-L2", + "modelpath":"./models/", + "run_test":false, + "dev_path":"output-morph/dev.tsv", + "input_path":"output-morph/train.tsv", + "test_path":"output-morph/test.tsv", + "header":true, + "sep":"\t", + "breakline_ref":"POS", + "breakline_data":"NONE", + "char_max_size":500, + "word_max_size":20000, + "max_sent_len":35, + "max_sents":1000000, + "char_min_freq":1, + "word_min_freq":1, + "char_eos":true, + "char_bos":true, + "char_lower":false, + "word_lower":true, + "utfnorm": false, + "mixed_precision": false, + "tasks":[ + { + "name":"CAS", + "target":true + } + ], + "task_defaults":{ + "level":"token", + "layer":-1, + "decoder":"linear", + "context":"sentence", + "schedule": { + "evaluation": "accuracy" + } + }, + "lm_shared_softmax":true, + "lm_schedule":{ + "patience":2, + "factor":0.5, + "weight":0.2, + "mode":"min" + }, + + // Ignore or dont change + "word_dropout":0, + "wemb_dim":0, + "clip_norm":5, + "checks_per_epoch":1, + "report_freq":128, 
+ "buffer_size":10000, + "minimize_pad":false, + "pretrain_embeddings":false, + "load_pretrained_embeddings":"", + "load_pretrained_encoder":"", + "freeze_embeddings":false, + "scorer":"general", + "epochs":100, + + // Stable params + "cell":"GRU", + "merge_type":"concat", + "linear_layers":1, + "cemb_type":"rnn", + "custom_cemb_cell":false, + "shuffle":true, + "threshold":0.0001, + "min_weight":0.2, + "include_lm":true, + + // Impactful Hyperparams + "num_layers":2, + "hidden_size":150, + "cemb_dim":150, + "cemb_layers":2, + + // Optimizer & LR + "optimizer": "Ranger", + + "cache_dataset": true, + "dropout":0.32, + "lr":0.004901105542864395, + "lr_patience":2, + "patience":5, + "factor":0.6, + "noise_strategies": { + "uppercase": { + "apply": true, + "ratio": 0.10, + "params": {} + } + }, + + // Device & verbosity + "verbose":true, + "batch_size":128, + "device":"cuda" +} \ No newline at end of file diff --git a/tooling/papie-configs/morph-degre.json b/tooling/papie-configs/morph-degre.json new file mode 100644 index 0000000..2479e88 --- /dev/null +++ b/tooling/papie-configs/morph-degre.json @@ -0,0 +1,100 @@ +{ + "modelname":"Fro-L2", + "modelpath":"./models/", + "run_test":false, + "dev_path":"output-morph/dev.tsv", + "input_path":"output-morph/train.tsv", + "test_path":"output-morph/test.tsv", + "header":true, + "sep":"\t", + "breakline_data":"^(\\.|segm)$", + "breakline_ref":"input", + "char_max_size":500, + "word_max_size":20000, + "max_sent_len":35, + "max_sents":1000000, + "char_min_freq":1, + "word_min_freq":1, + "char_eos":true, + "char_bos":true, + "char_lower":false, + "word_lower":true, + "utfnorm": false, + "mixed_precision": false, + "tasks":[ + { + "name":"DEGRE", + "target":true + } + ], + "task_defaults":{ + "level":"token", + "layer":-1, + "decoder":"linear", + "context":"sentence", + "schedule": { + "evaluation": "accuracy" + } + }, + "lm_shared_softmax":true, + "lm_schedule":{ + "patience":2, + "factor":0.5, + "weight":0.2, + "mode":"min" + }, 
+ + // Ignore or dont change + "word_dropout":0, + "wemb_dim":0, + "clip_norm":5, + "checks_per_epoch":1, + "report_freq":128, + "buffer_size":10000, + "minimize_pad":false, + "pretrain_embeddings":false, + "load_pretrained_embeddings":"", + "load_pretrained_encoder":"", + "freeze_embeddings":false, + "scorer":"general", + "epochs":100, + + // Stable params + "cell":"GRU", + "merge_type":"concat", + "linear_layers":1, + "cemb_type":"rnn", + "custom_cemb_cell":false, + "shuffle":true, + "threshold":0.0001, + "min_weight":0.2, + "include_lm":true, + + // Impactful Hyperparams + "num_layers":2, + "cemb_layers":2, + "cemb_dim":150, + "hidden_size":150, + + // Optimizer & LR + "optimizer": "Ranger", + + "cache_dataset": true, + "dropout":0.32, + "lr":0.004901105542864395, + "lr_patience":2, + "patience":5, + "factor":0.6, + "noise_strategies": { + "uppercase": { + "apply": true, + "ratio": 0.10, + "params": {} + } + }, + + // Device & verbosity + "verbose":true, + "batch_size":128, + "device":"cuda" +} \ No newline at end of file diff --git a/tooling/papie-configs/morph-genre.json b/tooling/papie-configs/morph-genre.json new file mode 100644 index 0000000..939b422 --- /dev/null +++ b/tooling/papie-configs/morph-genre.json @@ -0,0 +1,100 @@ +{ + "modelname":"Fro-L2", + "modelpath":"./models/", + "run_test":false, + "dev_path":"output-morph/dev.tsv", + "input_path":"output-morph/train.tsv", + "test_path":"output-morph/test.tsv", + "header":true, + "sep":"\t", + "breakline_data":"^(\\.|segm)$", + "breakline_ref":"input", + "char_max_size":500, + "word_max_size":20000, + "max_sent_len":35, + "max_sents":1000000, + "char_min_freq":1, + "word_min_freq":1, + "char_eos":true, + "char_bos":true, + "char_lower":false, + "word_lower":true, + "utfnorm": false, + "mixed_precision": false, + "tasks":[ + { + "name":"GENRE", + "target":true + } + ], + "task_defaults":{ + "level":"token", + "layer":-1, + "decoder":"linear", + "context":"sentence", + "schedule": { + "evaluation": 
"accuracy" + } + }, + "lm_shared_softmax":true, + "lm_schedule":{ + "patience":2, + "factor":0.5, + "weight":0.2, + "mode":"min" + }, + + // Ignore or dont change + "word_dropout":0, + "wemb_dim":0, + "clip_norm":5, + "checks_per_epoch":1, + "report_freq":128, + "buffer_size":10000, + "minimize_pad":false, + "pretrain_embeddings":false, + "load_pretrained_embeddings":"", + "load_pretrained_encoder":"", + "freeze_embeddings":false, + "scorer":"general", + "epochs":100, + + // Stable params + "cell":"GRU", + "merge_type":"concat", + "linear_layers":1, + "cemb_type":"rnn", + "custom_cemb_cell":false, + "shuffle":true, + "threshold":0.0001, + "min_weight":0.2, + "include_lm":true, + + // Impactful Hyperparams + "num_layers":2, + "cemb_layers":2, + "cemb_dim":150, + "hidden_size":150, + + // Optimizer & LR + "optimizer": "Ranger", + + "cache_dataset": true, + "dropout":0.32, + "lr":0.004901105542864395, + "lr_patience":2, + "patience":5, + "factor":0.6, + "noise_strategies": { + "uppercase": { + "apply": true, + "ratio": 0.10, + "params": {} + } + }, + + // Device & verbosity + "verbose":true, + "batch_size":128, + "device":"cuda" +} \ No newline at end of file diff --git a/tooling/papie-configs/morph-mode.json b/tooling/papie-configs/morph-mode.json new file mode 100644 index 0000000..8c6be35 --- /dev/null +++ b/tooling/papie-configs/morph-mode.json @@ -0,0 +1,100 @@ +{ + "modelname":"Fro-L2", + "modelpath":"./models/", + "run_test":false, + "dev_path":"output-morph/dev.tsv", + "input_path":"output-morph/train.tsv", + "test_path":"output-morph/test.tsv", + "header":true, + "sep":"\t", + "breakline_data":"^(\\.|segm)$", + "breakline_ref":"input", + "char_max_size":500, + "word_max_size":20000, + "max_sent_len":35, + "max_sents":1000000, + "char_min_freq":1, + "word_min_freq":1, + "char_eos":true, + "char_bos":true, + "char_lower":false, + "word_lower":true, + "utfnorm": false, + "mixed_precision": false, + "tasks":[ + { + "name":"MODE", + "target":true + } + ], + 
"task_defaults":{ + "level":"token", + "layer":-1, + "decoder":"linear", + "context":"sentence", + "schedule": { + "evaluation": "accuracy" + } + }, + "lm_shared_softmax":true, + "lm_schedule":{ + "patience":2, + "factor":0.5, + "weight":0.2, + "mode":"min" + }, + + // Ignore or dont change + "word_dropout":0, + "wemb_dim":0, + "clip_norm":5, + "checks_per_epoch":1, + "report_freq":128, + "buffer_size":10000, + "minimize_pad":false, + "pretrain_embeddings":false, + "load_pretrained_embeddings":"", + "load_pretrained_encoder":"", + "freeze_embeddings":false, + "scorer":"general", + "epochs":100, + + // Stable params + "cell":"GRU", + "merge_type":"concat", + "linear_layers":1, + "cemb_type":"rnn", + "custom_cemb_cell":false, + "shuffle":true, + "threshold":0.0001, + "min_weight":0.2, + "include_lm":true, + + // Impactful Hyperparams + "num_layers":2, + "cemb_layers":2, + "cemb_dim":150, + "hidden_size":150, + + // Optimizer & LR + "optimizer": "Ranger", + + "cache_dataset": true, + "dropout":0.32, + "lr":0.004901105542864395, + "lr_patience":2, + "patience":5, + "factor":0.6, + "noise_strategies": { + "uppercase": { + "apply": true, + "ratio": 0.10, + "params": {} + } + }, + + // Device & verbosity + "verbose":true, + "batch_size":128, + "device":"cuda" +} \ No newline at end of file diff --git a/tooling/papie-configs/morph-nomb.json b/tooling/papie-configs/morph-nomb.json new file mode 100644 index 0000000..313f9aa --- /dev/null +++ b/tooling/papie-configs/morph-nomb.json @@ -0,0 +1,100 @@ +{ + "modelname":"Fro-L2", + "modelpath":"./models/", + "run_test":false, + "dev_path":"output-morph/dev.tsv", + "input_path":"output-morph/train.tsv", + "test_path":"output-morph/test.tsv", + "header":true, + "sep":"\t", + "breakline_data":"^(\\.|segm)$", + "breakline_ref":"input", + "char_max_size":500, + "word_max_size":20000, + "max_sent_len":35, + "max_sents":1000000, + "char_min_freq":1, + "word_min_freq":1, + "char_eos":true, + "char_bos":true, + "char_lower":false, + 
"word_lower":true, + "utfnorm": false, + "mixed_precision": false, + "tasks":[ + { + "name":"NOMB", + "target":true + } + ], + "task_defaults":{ + "level":"token", + "layer":-1, + "decoder":"linear", + "context":"sentence", + "schedule": { + "evaluation": "accuracy" + } + }, + "lm_shared_softmax":true, + "lm_schedule":{ + "patience":2, + "factor":0.5, + "weight":0.2, + "mode":"min" + }, + + // Ignore or dont change + "word_dropout":0, + "wemb_dim":0, + "clip_norm":5, + "checks_per_epoch":1, + "report_freq":128, + "buffer_size":10000, + "minimize_pad":false, + "pretrain_embeddings":false, + "load_pretrained_embeddings":"", + "load_pretrained_encoder":"", + "freeze_embeddings":false, + "scorer":"general", + "epochs":100, + + // Stable params + "cell":"GRU", + "merge_type":"concat", + "linear_layers":1, + "cemb_type":"rnn", + "custom_cemb_cell":false, + "shuffle":true, + "threshold":0.0001, + "min_weight":0.2, + "include_lm":true, + + // Impactful Hyperparams + "num_layers":2, + "cemb_layers":2, + "cemb_dim":150, + "hidden_size":150, + + // Optimizer & LR + "optimizer": "Ranger", + + "cache_dataset": true, + "dropout":0.32, + "lr":0.004901105542864395, + "lr_patience":2, + "patience":5, + "factor":0.6, + "noise_strategies": { + "uppercase": { + "apply": true, + "ratio": 0.10, + "params": {} + } + }, + + // Device & verbosity + "verbose":true, + "batch_size":128, + "device":"cuda" +} \ No newline at end of file diff --git a/tooling/papie-configs/morph-pers.json b/tooling/papie-configs/morph-pers.json new file mode 100644 index 0000000..c0185d8 --- /dev/null +++ b/tooling/papie-configs/morph-pers.json @@ -0,0 +1,100 @@ +{ + "modelname":"Fro-L2", + "modelpath":"./models/", + "run_test":false, + "dev_path":"output-morph/dev.tsv", + "input_path":"output-morph/train.tsv", + "test_path":"output-morph/test.tsv", + "header":true, + "sep":"\t", + "breakline_data":"^(\\.|segm)$", + "breakline_ref":"input", + "char_max_size":500, + "word_max_size":20000, + "max_sent_len":35, + 
"max_sents":1000000, + "char_min_freq":1, + "word_min_freq":1, + "char_eos":true, + "char_bos":true, + "char_lower":false, + "word_lower":true, + "utfnorm": false, + "mixed_precision": false, + "tasks":[ + { + "name":"PERS", + "target":true + } + ], + "task_defaults":{ + "level":"token", + "layer":-1, + "decoder":"linear", + "context":"sentence", + "schedule": { + "evaluation": "accuracy" + } + }, + "lm_shared_softmax":true, + "lm_schedule":{ + "patience":2, + "factor":0.5, + "weight":0.2, + "mode":"min" + }, + + // Ignore or dont change + "word_dropout":0, + "wemb_dim":0, + "clip_norm":5, + "checks_per_epoch":1, + "report_freq":128, + "buffer_size":10000, + "minimize_pad":false, + "pretrain_embeddings":false, + "load_pretrained_embeddings":"", + "load_pretrained_encoder":"", + "freeze_embeddings":false, + "scorer":"general", + "epochs":100, + + // Stable params + "cell":"GRU", + "merge_type":"concat", + "linear_layers":1, + "cemb_type":"rnn", + "custom_cemb_cell":false, + "shuffle":true, + "threshold":0.0001, + "min_weight":0.2, + "include_lm":true, + + // Impactful Hyperparams + "num_layers":2, + "cemb_layers":2, + "cemb_dim":150, + "hidden_size":150, + + // Optimizer & LR + "optimizer": "Ranger", + + "cache_dataset": true, + "dropout":0.32, + "lr":0.004901105542864395, + "lr_patience":2, + "patience":5, + "factor":0.6, + "noise_strategies": { + "uppercase": { + "apply": true, + "ratio": 0.10, + "params": {} + } + }, + + // Device & verbosity + "verbose":true, + "batch_size":128, + "device":"cuda" +} \ No newline at end of file diff --git a/tooling/papie-configs/morph-temps.json b/tooling/papie-configs/morph-temps.json new file mode 100644 index 0000000..79390bf --- /dev/null +++ b/tooling/papie-configs/morph-temps.json @@ -0,0 +1,100 @@ +{ + "modelname":"Fro-L2", + "modelpath":"./models/", + "run_test":false, + "dev_path":"output-morph/dev.tsv", + "input_path":"output-morph/train.tsv", + "test_path":"output-morph/test.tsv", + "header":true, + "sep":"\t", + 
"breakline_data":"^(\\.|segm)$", + "breakline_ref":"input", + "char_max_size":500, + "word_max_size":20000, + "max_sent_len":35, + "max_sents":1000000, + "char_min_freq":1, + "word_min_freq":1, + "char_eos":true, + "char_bos":true, + "char_lower":false, + "word_lower":true, + "utfnorm": false, + "mixed_precision": false, + "tasks":[ + { + "name":"TEMPS", + "target":true + } + ], + "task_defaults":{ + "level":"token", + "layer":-1, + "decoder":"linear", + "context":"sentence", + "schedule": { + "evaluation": "accuracy" + } + }, + "lm_shared_softmax":true, + "lm_schedule":{ + "patience":2, + "factor":0.5, + "weight":0.2, + "mode":"min" + }, + + // Ignore or dont change + "word_dropout":0, + "wemb_dim":0, + "clip_norm":5, + "checks_per_epoch":1, + "report_freq":128, + "buffer_size":10000, + "minimize_pad":false, + "pretrain_embeddings":false, + "load_pretrained_embeddings":"", + "load_pretrained_encoder":"", + "freeze_embeddings":false, + "scorer":"general", + "epochs":100, + + // Stable params + "cell":"GRU", + "merge_type":"concat", + "linear_layers":1, + "cemb_type":"rnn", + "custom_cemb_cell":false, + "shuffle":true, + "threshold":0.0001, + "min_weight":0.2, + "include_lm":true, + + // Impactful Hyperparams + "num_layers":2, + "cemb_layers":2, + "cemb_dim":150, + "hidden_size":150, + + // Optimizer & LR + "optimizer": "Ranger", + + "cache_dataset": true, + "dropout":0.32, + "lr":0.004901105542864395, + "lr_patience":2, + "patience":5, + "factor":0.6, + "noise_strategies": { + "uppercase": { + "apply": true, + "ratio": 0.10, + "params": {} + } + }, + + // Device & verbosity + "verbose":true, + "batch_size":128, + "device":"cuda" +} \ No newline at end of file diff --git a/tooling/papie-configs/pos.json b/tooling/papie-configs/pos.json new file mode 100644 index 0000000..238d802 --- /dev/null +++ b/tooling/papie-configs/pos.json @@ -0,0 +1,97 @@ +{ + "modelname":"Fro-L2", + "modelpath":"./models/", + "run_test":false, + "dev_path":"output-lemma-pos/dev.tsv", + 
"input_path":"output-lemma-pos/train.tsv", + "test_path":"output-lemma-pos/test.tsv", + "header":true, + "sep":"\t", + "breakline_data":"^(\\.|segm)$", + "breakline_ref":"input", + "char_max_size":500, + "word_max_size":20000, + "max_sent_len":35, + "max_sents":1000000, + "char_min_freq":1, + "word_min_freq":1, + "char_eos":true, + "char_bos":true, + "char_lower":false, + "word_lower":true, + "utfnorm": false, + "mixed_precision": false, + "tasks":[ + { + "name":"POS", + "target":true + } + ], + "task_defaults":{ + "level":"token", + "layer":-1, + "decoder":"linear", + "context":"sentence" + }, + "lm_shared_softmax":true, + "lm_schedule":{ + "patience":2, + "factor":0.5, + "weight":0.2, + "mode":"min" + }, + + // Ignore or dont change + "word_dropout":0, + "wemb_dim":0, + "clip_norm":5, + "checks_per_epoch":1, + "report_freq":128, + "buffer_size":10000, + "minimize_pad":false, + "pretrain_embeddings":false, + "load_pretrained_embeddings":"", + "load_pretrained_encoder":"", + "freeze_embeddings":false, + "scorer":"general", + "epochs":100, + + // Stable params + "cell":"GRU", + "merge_type":"concat", + "linear_layers":1, + "cemb_type":"rnn", + "custom_cemb_cell":false, + "shuffle":true, + "threshold":0.0001, + "min_weight":0.2, + "include_lm":true, + + // Impactful Hyperparams + "num_layers":2, + "cemb_layers":2, + "cemb_dim":150, + "hidden_size":150, + + // Optimizer & LR + "optimizer": "Ranger", + + "cache_dataset": true, + "dropout":0.32, + "lr":0.004901105542864395, + "lr_patience":2, + "patience":5, + "factor":0.6, + "noise_strategies": { + "uppercase": { + "apply": true, + "ratio": 0.10, + "params": {} + } + }, + + // Device & verbosity + "verbose":true, + "batch_size":128, + "device":"cuda" +} diff --git a/tooling/requirements.txt b/tooling/requirements.txt new file mode 100644 index 0000000..118f984 --- /dev/null +++ b/tooling/requirements.txt @@ -0,0 +1 @@ +protogenie==0.0.7 diff --git a/tsv/Chrestien_Cliges3_posBFM_aligne.tsv 
b/tsv/LemmaPos/Chrestien_Cliges3_posBFM_aligne.tsv similarity index 100% rename from tsv/Chrestien_Cliges3_posBFM_aligne.tsv rename to tsv/LemmaPos/Chrestien_Cliges3_posBFM_aligne.tsv diff --git a/tsv/Chrestien_Erec3_posBFM_aligne.tsv b/tsv/LemmaPos/Chrestien_Erec3_posBFM_aligne.tsv similarity index 100% rename from tsv/Chrestien_Erec3_posBFM_aligne.tsv rename to tsv/LemmaPos/Chrestien_Erec3_posBFM_aligne.tsv diff --git a/tsv/Chrestien_Lancelot3_posBFM_aligne.tsv b/tsv/LemmaPos/Chrestien_Lancelot3_posBFM_aligne.tsv similarity index 100% rename from tsv/Chrestien_Lancelot3_posBFM_aligne.tsv rename to tsv/LemmaPos/Chrestien_Lancelot3_posBFM_aligne.tsv diff --git a/tsv/Chrestien_Perceval3_posBFM_aligne.tsv b/tsv/LemmaPos/Chrestien_Perceval3_posBFM_aligne.tsv similarity index 100% rename from tsv/Chrestien_Perceval3_posBFM_aligne.tsv rename to tsv/LemmaPos/Chrestien_Perceval3_posBFM_aligne.tsv diff --git a/tsv/Chrestien_Yvain3_posBFM_aligne.tsv b/tsv/LemmaPos/Chrestien_Yvain3_posBFM_aligne.tsv similarity index 100% rename from tsv/Chrestien_Yvain3_posBFM_aligne.tsv rename to tsv/LemmaPos/Chrestien_Yvain3_posBFM_aligne.tsv diff --git a/tsv/Code_Institutes.tsv b/tsv/LemmaPos/Code_Institutes.tsv similarity index 100% rename from tsv/Code_Institutes.tsv rename to tsv/LemmaPos/Code_Institutes.tsv diff --git a/tsv/Code_code1.tsv b/tsv/LemmaPos/Code_code1.tsv similarity index 100% rename from tsv/Code_code1.tsv rename to tsv/LemmaPos/Code_code1.tsv diff --git a/tsv/Lancelot_aoCompletV5.tsv b/tsv/LemmaPos/Lancelot_aoCompletV5.tsv similarity index 100% rename from tsv/Lancelot_aoCompletV5.tsv rename to tsv/LemmaPos/Lancelot_aoCompletV5.tsv diff --git a/tsv/Geste_ed_GarLorrBa.tsv b/tsv/LemmaPosMorph/EmptyLine/Geste_ed_GarLorrBa.tsv similarity index 100% rename from tsv/Geste_ed_GarLorrBa.tsv rename to tsv/LemmaPosMorph/EmptyLine/Geste_ed_GarLorrBa.tsv diff --git a/tsv/Geste_ed_GarLorrBe1.tsv b/tsv/LemmaPosMorph/EmptyLine/Geste_ed_GarLorrBe1.tsv similarity index 100% rename from 
tsv/Geste_ed_GarLorrBe1.tsv rename to tsv/LemmaPosMorph/EmptyLine/Geste_ed_GarLorrBe1.tsv diff --git a/tsv/Geste_ed_GarLorrBe2.tsv b/tsv/LemmaPosMorph/EmptyLine/Geste_ed_GarLorrBe2.tsv similarity index 100% rename from tsv/Geste_ed_GarLorrBe2.tsv rename to tsv/LemmaPosMorph/EmptyLine/Geste_ed_GarLorrBe2.tsv diff --git a/tsv/Geste_transcr_Fier_V.tsv b/tsv/LemmaPosMorph/EmptyLine/Geste_transcr_Fier_V.tsv similarity index 100% rename from tsv/Geste_transcr_Fier_V.tsv rename to tsv/LemmaPosMorph/EmptyLine/Geste_transcr_Fier_V.tsv diff --git a/tsv/Code_code4.tsv b/tsv/LemmaPosMorph/PONfrt/Code_code4.tsv similarity index 100% rename from tsv/Code_code4.tsv rename to tsv/LemmaPosMorph/PONfrt/Code_code4.tsv diff --git a/tsv/DocLing_sample1.tsv b/tsv/LemmaPosMorph/PONfrt/DocLing_sample1.tsv similarity index 97% rename from tsv/DocLing_sample1.tsv rename to tsv/LemmaPosMorph/PONfrt/DocLing_sample1.tsv index 3c07049..86dda26 100755 --- a/tsv/DocLing_sample1.tsv +++ b/tsv/LemmaPosMorph/PONfrt/DocLing_sample1.tsv @@ -1,5 +1,5 @@ form lemma POS morph -chdouai0120 +[REF:chdouai0120] Ref. OUT MORPH=empty Sacent savoir VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=p tout tot DETind NOMB.=p|GENRE=m|CAS=n cil cel PROdem NOMB.=p|GENRE=m|CAS=n @@ -454,7 +454,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty novembre novembre NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -chdouai0216 +[REF:chdouai0216] Ref. OUT MORPH=empty Sacent savoir VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=p tout tot DETind NOMB.=p|GENRE=m|CAS=n cil cel PROdem NOMB.=p|GENRE=m|CAS=n @@ -741,7 +741,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty march marz NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -chdouai0271 +[REF:chdouai0271] Ref. 
OUT MORPH=empty Sacent savoir VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=p tout tot DETind NOMB.=p|GENRE=m|CAS=n cil cel PROdem NOMB.=p|GENRE=m|CAS=n @@ -1214,7 +1214,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty septembre setembre NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -chdouai0456 +[REF:chdouai0456] Ref. OUT MORPH=empty Sacent savoir VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=p tout tot DETind NOMB.=p|GENRE=m|CAS=n cil cel PROdem NOMB.=p|GENRE=m|CAS=n @@ -2543,7 +2543,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty juing jüin NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -chdouai0497 +[REF:chdouai0497] Ref. OUT MORPH=empty Sacent savoir VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=p tout tot DETind NOMB.=p|GENRE=m|CAS=n cil cel PROdem NOMB.=p|GENRE=m|CAS=n @@ -2723,7 +2723,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty march marz NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -ChHM075 +[REF:ChHM075] Ref. OUT MORPH=empty Ce ce1 PROdem NOMB.=s|GENRE=n|CAS=n sunt estre1 VERcjg MODE=ind|TEMPS=pst|PERS.=3|NOMB.=p les le DETdef NOMB.=p|GENRE=f|CAS=n @@ -6407,7 +6407,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty decenbre decembre NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -ChHM177 +[REF:ChHM177] Ref. OUT MORPH=empty En en1 PRE MORPH=empty non nom NOMcom NOMB.=s|GENRE=m|CAS=r dou de+le PRE.DETdef MORPH=empty+NOMB.=s|GENRE=m|CAS=r @@ -8959,7 +8959,7 @@ jullet juillet NOMcom NOMB.=s|GENRE=m|CAS=r devant devant ADVgen DEGRE=- dit dire VERppe NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -ChHM237 +[REF:ChHM237] Ref. OUT MORPH=empty En en1 PRE MORPH=empty nom nom NOMcom NOMB.=s|GENRE=m|CAS=r dou de+le PRE.DETdef MORPH=empty+NOMB.=s|GENRE=m|CAS=r @@ -10686,7 +10686,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty fevrier fevrier NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -ChHM273 +[REF:ChHM273] Ref. 
OUT MORPH=empty À a3 PRE MORPH=empty toz tot DETind NOMB.=p|GENRE=m|CAS=r ces cel PROdem NOMB.=p|GENRE=m|CAS=r @@ -11806,7 +11806,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r d' de PRE MORPH=empty avri avril NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -ChHM275 +[REF:ChHM275] Ref. OUT MORPH=empty In in ETR MORPH=empty nomine nomen ETR NOMB.=s|GENRE=n|CAS=r Patris pater ETR NOMB.=s|GENRE=m|CAS=r @@ -13794,7 +13794,7 @@ seaux sëel2 NOMcom NOMB.=p|GENRE=m|CAS=r ceste cest DETdem NOMB.=s|GENRE=f|CAS=r execucion execucïon NOMcom NOMB.=s|GENRE=f|CAS=r . . PONfrt MORPH=empty -ChMa001 +[REF:ChMa001] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n / / PONfbl MORPH=empty . . PONfrt MORPH=empty @@ -14380,7 +14380,7 @@ XXX 30 ADJcar NOMB.=p|GENRE=m|CAS=r . . PONfrt MORPH=empty quarto catre ADJcar NOMB.=p|GENRE=m|CAS=r . . PONfrt MORPH=empty -ChMa010 +[REF:ChMa010] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Wermonz Wermont NOMpro NOMB.=s|GENRE=m|CAS=n vidames visdame NOMcom NOMB.=s|GENRE=m|CAS=n @@ -15244,7 +15244,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r d' de PRE MORPH=empty avril avril NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -ChMa032 +[REF:ChMa032] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Pierres Pierre NOMpro NOMB.=s|GENRE=m|CAS=n de de PRE MORPH=empty @@ -15735,7 +15735,7 @@ moiz mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty joilet juillet NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -ChMa040 +[REF:ChMa040] Ref. OUT MORPH=empty Ge je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Jofrois Geoffroi NOMpro NOMB.=s|GENRE=m|CAS=n chevaliers chevalier NOMcom NOMB.=s|GENRE=m|CAS=n @@ -16074,7 +16074,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty mai mai NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -ChMa042 +[REF:ChMa042] Ref. 
OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Jehans Jean NOMpro NOMB.=s|GENRE=m|CAS=n sires seignor NOMcom NOMB.=s|GENRE=m|CAS=n @@ -16486,7 +16486,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty mai mai NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -ChMa061 +[REF:ChMa061] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Ponsars Poinçard NOMpro NOMB.=s|GENRE=m|CAS=n doïens doiien NOMcom NOMB.=s|GENRE=m|CAS=n diff --git a/tsv/DocLing_sample2.tsv b/tsv/LemmaPosMorph/PONfrt/DocLing_sample2.tsv similarity index 97% rename from tsv/DocLing_sample2.tsv rename to tsv/LemmaPosMorph/PONfrt/DocLing_sample2.tsv index 7d9c8d1..f3fbaa0 100644 --- a/tsv/DocLing_sample2.tsv +++ b/tsv/LemmaPosMorph/PONfrt/DocLing_sample2.tsv @@ -1,5 +1,5 @@ form lemma POS morph -CHCor012 +[REF:CHCor012] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty Hugues Hugues NOMpro NOMB.=s|GENRE=m|CAS=n @@ -422,7 +422,7 @@ moys mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty fevrer fevrier NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHCor115 +[REF:CHCor115] Ref. OUT MORPH=empty Nos nos1 PROper PERS.=1|NOMB.=p|GENRE=m|CAS=n , , PONfbl MORPH=empty Hugues Hugues NOMpro NOMB.=s|GENRE=m|CAS=n @@ -746,7 +746,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty janvier jenvier NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHCor143 +[REF:CHCor143] Ref. OUT MORPH=empty A a3 PRE MORPH=empty touz tot DETind NOMB.=p|GENRE=m|CAS=r ces cel PROdem NOMB.=p|GENRE=m|CAS=r @@ -2315,7 +2315,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty janvier jenvier NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHCor160 +[REF:CHCor160] Ref. 
OUT MORPH=empty A a3 PRE MORPH=empty touz tot DETind NOMB.=p|GENRE=m|CAS=r cels cel PROdem NOMB.=p|GENRE=m|CAS=r @@ -3307,7 +3307,7 @@ feste feste1 NOMcom NOMB.=s|GENRE=f|CAS=r saint saint ADJqua NOMB.=s|GENRE=m|CAS=r|DEGRE=p Denise Denis NOMpro NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHCor52 +[REF:CHCor52] Ref. OUT MORPH=empty Gié je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty Henris Henri NOMpro NOMB.=s|GENRE=m|CAS=n @@ -3704,7 +3704,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r d' de PRE MORPH=empty octouvre uitovre NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHHS080 +[REF:CHHS080] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Willermins Guillaume NOMpro NOMB.=s|GENRE=m|CAS=n diz dire VERppe NOMB.=s|GENRE=m|CAS=n @@ -4212,7 +4212,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r d' de PRE MORPH=empty octembre octembre NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHHS123 +[REF:CHHS123] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Hugues Hugues NOMpro NOMB.=s|GENRE=m|CAS=n damoiseas damoisel NOMcom NOMB.=s|GENRE=m|CAS=n @@ -5173,7 +5173,7 @@ deffandre defendre VERinf MORPH=empty garantir garantir VERinf MORPH=empty et et CONcoo MORPH=empty appaisier apaisier VERinf MORPH=empty -à -des adès ADVgen DEGRE=- +àdes adès ADVgen DEGRE=- et et CONcoo MORPH=empty en en1 PRE MORPH=empty touz tot DETind NOMB.=p|GENRE=m|CAS=r @@ -5703,7 +5703,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty aost aost NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHHS130 +[REF:CHHS130] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=f|CAS=n Villemote Villemotte NOMpro NOMB.=s|GENRE=f|CAS=n qui qui PROrel NOMB.=s|GENRE=f|CAS=n @@ -6112,7 +6112,7 @@ moubles mueble ADJqua NOMB.=p|GENRE=m|CAS=r|DEGRE=p presans present1 ADJqua NOMB.=p|GENRE=m|CAS=r|DEGRE=p et et CONcoo MORPH=empty à a3 PRE MORPH=empty -- - PONfbl +- - PONfbl MORPH=empty venir venir VERinf MORPH=empty //. //. 
PONfbl MORPH=empty à a3 PRE MORPH=empty @@ -6241,7 +6241,7 @@ nonante nonante ADJcar NOMB.=p|GENRE=m|CAS=r et et CONcoo MORPH=empty nuef nuef1 ADJcar NOMB.=p|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -CHMe112 +[REF:CHMe112] Ref. OUT MORPH=empty Nos nos1 PROper PERS.=1|NOMB.=p|GENRE=m|CAS=n Wautiers Gautier NOMpro NOMB.=s|GENRE=m|CAS=n par par PRE MORPH=empty @@ -6647,7 +6647,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty mai mai NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -ChMe150 +[REF:ChMe150] Ref. OUT MORPH=empty Nos nos1 PROper PERS.=1|NOMB.=p|GENRE=m|CAS=n Nicholes Nicolet NOMpro NOMB.=s|GENRE=m|CAS=n par par PRE MORPH=empty @@ -7074,7 +7074,7 @@ quinzainne quinzaine NOMcom NOMB.=s|GENRE=f|CAS=r de de PRE MORPH=empty pakes Pasque NOMpro NOMB.=p|GENRE=f|CAS=r . . PONfrt MORPH=empty -CHMe231 +[REF:CHMe231] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Hues Hugues NOMpro NOMB.=s|GENRE=m|CAS=n curés curé NOMcom NOMB.=s|GENRE=m|CAS=n @@ -7256,7 +7256,7 @@ en en1 PRE MORPH=empty la le DETdef NOMB.=s|GENRE=f|CAS=r - - PONfbl MORPH=empty corte cort1 NOMcom NOMB.=s|GENRE=f|CAS=r -Roiz ? Roiz NOMpro NOMB.=s|GENRE=f|CAS=r +Roiz Roiz NOMpro NOMB.=s|GENRE=f|CAS=r desuz desor PRE MORPH=empty la le DETdef NOMB.=s|GENRE=f|CAS=r voie voie NOMcom NOMB.=s|GENRE=f|CAS=r @@ -7325,7 +7325,7 @@ si son4 DETpos PERS.=3|NOMB.=p|GENRE=m|CAS=n hoir oir NOMcom NOMB.=p|GENRE=m|CAS=n / / PONfbl MORPH=empty .I. un DETndf NOMB.=s|GENRE=m|CAS=r -bich ? bichet NOMcom NOMB.=s|GENRE=m|CAS=r +bich bichet NOMcom NOMB.=s|GENRE=m|CAS=r , , PONfbl MORPH=empty Ranxes Rances NOMpro NOMB.=s|GENRE=m|CAS=n li le DETdef NOMB.=s|GENRE=m|CAS=n @@ -7333,7 +7333,7 @@ fiz fil2 NOMcom NOMB.=s|GENRE=m|CAS=n Liebort Liebort NOMpro NOMB.=s|GENRE=m|CAS=r / / PONfbl MORPH=empty .I. un DETndf NOMB.=s|GENRE=m|CAS=r -bich ? bichet NOMcom NOMB.=s|GENRE=m|CAS=r +bich bichet NOMcom NOMB.=s|GENRE=m|CAS=r ,//. ,//. 
PONfbl MORPH=empty li le DETdef NOMB.=p|GENRE=m|CAS=n hoir oir NOMcom NOMB.=p|GENRE=m|CAS=n @@ -7343,7 +7343,7 @@ fil fil2 NOMcom NOMB.=s|GENRE=m|CAS=r Faudin Faudin NOMpro NOMB.=s|GENRE=m|CAS=r / / PONfbl MORPH=empty .I. un DETndf NOMB.=s|GENRE=m|CAS=r -bich ? bichet NOMcom NOMB.=s|GENRE=m|CAS=r +bich bichet NOMcom NOMB.=s|GENRE=m|CAS=r , , PONfbl MORPH=empty Sernans Sernan NOMpro NOMB.=s|GENRE=m|CAS=n li le DETdef NOMB.=s|GENRE=m|CAS=n @@ -7351,14 +7351,14 @@ fiz fil2 NOMcom NOMB.=s|GENRE=m|CAS=n Formei Formei NOMpro NOMB.=s|GENRE=m|CAS=r / / PONfbl MORPH=empty .I. un DETndf NOMB.=s|GENRE=m|CAS=r -bich ? bichet NOMcom NOMB.=s|GENRE=m|CAS=r +bich bichet NOMcom NOMB.=s|GENRE=m|CAS=r , , PONfbl MORPH=empty Phelippes Philippe NOMpro NOMB.=s|GENRE=m|CAS=n de de PRE MORPH=empty Mezcrinez Mécrinet NOMpro NOMB.=s|GENRE=x|CAS=r / / PONfbl MORPH=empty .I. un DETndf NOMB.=s|GENRE=m|CAS=r -bich ? bichet NOMcom NOMB.=s|GENRE=m|CAS=r +bich bichet NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty Et et CONcoo MORPH=empty ceste cest DETdem NOMB.=s|GENRE=f|CAS=r @@ -7546,7 +7546,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r d' de PRE MORPH=empty aoust aost NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHMe233 +[REF:CHMe233] Ref. OUT MORPH=empty Nos nos1 PROper PERS.=1|NOMB.=p|GENRE=m|CAS=n Thyebauz Thiébaut NOMpro NOMB.=s|GENRE=m|CAS=n cuens conte1 NOMcom NOMB.=s|GENRE=m|CAS=n @@ -8010,7 +8010,7 @@ jusque jusque PRE MORPH=empty la le DETdef NOMB.=s|GENRE=f|CAS=r chaucié chauciee NOMcom NOMB.=s|GENRE=f|CAS=r de de PRE MORPH=empty -le ? le DETdef NOMB.=s|GENRE=m|CAS=r +le le DETdef NOMB.=s|GENRE=m|CAS=r davant devant ADVgen DEGRE=- - - PONfbl MORPH=empty dit dire VERppe NOMB.=s|GENRE=m|CAS=r @@ -8079,7 +8079,7 @@ jors jor NOMcom NOMB.=p|GENRE=m|CAS=r toute tot DETind NOMB.=s|GENRE=f|CAS=r nostre nostre DETpos PERS.=1|NOMB.=s|GENRE=f|CAS=r partie partie NOMcom NOMB.=s|GENRE=f|CAS=r -entierement ? 
entierement ADVgen DEGRE=- +entierement entierement ADVgen DEGRE=- molin molin NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty Leheimeis Lahaymeix NOMpro NOMB.=s|GENRE=x|CAS=r @@ -8249,7 +8249,7 @@ par par PRE MORPH=empty fiés fief NOMcom NOMB.=p|GENRE=m|CAS=r ne ne2 CONcoo MORPH=empty par par PRE MORPH=empty -arrier -fiés arierefief NOMcom NOMB.=p|GENRE=m|CAS=r +arrierfiés arierefief NOMcom NOMB.=p|GENRE=m|CAS=r , , PONfbl MORPH=empty en en1 PRE MORPH=empty toutes tot DETind NOMB.=p|GENRE=f|CAS=r @@ -8663,7 +8663,7 @@ la le DETdef NOMB.=s|GENRE=f|CAS=r mi mi2 NOMcom NOMB.=s|GENRE=f|CAS=r aaost aost NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHMe236 +[REF:CHMe236] Ref. OUT MORPH=empty Ge je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Thiebaus Thibaut NOMpro NOMB.=s|GENRE=m|CAS=n cuens conte1 NOMcom NOMB.=s|GENRE=m|CAS=n @@ -9009,7 +9009,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r d' de PRE MORPH=empty octobre uitovre NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -CHMM016 +[REF:CHMM016] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty Matheus Matthieu NOMpro NOMB.=s|GENRE=m|CAS=n @@ -9717,7 +9717,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty mai mai NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -CHMM023 +[REF:CHMM023] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty Hues Hugues NOMpro NOMB.=s|GENRE=m|CAS=n @@ -10693,7 +10693,7 @@ des de+le PRE.DETdef MORPH=empty+NOMB.=p|GENRE=m|CAS=r - - PONfbl MORPH=empty apostres apostle NOMcom NOMB.=p|GENRE=m|CAS=r ://. ://. PONfrt MORPH=empty -CHMM032 +[REF:CHMM032] Ref. OUT MORPH=empty Ge je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty Maheus Matthieu NOMpro NOMB.=s|GENRE=m|CAS=n @@ -10794,7 +10794,7 @@ Fontenoy Fontenay NOMpro NOMB.=s|GENRE=x|CAS=r ausi aussi ADVgen DEGRE=- ,//. ,//. 
PONfbl MORPH=empty et et CONcoo MORPH=empty -qua -que cantque PROrel NOMB.=s|GENRE=n|CAS=r +quaque cantque PROrel NOMB.=s|GENRE=n|CAS=r il il PROimp PERS.=3|NOMB.=s|GENRE=m|CAS=n i i2 PROadv MORPH=empty - - PONfbl MORPH=empty @@ -11015,7 +11015,7 @@ peires paire NOMcom NOMB.=s|GENRE=m|CAS=n mes mon1 DETpos PERS.=1|NOMB.=s|GENRE=m|CAS=n oncles oncle NOMcom NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty -à -la aler VERcjg MODE=ind|TEMPS=psp|PERS.=3|NOMB.=s +àla aler VERcjg MODE=ind|TEMPS=psp|PERS.=3|NOMB.=s outremeir outremer ADVgen DEGRE=- et et CONcoo MORPH=empty li le DETdef NOMB.=s|GENRE=m|CAS=n @@ -11200,7 +11200,7 @@ de de PRE MORPH=empty la le DETdef NOMB.=s|GENRE=f|CAS=r Mauzelainne Madelaine NOMpro NOMB.=s|GENRE=f|CAS=r . . PONfrt MORPH=empty -CHMM040 +[REF:CHMM040] Ref. OUT MORPH=empty Ge je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty Maheus Matthieu NOMpro NOMB.=s|GENRE=m|CAS=n @@ -11301,7 +11301,7 @@ Fontenoy Fontenay NOMpro NOMB.=s|GENRE=x|CAS=r ausi aussi ADVgen DEGRE=- ,//. ,//. PONfbl MORPH=empty et et CONcoo MORPH=empty -qua -que cantque PROrel NOMB.=s|GENRE=n|CAS=r +quaque cantque PROrel NOMB.=s|GENRE=n|CAS=r il il PROimp PERS.=3|NOMB.=s|GENRE=m|CAS=n i i2 PROadv MORPH=empty - - PONfbl MORPH=empty @@ -11707,7 +11707,7 @@ de de PRE MORPH=empty la le DETdef NOMB.=s|GENRE=f|CAS=r Mauzelainne Madelaine NOMpro NOMB.=s|GENRE=f|CAS=r . . PONfrt MORPH=empty -ChMM153 +[REF:ChMM153] Ref. 
OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty Ouedes Eudes NOMpro NOMB.=s|GENRE=m|CAS=n @@ -11857,7 +11857,7 @@ autres autre DETind NOMB.=p|GENRE=f|CAS=r chozes chose NOMcom NOMB.=p|GENRE=f|CAS=r , , PONfbl MORPH=empty et et CONcoo MORPH=empty -quan que cantque PROrel NOMB.=s|GENRE=n|CAS=r +quanque cantque PROrel NOMB.=s|GENRE=n|CAS=r nos nos1 PROper PERS.=1|NOMB.=p|GENRE=m|CAS=n avons avoir VERcjg MODE=ind|TEMPS=pst|PERS.=1|NOMB.=p ailors aillors ADVgen DEGRE=- @@ -11895,7 +11895,7 @@ de de PRE MORPH=empty Verdun Verdun NOMpro NOMB.=s|GENRE=x|CAS=r devant devant ADVgen DEGRE=- nommeis nomer VERppe NOMB.=s|GENRE=m|CAS=r -quan -que cantque PROrel NOMB.=s|GENRE=n|CAS=r +quanque cantque PROrel NOMB.=s|GENRE=n|CAS=r nos nos1 PROper PERS.=1|NOMB.=p|GENRE=m|CAS=n avrons avoir VERcjg MODE=ind|TEMPS=fut|PERS.=1|NOMB.=p et et CONcoo MORPH=empty @@ -12214,7 +12214,7 @@ moi mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty mai mai NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHMo045 +[REF:CHMo045] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Hanris Henri NOMpro NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty @@ -13497,7 +13497,7 @@ moes mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty mai mai NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -CHMo167 +[REF:CHMo167] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Gobers Gobert NOMpro NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty @@ -15759,7 +15759,7 @@ et et CONcoo MORPH=empty cinc cinc DETcar NOMB.=p|GENRE=x|CAS=r ans an NOMcom NOMB.=p|GENRE=m|CAS=r //. //. PONfbl MORPH=empty -CHMo196 +[REF:CHMo196] Ref. 
OUT MORPH=empty Ge je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Thiebaus Thibaut NOMpro NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty @@ -17059,7 +17059,7 @@ on en1+le PRE.DETdef MORPH=empty+NOMB.=s|GENRE=m|CAS=r mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty joillet juillet NOMcom NOMB.=s|GENRE=m|CAS=r -ChMo238 +[REF:ChMo238] Ref. OUT MORPH=empty Conue conoistre VERppe NOMB.=s|GENRE=f|CAS=n chose chose NOMcom NOMB.=s|GENRE=f|CAS=n soit estre1 VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=s @@ -18150,7 +18150,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty mai mai NOMcom NOMB.=s|GENRE=m|CAS=r //. //. PONfbl MORPH=empty -CHMo271 +[REF:CHMo271] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Joffrois Geoffroy NOMpro NOMB.=s|GENRE=m|CAS=n de de PRE MORPH=empty @@ -19191,7 +19191,7 @@ et et CONcoo MORPH=empty dix dis1 DETcar NOMB.=p|GENRE=x|CAS=r anz an NOMcom NOMB.=p|GENRE=m|CAS=r //. //. PONfbl MORPH=empty -CHN001 +[REF:CHN001] Ref. OUT MORPH=empty A a3 PRE MORPH=empty - - PONfbl MORPH=empty toz tot DETind NOMB.=p|GENRE=m|CAS=r @@ -19698,7 +19698,7 @@ Bn non-identifié NOMpro NOMB.=s|GENRE=m|CAS=n de de PRE MORPH=empty Seinan Seinan NOMpro NOMB.=s|GENRE=x|CAS=r ? ? PONfrt MORPH=empty -CHN016 +[REF:CHN016] Ref. OUT MORPH=empty A a3 PRE MORPH=empty //. //. PONfbl MORPH=empty honorable onorable ADJqua NOMB.=s|GENRE=m|CAS=r|DEGRE=p @@ -19971,7 +19971,7 @@ feste feste1 NOMcom NOMB.=s|GENRE=f|CAS=r saint saint ADJqua NOMB.=s|GENRE=m|CAS=r|DEGRE=p Luc Luc NOMpro NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHN021 +[REF:CHN021] Ref. OUT MORPH=empty A a3 PRE MORPH=empty honorable onorable ADJqua NOMB.=s|GENRE=m|CAS=r|DEGRE=p //. //. PONfbl MORPH=empty @@ -21321,7 +21321,7 @@ et et CONcoo MORPH=empty . . PONfrt MORPH=empty neuf nuef1 ADJcar NOMB.=p|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -CHN030 +[REF:CHN030] Ref. 
OUT MORPH=empty A a3 PRE MORPH=empty touz tot DETind NOMB.=p|GENRE=m|CAS=r ces cel PROdem NOMB.=p|GENRE=m|CAS=r @@ -21831,7 +21831,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty marz marz NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHN031 +[REF:CHN031] Ref. OUT MORPH=empty A a3 PRE MORPH=empty touz tot DETind NOMB.=p|GENRE=m|CAS=r ceus cel PROdem NOMB.=p|GENRE=m|CAS=r @@ -22410,7 +22410,7 @@ et et CONcoo MORPH=empty . . PONfrt MORPH=empty nuef nuef1 ADJcar NOMB.=p|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -chPoit022 +[REF:chPoit022] Ref. OUT MORPH=empty Sachent savoir VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=p toz tot PROind NOMB.=p|GENRE=m|CAS=n presens present1 ADJqua NOMB.=p|GENRE=m|CAS=n|DEGRE=p @@ -22854,7 +22854,7 @@ e et CONcoo MORPH=empty Guillame Guillaume NOMpro NOMB.=s|GENRE=m|CAS=n Pinea Pinea NOMpro NOMB.=s|GENRE=x|CAS=n . . PONfrt MORPH=empty -chPoit055 +[REF:chPoit055] Ref. OUT MORPH=empty Queneue conoistre VERppe NOMB.=s|GENRE=f|CAS=n chose chose NOMcom NOMB.=s|GENRE=f|CAS=n est estre1 VERcjg MODE=ind|TEMPS=pst|PERS.=3|NOMB.=s @@ -23613,7 +23613,7 @@ sexante soissante ADJcar NOMB.=p|GENRE=m|CAS=r e et CONcoo MORPH=empty treize treize ADJcar NOMB.=p|GENRE=m|CAS=r . . PONfrt MORPH=empty -chPoit060 +[REF:chPoit060] Ref. OUT MORPH=empty Sachent savoir VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=p tuit tot PROind NOMB.=p|GENRE=m|CAS=n presenz present1 ADJqua NOMB.=p|GENRE=m|CAS=n|DEGRE=p @@ -24163,7 +24163,7 @@ dit dire VERppe NOMB.=s|GENRE=m|CAS=n Regnaut Renaut NOMpro NOMB.=s|GENRE=m|CAS=n Soudeien Soudeien NOMpro NOMB.=s|GENRE=x|CAS=n par par PRE MORPH=empty -sey +sey soi1 PROper PERS.=3|NOMB.=s|GENRE=m|CAS=i , , PONfbl MORPH=empty un un DETndf NOMB.=s|GENRE=m|CAS=r sextier sestier NOMcom NOMB.=s|GENRE=m|CAS=r @@ -25136,7 +25136,7 @@ quatrevinz catre+vint ADJcar NOMB.=p|GENRE=m|CAS=r et et CONcoo MORPH=empty cinc cinc ADJcar NOMB.=p|GENRE=m|CAS=r . . PONfrt MORPH=empty -chPoit066 +[REF:chPoit066] Ref. 
OUT MORPH=empty Sachent savoir VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=p tuit tot PROind NOMB.=p|GENRE=m|CAS=n que que4 CONsub MORPH=empty @@ -26055,7 +26055,7 @@ quatrevinz catre+vint ADJcar NOMB.=p|GENRE=m|CAS=r e et CONcoo MORPH=empty quatorze catorze ADJcar NOMB.=p|GENRE=m|CAS=r . . PONfrt MORPH=empty -chPoit301 +[REF:chPoit301] Ref. OUT MORPH=empty À , PONfbl MORPH=empty toz tot DETind NOMB.=p|GENRE=m|CAS=r ceaus cel PROdem NOMB.=p|GENRE=m|CAS=r @@ -27236,7 +27236,7 @@ sexante soissante ADJcar NOMB.=p|GENRE=m|CAS=r e et CONcoo MORPH=empty quinze quinze ADJcar NOMB.=p|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHSL007 +[REF:CHSL007] Ref. OUT MORPH=empty Nos nos1 PROper PERS.=1|NOMB.=p|GENRE=m|CAS=n Girars Girard NOMpro NOMB.=s|GENRE=m|CAS=n sires seignor NOMcom NOMB.=s|GENRE=m|CAS=n @@ -27606,7 +27606,7 @@ sexante soissante ADJcar NOMB.=p|GENRE=m|CAS=r et et CONcoo MORPH=empty treze treize ADJcar NOMB.=p|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHSL029 +[REF:CHSL029] Ref. OUT MORPH=empty A a3 PRE MORPH=empty touz tot DETind NOMB.=p|GENRE=m|CAS=r ces cel PROdem NOMB.=p|GENRE=m|CAS=r @@ -28306,7 +28306,7 @@ et et CONcoo MORPH=empty neuf nuef1 ADJcar NOMB.=p|GENRE=m|CAS=r //. //. PONfbl MORPH=empty . . PONfrt MORPH=empty -CHSL079 +[REF:CHSL079] Ref. OUT MORPH=empty En en1 PRE MORPH=empty non nom NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty @@ -29798,7 +29798,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r dessus desus ADVgen DEGRE=- diz dire VERppe NOMB.=p|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -CHSL097 +[REF:CHSL097] Ref. OUT MORPH=empty Saichent savoir VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=p tout tot PROind NOMB.=p|GENRE=m|CAS=n presenz present1 ADJqua NOMB.=p|GENRE=m|CAS=n|DEGRE=p @@ -30943,7 +30943,7 @@ et et CONcoo MORPH=empty . . PONfrt MORPH=empty trente trente ADJcar NOMB.=p|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHSL098 +[REF:CHSL098] Ref. 
OUT MORPH=empty A a3 PRE MORPH=empty touz tot DETind NOMB.=p|GENRE=m|CAS=r ces cel PROdem NOMB.=p|GENRE=m|CAS=r @@ -31461,7 +31461,7 @@ mil mil1 ADJcar NOMB.=p|GENRE=m|CAS=r et et CONcoo MORPH=empty trante trente ADJcar NOMB.=p|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHSL126 +[REF:CHSL126] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n Jehanz Jean NOMpro NOMB.=s|GENRE=m|CAS=n damisés damoisel NOMcom NOMB.=s|GENRE=m|CAS=n @@ -31956,7 +31956,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty novambre novembre NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfbl MORPH=empty -CHSL127 +[REF:CHSL127] Ref. OUT MORPH=empty Saichent savoir VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=p tuit tot DETind NOMB.=p|GENRE=m|CAS=n cil cel PROdem NOMB.=p|GENRE=m|CAS=n @@ -32613,7 +32613,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty novambre novembre NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -CHV0019 +[REF:CHV0019] Ref. OUT MORPH=empty Nos nos1 PROper PERS.=1|NOMB.=p|GENRE=m|CAS=n , , PONfbl MORPH=empty Rogiers Roger NOMpro NOMB.=s|GENRE=m|CAS=n @@ -33259,7 +33259,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty mars marz NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -ChV0112 +[REF:ChV0112] Ref. OUT MORPH=empty Serenissimo ETR OUT MORPH=empty ac ETR OUT MORPH=empty superexcellenti ETR OUT MORPH=empty @@ -34082,7 +34082,7 @@ ducentesimo ETR OUT MORPH=empty sexagesimo ETR OUT MORPH=empty sexto ETR OUT MORPH=empty . ETR OUT MORPH=empty -ChV0124 +[REF:ChV0124] Ref. OUT MORPH=empty Counue conoistre VERppe NOMB.=s|GENRE=f|CAS=n choze chose NOMcom NOMB.=s|GENRE=f|CAS=n soit estre1 VERcjg MODE=sub|TEMPS=pst|PERS.=3|NOMB.=s @@ -34799,7 +34799,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty may mai NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -ChV0127 +[REF:ChV0127] Ref. 
OUT MORPH=empty Nos nos1 PROper PERS.=1|NOMB.=p|GENRE=m|CAS=n , , PONfbl MORPH=empty Ferris Ferri NOMpro NOMB.=s|GENRE=m|CAS=n @@ -35738,7 +35738,7 @@ moys mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty mai mai NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -ChV0144 +[REF:ChV0144] Ref. OUT MORPH=empty Je je PROper PERS.=1|NOMB.=s|GENRE=m|CAS=n , , PONfbl MORPH=empty Perres Pierre NOMpro NOMB.=s|GENRE=m|CAS=n @@ -36366,7 +36366,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty fevrier fevrier NOMcom NOMB.=s|GENRE=m|CAS=r . . PONfrt MORPH=empty -R_1268_12_32_01 +[REF:R_1268_12_32_01] Ref. OUT MORPH=empty Looys Louis NOMpro NOMB.=s|GENRE=m|CAS=n par par PRE MORPH=empty la le DETdef NOMB.=s|GENRE=f|CAS=r @@ -37296,7 +37296,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty decembre decembre NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -R_1299_03_26_01 +[REF:R_1299_03_26_01] Ref. OUT MORPH=empty Phelippes Philippe NOMpro NOMB.=s|GENRE=m|CAS=n par par PRE MORPH=empty la le DETdef NOMB.=s|GENRE=f|CAS=r @@ -39444,7 +39444,7 @@ Dame dame NOMpro NOMB.=s|GENRE=f|CAS=r Vierge virge ADJqua NOMB.=s|GENRE=f|CAS=r|DEGRE=p .//. .//. PONfrt MORPH=empty /. . PONfbl MORPH=empty -RC_1284_05_17_01 +[REF:RC_1284_05_17_01] Ref. OUT MORPH=empty Phelippes Philippe NOMpro NOMB.=s|GENRE=m|CAS=n par par PRE MORPH=empty la le DETdef NOMB.=s|GENRE=f|CAS=r @@ -40982,7 +40982,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty may mai NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -RM_1285_03_32_01 +[REF:RM_1285_03_32_01] Ref. OUT MORPH=empty En en1 PRE MORPH=empty non nom NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty @@ -43306,7 +43306,7 @@ mois mois2 NOMcom NOMB.=s|GENRE=m|CAS=r de de PRE MORPH=empty marz marz NOMcom NOMB.=s|GENRE=m|CAS=r .//. .//. PONfrt MORPH=empty -RP_1297_03_32_01 +[REF:RP_1297_03_32_01] Ref. OUT MORPH=empty Phelippes Philippe NOMpro NOMB.=s|GENRE=m|CAS=n //. //. 
PONfbl MORPH=empty par par PRE MORPH=empty diff --git a/tsv/Geste_aspremont-fr-25529.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_aspremont-fr-25529.tsv similarity index 100% rename from tsv/Geste_aspremont-fr-25529.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_aspremont-fr-25529.tsv diff --git a/tsv/Geste_ed_FloovG.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_FloovG.tsv similarity index 100% rename from tsv/Geste_ed_FloovG.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_FloovG.tsv diff --git a/tsv/Geste_ed_FlorenceA.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_FlorenceA.tsv similarity index 100% rename from tsv/Geste_ed_FlorenceA.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_FlorenceA.tsv diff --git a/tsv/Geste_ed_GarLorrC.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrC.tsv similarity index 100% rename from tsv/Geste_ed_GarLorrC.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrC.tsv diff --git a/tsv/Geste_ed_GarLorrDr.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrDr.tsv similarity index 100% rename from tsv/Geste_ed_GarLorrDr.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrDr.tsv diff --git a/tsv/Geste_ed_GarLorrMe1a.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrMe1a.tsv similarity index 100% rename from tsv/Geste_ed_GarLorrMe1a.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrMe1a.tsv diff --git a/tsv/Geste_ed_GarLorrMe1b.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrMe1b.tsv similarity index 100% rename from tsv/Geste_ed_GarLorrMe1b.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrMe1b.tsv diff --git a/tsv/Geste_ed_GarLorrMe2.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrMe2.tsv similarity index 100% rename from tsv/Geste_ed_GarLorrMe2.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrMe2.tsv diff --git a/tsv/Geste_ed_GarLorrMo.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrMo.tsv similarity index 100% rename from tsv/Geste_ed_GarLorrMo.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrMo.tsv diff --git a/tsv/Geste_ed_GarLorrPa.tsv 
b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrPa.tsv similarity index 100% rename from tsv/Geste_ed_GarLorrPa.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GarLorrPa.tsv diff --git a/tsv/Geste_ed_GerbMetzMe1.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GerbMetzMe1.tsv similarity index 100% rename from tsv/Geste_ed_GerbMetzMe1.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GerbMetzMe1.tsv diff --git a/tsv/Geste_ed_GerbMetzMe2.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GerbMetzMe2.tsv similarity index 100% rename from tsv/Geste_ed_GerbMetzMe2.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GerbMetzMe2.tsv diff --git a/tsv/Geste_ed_GirVianeM.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GirVianeM.tsv similarity index 100% rename from tsv/Geste_ed_GirVianeM.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GirVianeM.tsv diff --git a/tsv/Geste_ed_GuiBourgG.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_GuiBourgG.tsv similarity index 100% rename from tsv/Geste_ed_GuiBourgG.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_GuiBourgG.tsv diff --git a/tsv/Geste_ed_HervisP.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_HervisP.tsv similarity index 100% rename from tsv/Geste_ed_HervisP.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_HervisP.tsv diff --git a/tsv/Geste_ed_MacaireAl2B.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_MacaireAl2B.tsv similarity index 100% rename from tsv/Geste_ed_MacaireAl2B.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_MacaireAl2B.tsv diff --git a/tsv/Geste_ed_MacaireAl3T.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_ed_MacaireAl3T.tsv similarity index 100% rename from tsv/Geste_ed_MacaireAl3T.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_ed_MacaireAl3T.tsv diff --git a/tsv/Geste_transcr_Asprem_C.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_transcr_Asprem_C.tsv similarity index 100% rename from tsv/Geste_transcr_Asprem_C.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_transcr_Asprem_C.tsv diff --git a/tsv/Geste_transcr_Asprem_P4.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_transcr_Asprem_P4.tsv similarity index 
100% rename from tsv/Geste_transcr_Asprem_P4.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_transcr_Asprem_P4.tsv diff --git a/tsv/Geste_transcr_GarLorr_X.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_transcr_GarLorr_X.tsv similarity index 100% rename from tsv/Geste_transcr_GarLorr_X.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_transcr_GarLorr_X.tsv diff --git a/tsv/Geste_transcr_Otin_A.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_transcr_Otin_A.tsv similarity index 100% rename from tsv/Geste_transcr_Otin_A.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_transcr_Otin_A.tsv diff --git a/tsv/Geste_transcr_Otin_B.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_transcr_Otin_B.tsv similarity index 100% rename from tsv/Geste_transcr_Otin_B.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_transcr_Otin_B.tsv diff --git a/tsv/Geste_transcr_Otin_M.tsv b/tsv/LemmaPosMorph/PONfrt/Geste_transcr_Otin_M.tsv similarity index 100% rename from tsv/Geste_transcr_Otin_M.tsv rename to tsv/LemmaPosMorph/PONfrt/Geste_transcr_Otin_M.tsv diff --git a/tsv/Varia_chroniques-calais.tsv b/tsv/LemmaPosMorph/PONfrt/Varia_chroniques-calais.tsv similarity index 100% rename from tsv/Varia_chroniques-calais.tsv rename to tsv/LemmaPosMorph/PONfrt/Varia_chroniques-calais.tsv diff --git a/tsv/Varia_grande-chirurgie-3.tsv b/tsv/LemmaPosMorph/PONfrt/Varia_grande-chirurgie-3.tsv similarity index 100% rename from tsv/Varia_grande-chirurgie-3.tsv rename to tsv/LemmaPosMorph/PONfrt/Varia_grande-chirurgie-3.tsv diff --git a/tsv/Varia_grande-chirurgie-guy-de-chauliac.tsv b/tsv/LemmaPosMorph/PONfrt/Varia_grande-chirurgie-guy-de-chauliac.tsv similarity index 100% rename from tsv/Varia_grande-chirurgie-guy-de-chauliac.tsv rename to tsv/LemmaPosMorph/PONfrt/Varia_grande-chirurgie-guy-de-chauliac.tsv diff --git a/tsv/Varia_grande-chirurgie-meynaud.tsv b/tsv/LemmaPosMorph/PONfrt/Varia_grande-chirurgie-meynaud.tsv similarity index 100% rename from tsv/Varia_grande-chirurgie-meynaud.tsv rename to tsv/LemmaPosMorph/PONfrt/Varia_grande-chirurgie-meynaud.tsv 
diff --git a/tsv/WauchierSConf_jns915.jns1742.ciham-lemTEI.tsv b/tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1742.ciham-lemTEI.tsv similarity index 100% rename from tsv/WauchierSConf_jns915.jns1742.ciham-lemTEI.tsv rename to tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1742.ciham-lemTEI.tsv diff --git a/tsv/WauchierSConf_jns915.jns1743.ciham-lemTEI.tsv b/tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1743.ciham-lemTEI.tsv similarity index 100% rename from tsv/WauchierSConf_jns915.jns1743.ciham-lemTEI.tsv rename to tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1743.ciham-lemTEI.tsv diff --git a/tsv/WauchierSConf_jns915.jns1744.ciham-lemTEI.tsv b/tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1744.ciham-lemTEI.tsv similarity index 100% rename from tsv/WauchierSConf_jns915.jns1744.ciham-lemTEI.tsv rename to tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1744.ciham-lemTEI.tsv diff --git a/tsv/WauchierSConf_jns915.jns1761.ciham-lemTEI.tsv b/tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1761.ciham-lemTEI.tsv similarity index 100% rename from tsv/WauchierSConf_jns915.jns1761.ciham-lemTEI.tsv rename to tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1761.ciham-lemTEI.tsv diff --git a/tsv/WauchierSConf_jns915.jns1856.ciham-lemTEI.tsv b/tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1856.ciham-lemTEI.tsv similarity index 100% rename from tsv/WauchierSConf_jns915.jns1856.ciham-lemTEI.tsv rename to tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1856.ciham-lemTEI.tsv diff --git a/tsv/WauchierSConf_jns915.jns1994.ciham-lemTEI.tsv b/tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1994.ciham-lemTEI.tsv similarity index 100% rename from tsv/WauchierSConf_jns915.jns1994.ciham-lemTEI.tsv rename to tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns1994.ciham-lemTEI.tsv diff --git a/tsv/WauchierSConf_jns915.jns2000.ciham-lemTEI.tsv b/tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns2000.ciham-lemTEI.tsv similarity index 100% rename from 
tsv/WauchierSConf_jns915.jns2000.ciham-lemTEI.tsv rename to tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns2000.ciham-lemTEI.tsv diff --git a/tsv/WauchierSConf_jns915.jns2114.ciham-lemTEI.tsv b/tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns2114.ciham-lemTEI.tsv similarity index 100% rename from tsv/WauchierSConf_jns915.jns2114.ciham-lemTEI.tsv rename to tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns2114.ciham-lemTEI.tsv diff --git a/tsv/WauchierSConf_jns915.jns2117.ciham-lemTEI.tsv b/tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns2117.ciham-lemTEI.tsv similarity index 100% rename from tsv/WauchierSConf_jns915.jns2117.ciham-lemTEI.tsv rename to tsv/LemmaPosMorph/PONfrt/WauchierSConf_jns915.jns2117.ciham-lemTEI.tsv diff --git a/tsv/digulleville-pelerinage-de-l-ame.tsv b/tsv/LemmaPosMorph/PONfrt/digulleville-pelerinage-de-l-ame.tsv similarity index 100% rename from tsv/digulleville-pelerinage-de-l-ame.tsv rename to tsv/LemmaPosMorph/PONfrt/digulleville-pelerinage-de-l-ame.tsv diff --git a/tsv/roman-de-la-rose-8227-10024.tsv b/tsv/LemmaPosMorph/PONfrt/roman-de-la-rose-8227-10024.tsv similarity index 100% rename from tsv/roman-de-la-rose-8227-10024.tsv rename to tsv/LemmaPosMorph/PONfrt/roman-de-la-rose-8227-10024.tsv diff --git a/tsv/rutebeuf-charlot.tsv b/tsv/LemmaPosMorph/PONfrt/rutebeuf-charlot.tsv similarity index 100% rename from tsv/rutebeuf-charlot.tsv rename to tsv/LemmaPosMorph/PONfrt/rutebeuf-charlot.tsv diff --git a/tsv/rutebeuf-theophile-2.tsv b/tsv/LemmaPosMorph/PONfrt/rutebeuf-theophile-2.tsv similarity index 100% rename from tsv/rutebeuf-theophile-2.tsv rename to tsv/LemmaPosMorph/PONfrt/rutebeuf-theophile-2.tsv diff --git a/tsv/trouveres-firstsample.tsv b/tsv/LemmaPosMorph/PONfrt/trouveres-firstsample.tsv similarity index 100% rename from tsv/trouveres-firstsample.tsv rename to tsv/LemmaPosMorph/PONfrt/trouveres-firstsample.tsv diff --git a/tsv/README.md b/tsv/README.md new file mode 100644 index 0000000..1e42cfe --- /dev/null +++ 
b/tsv/README.md
@@ -0,0 +1,11 @@
+Data organization
+=================
+
+The folder structure indicates which gold annotations each dataset contains and how it is split into samples.
+
+| Path | Description |
+| ---- | ----------- |
+| /LemmaPos | Data with gold annotations for lemma and POS only |
+| /LemmaPosMorph | Data with gold annotations for lemma, POS and morphology |
+| /LemmaPosMorph/EmptyLine | Samples are delimited by empty lines instead of punctuation |
+| /LemmaPosMorph/PONfrt | Samples are delimited by a token with the POS `PONfrt` or the lemma `Ref.` |
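The two sample-splitting conventions described in the table can be illustrated with a short Python sketch. It rests on assumptions not stated in the repository: the TSV files have no header row, the first three tab-separated columns are the form, the lemma and the POS tag, and the `PONfrt` / `Ref.` token that closes a sample is kept as that sample's last token. The function names and column indices are hypothetical.

```python
import csv
from typing import Iterable, Iterator, List

# Hypothetical column layout (assumption): form, lemma, POS in the first three columns.
FORM, LEMMA, POS = 0, 1, 2

Row = List[str]


def split_on_empty_lines(lines: Iterable[str]) -> Iterator[List[Row]]:
    """EmptyLine convention: a blank line closes the current sample."""
    sample: List[Row] = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or all(not cell.strip() for cell in row):
            # Blank line: emit the sample collected so far, if any.
            if sample:
                yield sample
            sample = []
        else:
            sample.append(row)
    if sample:  # the file may not end with a blank line
        yield sample


def closes_sample(row: Row) -> bool:
    """PONfrt convention: strong punctuation (POS PONfrt) or lemma 'Ref.' ends a sample."""
    pos = row[POS] if len(row) > POS else ""
    lemma = row[LEMMA] if len(row) > LEMMA else ""
    return pos == "PONfrt" or lemma == "Ref."


def split_on_strong_punct(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Split on PONfrt / Ref. tokens; the closing token stays in its sample (assumption)."""
    sample: List[Row] = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # ignore stray blank lines
        sample.append(row)
        if closes_sample(row):
            yield sample
            sample = []
    if sample:  # trailing tokens with no closing punctuation
        yield sample
```

Both functions accept any iterable of lines, such as an open file handle. If the actual files do start with a header row, skip the first line before passing the handle in. `csv.QUOTE_NONE` is used so that quotation marks occurring as tokens are not interpreted as CSV quoting.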