Extract linkages from Spanish Wiktionary #420

Merged
merged 1 commit into from
Dec 6, 2023

empiriker
Contributor

This is as far as I planned to develop the Spanish extractor.

Apart from the places marked with # XXX: ..., the biggest area for improvement is probably the handling of form sections. In the Spanish Wiktionary, whenever a lemma can also be a conjugated or declined form of a different word, this is noted in a form section, which is a sibling of the POS sections.

To give a sense of scale: in the 20,000 pages I am developing on, I hit these form sections about 15,000 times. An example, for the word pueblo:

image

For me it's okay to not deal with these right now. But perhaps someone else will.
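For anyone picking this up, here is a rough sketch of how form sections could be told apart from their POS siblings and counted. It is not part of this PR and does not use the extractor's code: the helper names, the assumption that form section titles start with "Forma ", and the sample pages are all hypothetical and only there for illustration.

```python
from collections import Counter

# Assumption for illustration only: form sections are recognised by a title
# starting with "Forma ". The real extractor may need a different rule.
FORM_SECTION_PREFIX = "forma "


def is_form_section(title: str) -> bool:
    """Return True if a section title looks like a form section."""
    return title.strip().lower().startswith(FORM_SECTION_PREFIX)


def split_sections(sibling_titles):
    """Split sibling section titles into (pos_sections, form_sections)."""
    pos_sections, form_sections = [], []
    for title in sibling_titles:
        if is_form_section(title):
            form_sections.append(title)
        else:
            pos_sections.append(title)
    return pos_sections, form_sections


def count_form_sections(pages):
    """Count form sections, and pages containing at least one, in a page mapping."""
    counts = Counter()
    for titles in pages.values():
        _, form_sections = split_sections(titles)
        counts["form_sections"] += len(form_sections)
        counts["pages_with_forms"] += 1 if form_sections else 0
    return counts


# Made-up sample data: page title -> sibling section titles.
sample_pages = {
    "pueblo": ["Sustantivo masculino", "Forma verbal"],
    "casa": ["Sustantivo femenino", "Forma verbal"],
    "gato": ["Sustantivo masculino"],
}

if __name__ == "__main__":
    print(count_form_sections(sample_pages))
```

A prefix check like this keeps the decision in one place, so it would be easy to swap for a template-based check if titles turn out to be unreliable.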

This work is a contribution to the EWOK project, which receives funding from LABEX ASLAN (ANR–10–LABX–0081) at the Université de Lyon, as part of the "Investissements d'Avenir" program initiated and overseen by the Agence Nationale de la Recherche (ANR) in France.
@xxyzz xxyzz merged commit ecb5e11 into tatuylonen:master Dec 6, 2023
5 checks passed
Collaborator

xxyzz commented Dec 6, 2023

Thank you for your contribution!
