[de]German wiktionary's conjugation page issues #974

Closed
yhyh0 opened this issue Jan 7, 2025 · 9 comments

Comments

@yhyh0

yhyh0 commented Jan 7, 2025

Upon looking through the JSON data for each German word, I have found the following issues:

  1. Some section names are not being accounted for when compiling the tags, leading to an excessive number of tags (ranging from 400 to 1400). Consequently, it is difficult to distinguish between different sections based on the tags alone. The section names that are being overlooked usually include 'Zustandsreflexiv', 'reflexiv', 'unregelmäßig', 'haben/sein', and 'untrennbar', among others.
    Some example words:
    abachen - Zustandsreflexiv
    anpampfen - reflexiv
    abweichen - unregelmäßig
    untrennbar (Deutsch)

  2. When a word has multiple entries, each linking to a different section of the same conjugation page, the form entries in the JSONs' forms section are duplicated (though they are not entirely identical, with one or two forms varying in the list). It would be beneficial to eliminate this duplication. However, I suspect that addressing this issue is contingent upon first resolving issue 1 mentioned above.
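
A minimal Python sketch of the de-duplication described in item 2, for illustration only. It assumes each word's JSON carries a "forms" list of objects with "form", "tags", and optionally "raw_tags" keys. The helper name `dedupe_forms` and the sample entries are hypothetical and not part of wiktextract.

```python
def dedupe_forms(forms):
    """Keep the first entry for each (form, tags, raw_tags) combination.

    Tag order is ignored, so two entries that differ only in the
    order of their tags count as duplicates.
    """
    seen = set()
    unique = []
    for entry in forms:
        key = (
            entry.get("form"),
            frozenset(entry.get("tags", [])),
            frozenset(entry.get("raw_tags", [])),
        )
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


# Hypothetical sample data: the first two entries are duplicates,
# the third differs and must be kept.
sample_forms = [
    {"form": "ich bin abgeacht", "source": "Flexion:abachen",
     "tags": ["first-person", "singular", "present", "indicative"]},
    {"form": "ich bin abgeacht", "source": "Flexion:abachen",
     "tags": ["singular", "first-person", "present", "indicative"]},
    {"form": "du bist abgeacht", "source": "Flexion:abachen",
     "tags": ["second-person", "singular", "present", "indicative"]},
]

unique_forms = dedupe_forms(sample_forms)
```

As long as the section name is missing from the tags, entries from different sections can collapse into one key, which is why this step depends on issue 1 being resolved first.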

I can provide additional details if that would be beneficial. Thank you for your work in maintaining this library.

@xxyzz
Collaborator

xxyzz commented Jan 7, 2025

Maybe the "Deutsch Verb schwach trennbar reflexiv" template is not handled properly in flexion.py. I could take a look tomorrow.

@xxyzz
Collaborator

xxyzz commented Jan 8, 2025

"untrennbar" and "unregelmäßig" tags in level 2 node are added in #980. "Zustandsreflexiv" and "reflexiv" are also in the table header so I didn't add them.

@yhyh0
Author

yhyh0 commented Jan 9, 2025

"untrennbar" and "unregelmäßig" tags in level 2 node are added in #980. "Zustandsreflexiv" and "reflexiv" are also in the table header so I didn't add them.

Thank you, xxyzz.

@yhyh0 yhyh0 closed this as completed Jan 9, 2025
@yhyh0 yhyh0 reopened this Feb 23, 2025
@yhyh0
Author

yhyh0 commented Feb 23, 2025

"untrennbar" and "unregelmäßig" tags in level 2 node are added in #980. "Zustandsreflexiv" and "reflexiv" are also in the table header so I didn't add them.

It seems the section name is still not captured. I just checked abachen with the data "dewiktionary dump dated 2025-02-21 using wiktextract (9e2b7d3 and f2e72e5)".
The first form item of the "Zustandsreflexiv" section looks like this:

{form: 'ich bin abgeacht', source: 'Flexion:abachen', tags: ['first-person', 'singular', 'present', 'indicative']}

The tags should include something indicating that it is in the section (subsection) "Zustandsreflexiv".

@yhyh0
Author

yhyh0 commented Feb 23, 2025

I checked more words.
aussaugen -- works now, as it now captures tag "irregular".
verhauen -- still not working, as it is still not capturing the section name "reflexiv".

@xxyzz
Collaborator

xxyzz commented Feb 24, 2025

Section titles are added in #1047, some are not translated and added to the "raw_tags" list.
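
Since a section title may land in either "tags" (translated) or "raw_tags" (untranslated), a consumer probably needs to check both lists. A minimal Python sketch under that assumption follows. The helper name `forms_with_label` is hypothetical, and the sample entries are made up for illustration rather than copied from an actual dump.

```python
def forms_with_label(forms, label):
    """Return the form entries that carry `label` in "tags" or "raw_tags"."""
    return [
        entry
        for entry in forms
        if label in entry.get("tags", []) or label in entry.get("raw_tags", [])
    ]


# Illustrative entries only; the real tag placement may differ.
sample_forms = [
    {"form": "ich bin abgeacht", "source": "Flexion:abachen",
     "tags": ["first-person", "singular", "present", "indicative"],
     "raw_tags": ["Zustandsreflexiv"]},
    {"form": "ich ache ab", "source": "Flexion:abachen",
     "tags": ["first-person", "singular", "present", "indicative"]},
]

zustandsreflexiv_forms = forms_with_label(sample_forms, "Zustandsreflexiv")
```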

@yhyh0
Author

yhyh0 commented Feb 24, 2025

Section titles are added in #1037, some are not translated and added to the "raw_tags" list.

Thank you, xxyzz. Thanks for your quick update.

While testing it, I also found that it might be better to include 'hilfsverb-haben', 'hilfsverb-sein', and 'trennbar' in the tags as well.

One example is the word abbiegen, which includes sections for 'haben' and 'sein'.
The reason for adding 'trennbar' is that it is not always easy to separate the different sections, each with hundreds of form items, based only on the tag 'untrennbar' and the order of the items. Simply treating the remaining items as 'trennbar' might not always be correct. The same might also apply to the tags 'regular'/'irregular'.

I was trying to make the change, but I'm sure you would have a better version of it.

@xxyzz
Collaborator

xxyzz commented Feb 25, 2025

Tags are added in #1049. "Hilfsverb haben" and "Hilfsverb sein" are added to the "raw_tags" list, "trennbar" is translated to "separable" and added to "tags" list.
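
With these labels in place, the forms of a word such as abbiegen could be split into sections roughly as sketched below. This is a Python sketch, not wiktextract code: `group_by_section_labels` and `SECTION_LABELS` are hypothetical names, and the sample entries only illustrate the shape of the data, assuming "separable" appears in "tags" and the auxiliary labels appear in "raw_tags" as described above.

```python
from collections import defaultdict

# Labels used to tell sections apart (taken from the comment above).
SECTION_LABELS = ("Hilfsverb haben", "Hilfsverb sein", "separable")


def group_by_section_labels(forms, labels=SECTION_LABELS):
    """Group form entries by which section labels they carry.

    Each key is a tuple of the labels found in the entry's "tags" or
    "raw_tags", kept in the order given by `labels`.
    """
    groups = defaultdict(list)
    for entry in forms:
        present = set(entry.get("tags", [])) | set(entry.get("raw_tags", []))
        key = tuple(label for label in labels if label in present)
        groups[key].append(entry)
    return dict(groups)


# Illustrative entries for abbiegen; the tags shown are assumptions.
sample_forms = [
    {"form": "ich habe abgebogen", "tags": ["separable", "perfect"],
     "raw_tags": ["Hilfsverb haben"]},
    {"form": "ich bin abgebogen", "tags": ["separable", "perfect"],
     "raw_tags": ["Hilfsverb sein"]},
]

sections = group_by_section_labels(sample_forms)
```

Grouping on the combination of labels, rather than on one label at a time, avoids relying on the order of the items to tell sections apart.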

@yhyh0
Author

yhyh0 commented Feb 25, 2025

Thank you! That's really helpful.

@yhyh0 yhyh0 closed this as completed Feb 25, 2025

2 participants