Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pydantic and json_schema to German extractor #418

Merged
merged 1 commit into from
Dec 6, 2023

Conversation

empiriker
Copy link
Contributor

This is to add pydantic classes to the German extractor.

In some cases, pydantic fields are defined as lists of strings where probably they could be just a string. I decided to accept for now this looser schema to avoid refactoring even more. This was already quite a bit more painful than I anticipated.

This work is a contribution to the EWOK project, which receives funding from LABEX ASLAN (ANR–10–LABX–0081) at the Université de Lyon, as part of the "Investissements d'Avenir" program initiated and overseen by the Agence Nationale de la Recherche (ANR) in France.
@xxyzz xxyzz merged commit 1e9c5d1 into tatuylonen:master Dec 6, 2023
5 checks passed
@xxyzz
Copy link
Collaborator

xxyzz commented Dec 6, 2023

Thanks for your contribution!

How's the performance compare to the previous code?

@empiriker
Copy link
Contributor Author

Without pydantic:

2023-12-06 10:12:39,031 INFO:   ... 20000/20000 pages (100.0%) processed, 00:00:00 remaining
2023-12-06 10:12:39,048 INFO: Reprocessing wiktionary complete
         776069 function calls (776037 primitive calls) in 98.246 seconds

With pydantic:

2023-12-06 10:08:30,500 INFO:   ... 20000/20000 pages (100.0%) processed, 00:00:00 remaining
2023-12-06 10:08:30,515 INFO: Reprocessing wiktionary complete
         577305 function calls (577273 primitive calls) in 98.502 seconds

These are the processing times of running each version one time on my local machine. Apparently pydantic comes at a cost of 0.26% of performance.

Obviously a more rigorous test setup might give more insights but to me this seems encouraging enough. What do you think?

@xxyzz
Copy link
Collaborator

xxyzz commented Dec 6, 2023

That's pretty insignificant. We probably don't need to worry about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants