Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typing stuff #446

Merged
merged 8 commits into from
Jan 31, 2024
Merged

Typing stuff #446

merged 8 commits into from
Jan 31, 2024

Conversation

kristian-clausal
Copy link
Collaborator

Typing stuff, part two, this time so it's possible to rebase (memo to self: don't merge to update, rebase this branch as you go).

I already found a bug. Is there anything worse than looking at some code, seeing that it's wrong, and then finding out it was your own commit that did it?

@kristian-clausal
Copy link
Collaborator Author

The point of type-checking all of this is to type-check things, and it feels silly to have to add # type:ignore for the outgoing data. We "know" what the data should be like, so we should be able to type-set it.

I will attempt to do some typing for the data. It should be possible, using TypedDict. TypedDict can (in 3.9) have optional and required fields (which are created by basically inheriting). We'll see if I can make it work out, or if I can just get something to work partially.

@xxyzz
Copy link
Collaborator

xxyzz commented Jan 3, 2024

The proper way to enforce data structure is using pydantic, it would take more efforts than create types. But define typeddict is almost like define pydantic models.

And the first two commits are merge commits...

@xxyzz
Copy link
Collaborator

xxyzz commented Jan 4, 2024

You could try python -m ruff format to format code, I have removed black from dev dependency in the wikitextprocessor package but haven't remove it in this repo.

@kristian-clausal
Copy link
Collaborator Author

kristian-clausal commented Jan 4, 2024

wtf, those merges don't make sense!

This isn't about data structure, this is about type-checking and commenting the code.

@xxyzz
Copy link
Collaborator

xxyzz commented Jan 4, 2024

IMO add typeddict don't have much benefits compares to pydantic models. Good luck wrestle with mypy...

I should say this in another way, I anticipate define typeddict types for JSON data is as difficult as using pydantic models. Maybe even harder for passing mypy checks.

@kristian-clausal
Copy link
Collaborator Author

kristian-clausal commented Jan 4, 2024

I couldn't figure out a way to get rid of those merges except merge master to origin/master, push that and then rebase the branch to that (which fixed it).

This is why I've been avoiding rebase, it is such a mess. Merges may look messy, but they're clear and simple, while rebases, every time I've touched them, mess something up. Not a fan.

@xxyzz
Copy link
Collaborator

xxyzz commented Jan 4, 2024

That's not git rebase's fault. I suspect your "typing" branch is not created correctly. And this pr doesn't even need rebase...

@kristian-clausal
Copy link
Collaborator Author

If I can't use a tool properly, then I shouldn't be using that tool! If it makes you any happier, I can just switch back to using merge.

@xxyzz
Copy link
Collaborator

xxyzz commented Jan 4, 2024

I'm trying to encourage you to try the rebase feature to avoid the situation like the last pr, I'm not even block that pr...

Maybe I didn't make my self clear, I just suggest you to rebase to update an outdated pr branch, you could use merge if you don't like rebase.

This code used ns_title_prefix_tuple to get a list of
aliases for 'Thesaurus', but then use a constant w[10:]
to remove "Thesaurus" only from the start of `w`.

This might never have actually triggered, if there
are now aliases for "Thesaurus", though!

The more correct way to fix this is to iterate over
the return from ns_title_prefix_tuple, but a less
correct (but probably also correct) would be to just
(revert back to?) just checking w.startswith("Thesaurus")...
Is there anything wrong than seeing a piece of code, thinking to yourself
"that's not right, that can't work?", searching for a keyword in there to
figure out what the original commit was that introduced this code, and then
seeing your own name on that commit?

Yeah, this piece of code of has a small error that basically invalidated the
whole purpose of the changes made. Happily, the error was to let through
everything, which is still *ok*, but the desired result was the filter
certain things.

The error was: I created a function to remove "pos" fields from "sounds"
data when adding that data to a list. The first part of the block was about
filtering out "sounds" data that didn't match the word or the forms of the
word we were being processed, and the *second* part of the block was about
filtering for wrong "pos" data and then removing the "pos" sections.. But I
used the pos-removing function in the first part of the block, which meant
there were no pos-sections to compare in the second. Because there was
not pos-data, all the sounds were let through.
Currently these are TypedDicts with `total=False`, but
it is possible in Python 3.9 to have TypedDicts with
mixed required and optional fields by using inheritance;
one parent TypedDict has the required fields (`total=True`),
the other doesn't, and the child TypedDict class has
mixed requirements. This can be done later.
Sorry if it's messy, had to do some temp commits and
I don't want to touch this anymore after amending this,
just in case.
This code has been unreachable for a couple of years,
with an XXX comment about removing it; it's pretty hard to
figure out if the changes Tatu was meaning to make actually
were made, but seeing as how things have been running smoothly
even when parse_linkage_template has been unreachable all
this time, I opted to just remove the unused code.

I've also commented out some data used by parse_linkage_template
in linkage_template_mappings (a list of lists that should have
been a dict?), and moved the linkage template names into a new
template_linkages set to be used in the one place that still
had a reference to it.
@kristian-clausal kristian-clausal marked this pull request as ready for review January 31, 2024 11:01
@kristian-clausal
Copy link
Collaborator Author

This is getting a bit too long, and most of the changes do not actually affect any behavior so I'll be merging this and will make a new branch in the future.

@kristian-clausal kristian-clausal merged commit bc9225b into master Jan 31, 2024
10 checks passed
@kristian-clausal kristian-clausal deleted the typing branch January 31, 2024 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants