Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add future lib and port src/base #6067

Merged
merged 8 commits into from
Jul 6, 2018

Conversation

Eric-Arellano
Copy link
Contributor

Description

Implements stage 2 of porting plan, which is adding the future library.

Also ports the src/pants/base and tests/python/pants_test/base as a demo of how this tool works.

At this stage, we aren't testing things yet with the Python 3 interpreter, even though technically it's Python 3 code. The goal for now is to make sure things still work on Python 2.

FYI - how the backports work

The future library adds several modules: future, builtins, past, and individual ports like queue (instead of Queue).

These added names all correspond exactly to Python 3 code. When running Python 3, it will simply use the native Python 3 version. When using Python 2, it will use the library implementation, which is semantically equivalent to Python 3.

FYI - using old moves interface

Typically futurize will try to use this idiom for certain backports

from future import standard_library
standard_library.install_aliases()

import [backports]

This however does not work in our codebase, because all imports must appear above code with iSort, so the install_aliases() line gets moved below the relevant imports.

Instead, we are using the legacy future.moves library. Once we remove Python 2 support we'll want to convert this type of import from future.moves.itertools import x to from itertools import x.

Testing

I ran the tests for everything with the normal Python 2 interpreter - is that enough, or should I hand test any behavior?

Open question - remaining pull requests

The next stage is to run this tool on all remaining source code, which will lead to a huge pull request. How should I split this up? Do you prefer everything being together, one folder at a time, or something else?

I'm using this spreadsheet to track progress.

All of these changes will be generated by futurize, unless otherwise noted where I have to manually fix things. I'm restraining myself from refactors etc to minimize complexity.

FYI - script to automate running tool

I made this helper script to automate the 5 steps it takes to port a file with futurize. It's not very robust but you can use to experiment with this tool.



@total_ordering
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implements all 6 comparison dunders, given __eq__ and __lt__.

There is a slight performance penalty. If Revision is used a lot, I can manually add the other comparisons.


def __ne__(self, other):
return not self.__eq__(other)
if not self._is_valid_operand(other):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This slightly changes the behavior from before. When comparing to non-Revisions, this returns NotImplemented, which is the expected behavior. instead of the prior False.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any python3 compat reason to mix in this behavior change? As an example of future conversion PRs, it would be good to stick to pure semantic-preserving automated conversion if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I made the change is for the@total_ordering decorator, so that we treat __eq__ and __lt__ more similarly. I believe I can revert back to original semantics if you prefer, though.

The only Py3 change we need is replacing __cmp__ with __lt__ or one of the other rich comparison functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer it if just to keep semantics changes out of the conversion PRs. Especially since Revision is :API: public and the semantic shift is False -> raise. We can't know how this will break API users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@Eric-Arellano Eric-Arellano Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted __eq__ to return False

But am keeping __lt__ as returning NotImplemented. The prior implementation in __cmp__ would return AttributeError if other._components is not defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I think I can buy the AttributeError case as not being part of the public api, so this change is probably OK. That said - I'd avoid that sort of improvement in the conversion PRs in particular going forward. Broken out as separate changes is a different story, that is definitely great.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and the semantic shift is False -> raise.

I was mistaken there. The semantic shift is subtle to non-existant depending on how other implements __eq__. That said, still a fan of nixing improvements in these PRs. TODO is ok, but it really should be attributed and / or - better - assigned an issue with the issue link noted in the TODO. Your spirit of wanting to fix is admirable and I'm sure you'll see more of these. Perhaps just start 1 umbrella issue to capture all the small TODO's you find then you or someone else can circle back and do the fixes in unrelated PR's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I reverted everything back to identical semantics from before (just more explicit).

I think it's a good policy to avoid any changes in these PRs because they get buried. Instead, will add notes in form TODO(python3port) and open an issue at #6071.


def __ne__(self, other):
return not self.__eq__(other)
if not self._is_valid_operand(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any python3 compat reason to mix in this behavior change? As an example of future conversion PRs, it would be good to stick to pure semantic-preserving automated conversion if possible.

@@ -34,7 +35,7 @@ def convert_val(x):
return [convert_val(e) for e in x]
else:
return x
items = [(key, convert_val(val)) for (key, val) in args.items()]
items = [(key, convert_val(val)) for (key, val) in list(args.items())]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess these are automated, but they are unfortunate. An Iterable is enough to build a list comprehension and so wrapping in a list is a waste of resources and extra un-needed code noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We can manually try to identify places like this if you think it's worth it, although will slow us down and may be error prone knowing when and when not to change back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading over this guide and I think we should actually get rid of this call to list() when it's clear it's not necessary - it leads to less performant code in both languages.

If it's unclear if I can safely remove it, I'll leave it for us to review in PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can definitely safely remove it in this context - ie: immediate consumption of the Iterable. That said, I loathe scaning for these in the conversion PRs - does not scale and we'll likely mess up. It would be awesome if the tool was better or configurable to avoid this case. Presumably it uses an AST and not just string searches. If it does use an AST, a super-nice contribution would be expanding about the dict.{items,keys,values} call to see if the context is the generator of a for loop, and if so - do not wrap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsirois I've been reviewing the changes already - not that much extra work for me to remove list() when it's used in for loops. Will make your lives easier with PRs also - less changes.

Maintainers stopped updating futurize, 2to3, etc last year, so PR wouldn't go far there.

Copy link
Contributor

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending green ci

@@ -103,7 +105,7 @@ def error(e):

# We filter out Nones defensively. There shouldn't be any, but if a bug causes one,
# Pants might hang indefinitely without this filtering.
work_iter = iter(filter(None, work_chain))
work_iter = iter([_f for _f in work_chain if _f])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be just work_iter = (_f for _f in work_chain if _f); ie directly construct an iter (generator) vs create a list and then ask it for an iter. If this change was automated however, I'm fine with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix it. If we don't change it now, it will probably never get changed and might be confusing to later devs why we're using this roundabout approach.

I do agree with you that we should be more rigorous though with keeping 100% semantic parity (as possible). But for small things like removing extraneous list(), I think it's worth fixing what futurize generates.


def __ne__(self, other):
return not self.__eq__(other)
if not self._is_valid_operand(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and the semantic shift is False -> raise.

I was mistaken there. The semantic shift is subtle to non-existant depending on how other implements __eq__. That said, still a fan of nixing improvements in these PRs. TODO is ok, but it really should be attributed and / or - better - assigned an issue with the issue link noted in the TODO. Your spirit of wanting to fix is admirable and I'm sure you'll see more of these. Perhaps just start 1 umbrella issue to capture all the small TODO's you find then you or someone else can circle back and do the fixes in unrelated PR's.

@@ -354,7 +355,9 @@ def generate_generated(config, here):

def render_html(dst, config, soups, precomputed, template):
soup = soups[dst]
renderer = Renderer()
renderer = Renderer(escape=cgi.escape) # TODO(python3port): cgi.escape is deprecated in favor html.escape
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a tough bug to find. Pystache by default uses cgi.escape() in Python2, but html.escape() in Python3. Because future adds a backport to html (which doesn't exist in Python2), Pystache successfully was importing the module and using html.escape().

html.escape() escapes ' unlike cgi.escape(), so one of our tests was failing.

For now, I'm having us stay with the old semantics.

@Eric-Arellano
Copy link
Contributor Author

cc @mateor @iantabolt @stuhood @kwlzn @benjyw

How do you prefer I proceed with future porting pull requests? @jsirois and I worked out some good practices for what should and should not be included in these PRs, but we need to decide how big we want them to be.

Should I submit a module at a time, or all at once, or something else?

@mateor
Copy link
Member

mateor commented Jul 6, 2018

=John on breaking the improvements as distinct from py3-required changes.

A benefit to doing that is that it makes the more procedural commits easier to review.

I think that testing in a single module makes sense for a first pass. Personally, I might try doing the entire codebase locally, and think about breaking out and landing required logic changes, then landing the remainder as a blob.

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@stuhood stuhood merged commit f783edc into pantsbuild:master Jul 6, 2018
@stuhood
Copy link
Member

stuhood commented Jul 6, 2018

Module at a time would definitely be preferable to doing the whole repository at once. If this were something like a simple style change, that would be one thing. But it seems quite likely that there will be actual bugs caused/uncovered by this, and smaller PRs will make bisecting feasible.

Keeping behaviour changes isolated into separate PRs would be ideal. Thanks!

@Eric-Arellano Eric-Arellano deleted the add-future-lib branch July 6, 2018 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants