Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port majority of pants/util #6073

Merged
merged 7 commits into from
Jul 9, 2018
Merged

Conversation

Eric-Arellano
Copy link
Contributor

Ports everything util/ except for objects.py and meta.py (see #6072), process_handler.py, and tarutil.py.

Part of #6062

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -14,10 +14,11 @@
import time
import uuid
import zipfile
from builtins import object
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporarily removing the override of str() because it breaks a bunch of tests. I'll figure out how to fix this in a separate dedicated PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually for now I'm going keep the original semantics. I don't understand the ramifications to changing this.

We might run into issues when running Python3 tests the first time - if so, we'll know we have to change the semantics from what we have now.

Below change (line 159) makes it so the semantics is the same in Py2 and Py3.

@@ -6,6 +6,7 @@
unicode_literals, with_statement)

import threading
from builtins import object, str
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjyw can you look over this file please?

This override means that in line 39, during call to _RWBuf.write(), it will call _RWBuf.do_write() with unicode, rather than bytes, which might be a problem. However, do_write() simply raises NotImplementedError, so I don't know if this actually matters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, from builtins import str is a no-op in py3 but renames py2's unicode type to str in py2? I think this definitely will have consequences. A low-level IO method should always operate on raw bytes (str in py2, bytes in py3), never on encoded strings.

However this file only explicitly uses str in one line, and that's hacky:

self.do_write(str(s))

If s is already binary data, no problem, but if it's text, then in python2 that str(s) would attempt to encode it using the default encoding, which is not guaranteed to work.

So really write() should check if the argument is bytes, and throw an exception of not. That way we'll find out eagerly if any call sites do the wrong thing.

Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine something like from builtins import bytes and then in write() check for isinstance(s, bytes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. from builtins import str modifies Python2's str() function so that it gets a unicode, rather than bytes, representation.

Agreed with that change. I'll break it out into a separate PR since it's a substantial semantic change + we have no explicit tests for this file.

Copy link
Contributor

@benjyw benjyw Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually what I'm proposing involves no semantic change. This commit, as stands, would be a major semantic change, as it would cast arbitrary bytes into text, which might explode. So you can't merge this as-is.

@@ -34,8 +35,10 @@ def read_from(self, pos, size=-1):
return self._io.read() if size == -1 else self._io.read(size)

def write(self, s):
if not isinstance(s, bytes):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjyw follow up from prior discussion. Good point about prior version changing semantics.

This does change semantics that now ValueError will be raised whereas earlier we would try to coerce into bytes. If it's safe to make this change, I agree with you this is better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but that coercion might have failed, so it's best to at least fail predictably and eagerly.

stuhood pushed a commit that referenced this pull request Jul 9, 2018
Followup from #6073 port of `util`. These required manual changes. `util` is all ported after this.
Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rwbuf.py LGTM mod my comment about making the error message more informative.

@@ -34,8 +35,10 @@ def read_from(self, pos, size=-1):
return self._io.read() if size == -1 else self._io.read(size)

def write(self, s):
if not isinstance(s, bytes):
raise ValueError('Argument not in bytes: {0}'.format(s))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be more useful to say what type the argument is, e.g., 'Expected bytes, not {}, for argument {}.'.format(type(s), s)

Side note - we typically don't enumerate the format args, unless we're repeating them (in which case we prefer to give them names instead of ordinals).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good on both. Can't wait for us to be able to use f-strings!

Thanks for the review

@stuhood stuhood merged commit 1a85b59 into pantsbuild:master Jul 9, 2018
@Eric-Arellano Eric-Arellano deleted the port-util branch July 10, 2018 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants