-
-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port majority of pants/util #6073
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@@ -14,10 +14,11 @@ | |||
import time | |||
import uuid | |||
import zipfile | |||
from builtins import object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporarily removing the override of str()
because it breaks a bunch of tests. I'll figure out how to fix this in a separate dedicated PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually for now I'm going keep the original semantics. I don't understand the ramifications to changing this.
We might run into issues when running Python3 tests the first time - if so, we'll know we have to change the semantics from what we have now.
Below change (line 159) makes it so the semantics is the same in Py2 and Py3.
src/python/pants/util/rwbuf.py
Outdated
@@ -6,6 +6,7 @@ | |||
unicode_literals, with_statement) | |||
|
|||
import threading | |||
from builtins import object, str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benjyw can you look over this file please?
This override means that in line 39, during call to _RWBuf.write()
, it will call _RWBuf.do_write()
with unicode
, rather than bytes
, which might be a problem. However, do_write()
simply raises NotImplementedError
, so I don't know if this actually matters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, from builtins import str
is a no-op in py3 but renames py2's unicode type to str
in py2? I think this definitely will have consequences. A low-level IO method should always operate on raw bytes (str
in py2, bytes
in py3), never on encoded strings.
However this file only explicitly uses str
in one line, and that's hacky:
self.do_write(str(s))
If s
is already binary data, no problem, but if it's text, then in python2 that str(s)
would attempt to encode it using the default encoding, which is not guaranteed to work.
So really write() should check if the argument is bytes, and throw an exception of not. That way we'll find out eagerly if any call sites do the wrong thing.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine something like from builtins import bytes
and then in write()
check for isinstance(s, bytes)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. from builtins import str
modifies Python2's str()
function so that it gets a unicode, rather than bytes, representation.
Agreed with that change. I'll break it out into a separate PR since it's a substantial semantic change + we have no explicit tests for this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually what I'm proposing involves no semantic change. This commit, as stands, would be a major semantic change, as it would cast arbitrary bytes into text, which might explode. So you can't merge this as-is.
@@ -34,8 +35,10 @@ def read_from(self, pos, size=-1): | |||
return self._io.read() if size == -1 else self._io.read(size) | |||
|
|||
def write(self, s): | |||
if not isinstance(s, bytes): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benjyw follow up from prior discussion. Good point about prior version changing semantics.
This does change semantics that now ValueError
will be raised whereas earlier we would try to coerce into bytes. If it's safe to make this change, I agree with you this is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but that coercion might have failed, so it's best to at least fail predictably and eagerly.
Followup from #6073 port of `util`. These required manual changes. `util` is all ported after this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rwbuf.py LGTM mod my comment about making the error message more informative.
src/python/pants/util/rwbuf.py
Outdated
@@ -34,8 +35,10 @@ def read_from(self, pos, size=-1): | |||
return self._io.read() if size == -1 else self._io.read(size) | |||
|
|||
def write(self, s): | |||
if not isinstance(s, bytes): | |||
raise ValueError('Argument not in bytes: {0}'.format(s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be more useful to say what type the argument is, e.g., 'Expected bytes, not {}, for argument {}.'.format(type(s), s)
Side note - we typically don't enumerate the format args, unless we're repeating them (in which case we prefer to give them names instead of ordinals).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good on both. Can't wait for us to be able to use f-strings!
Thanks for the review
Ports everything
util/
except forobjects.py
andmeta.py
(see #6072),process_handler.py
, andtarutil.py
.Part of #6062