-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Albrja/mic 5720/mypy tests randomness index map #574
Conversation
@@ -215,6 +219,7 @@ def _convert_to_ten_digit_int( | |||
if pdt.is_datetime64_any_dtype(column): | |||
integers = self._clip_to_seconds(column.astype(np.int64)) | |||
elif pdt.is_integer_dtype(column): | |||
column = column.astype(int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior is consistent with the other branch of the if/else block and makes mypy happy.
@@ -192,7 +192,11 @@ def _hash(self, keys: pd.Index[Any], salt: ClockTime = 0) -> pd.Series[int]: | |||
return new_map % len(self) | |||
|
|||
def _convert_to_ten_digit_int( | |||
self, column: pd.Series[datetime | int | float] | |||
self, | |||
column: pd.Series[pd.Timestamp] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was typed incorrectly before but basically to make myppy happy, we have to have pd.Timestamp as the type to include datetime because pandas always converts datetimes to Timestamps. However, Timestamp is technically a part of datetime. See: https://pandas.pydata.org/docs/user_guide/timeseries.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So even though pd.Timestamp inherits from datatime, you have to include it explicitly (despite also having datetime as part of the signature)?
Also, glad you caught that incorrect mashing of series types!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I think technically we could just make it pd.Timestamp and not have a series as datetime as an options since no matter what you do pandas will convert it but I t hink this is more intuitive since we do use datetime elsewhere and we have Time in types.py which is a Timestamp | datetime (and unfortunately I could not do a serious of that).
@@ -38,7 +49,7 @@ def generate_keys(number, types=("int", "float", "datetime"), seed=123456): | |||
keys["int"] = np.arange(start, start + number, dtype=int) | |||
|
|||
if "float" in types: | |||
keys["float"] = rs.random_sample(size=number) | |||
keys["float"] = rs.random_sample(number) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what drove removing the keyword?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why I had deleted this but it is back in there.
seed = 123456 | ||
salt = IndexMap()._convert_to_ten_digit_int(pd.Series(salt, index=k)) | ||
rs = np.random.RandomState(seed=seed + salt) | ||
digit_series = IndexMap()._convert_to_ten_digit_int(pd.Series(salt, index=k)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe salt_series
?
keys = generate_keys(1000) | ||
m = index_map(key_columns=list(keys.columns)) | ||
m.update(keys, pd.to_datetime("2023-01-01")) | ||
key_index = keys.set_index(list(keys.columns)).index | ||
assert len(m._map) == len(keys), "All keys not in mapping" | ||
map = m._map | ||
assert isinstance(map, pd.Series) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you need this assertion if you typed map specifically, map: pd.Series = m._map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I specifically had to do this because _map can either be pd.Series[int] or None but in this case it has to be a series. For some reason mypy doesnt like inline typing vs the assertion.
Albrja/mic 5720/mypy tests randomness index map
Fix mypy errors in test_index_map.py
Changes and notes
-fix mypy errors
Testing