Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Albrja/mic 5720/mypy tests randomness index map #574

Merged
merged 8 commits into from
Jan 25, 2025

Conversation

albrja
Copy link
Contributor

@albrja albrja commented Jan 23, 2025

Albrja/mic 5720/mypy tests randomness index map

Fix mypy errors in test_index_map.py

Changes and notes

-fix mypy errors

Testing

@@ -215,6 +219,7 @@ def _convert_to_ten_digit_int(
if pdt.is_datetime64_any_dtype(column):
integers = self._clip_to_seconds(column.astype(np.int64))
elif pdt.is_integer_dtype(column):
column = column.astype(int)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is consistent with the other branch of the if/else block and makes mypy happy.

@@ -192,7 +192,11 @@ def _hash(self, keys: pd.Index[Any], salt: ClockTime = 0) -> pd.Series[int]:
return new_map % len(self)

def _convert_to_ten_digit_int(
self, column: pd.Series[datetime | int | float]
self,
column: pd.Series[pd.Timestamp]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was typed incorrectly before but basically to make myppy happy, we have to have pd.Timestamp as the type to include datetime because pandas always converts datetimes to Timestamps. However, Timestamp is technically a part of datetime. See: https://pandas.pydata.org/docs/user_guide/timeseries.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So even though pd.Timestamp inherits from datatime, you have to include it explicitly (despite also having datetime as part of the signature)?

Also, glad you caught that incorrect mashing of series types!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I think technically we could just make it pd.Timestamp and not have a series as datetime as an options since no matter what you do pandas will convert it but I t hink this is more intuitive since we do use datetime elsewhere and we have Time in types.py which is a Timestamp | datetime (and unfortunately I could not do a serious of that).

@@ -38,7 +49,7 @@ def generate_keys(number, types=("int", "float", "datetime"), seed=123456):
keys["int"] = np.arange(start, start + number, dtype=int)

if "float" in types:
keys["float"] = rs.random_sample(size=number)
keys["float"] = rs.random_sample(number)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what drove removing the keyword?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why I had deleted this but it is back in there.

seed = 123456
salt = IndexMap()._convert_to_ten_digit_int(pd.Series(salt, index=k))
rs = np.random.RandomState(seed=seed + salt)
digit_series = IndexMap()._convert_to_ten_digit_int(pd.Series(salt, index=k))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe salt_series?

keys = generate_keys(1000)
m = index_map(key_columns=list(keys.columns))
m.update(keys, pd.to_datetime("2023-01-01"))
key_index = keys.set_index(list(keys.columns)).index
assert len(m._map) == len(keys), "All keys not in mapping"
map = m._map
assert isinstance(map, pd.Series)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you need this assertion if you typed map specifically, map: pd.Series = m._map

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically had to do this because _map can either be pd.Series[int] or None but in this case it has to be a series. For some reason mypy doesnt like inline typing vs the assertion.

@albrja albrja merged commit 9fa250d into main Jan 25, 2025
8 checks passed
@albrja albrja deleted the albrja/mic-5720/mypy-tests-randomness-index-map branch January 25, 2025 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants