-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error on shared column names when writing #1335
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1335 +/- ##
==========================================
- Coverage 85.65% 83.47% -2.19%
==========================================
Files 34 34
Lines 5460 5465 +5
==========================================
- Hits 4677 4562 -115
- Misses 783 903 +120
Flags with carried forward coverage won't be shown. Click here to find out more.
|
df.index, index=df.index | ||
).equals(df[df.index.name]): | ||
raise ValueError( | ||
f"DataFrame.index.name ({df.index.name!r}) is also used by a column " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f"DataFrame.index.name ({df.index.name!r}) is also used by a column " | |
f"DataFrame.index.name ({repr(df.index.name)}) is also used by a column " |
https://peps.python.org/pep-0498/#s-r-and-a-are-redundant I had to google this tbh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that PEP is saying that we shouldn't use !r
it's just saying that there is another way to do it. Plus it's just a point in the discussion.
For this kind of thing, I would say if our formatter doesn't care then it's fine. If we want to enforce style rules, then we'll do it with the formatter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, !r is the way.
@@ -663,10 +663,23 @@ def write_dataframe(f, key, df, _writer, dataset_kwargs=MappingProxyType({})): | |||
if reserved in df.columns: | |||
raise ValueError(f"{reserved!r} is a reserved name for dataframe columns.") | |||
group = f.require_group(key) | |||
if not df.columns.is_unique: | |||
duplicates = list(df.columns[df.columns.duplicated()]) | |||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit, but this seems like a "breaking change" since the behavior is changing. I'm not sure how strict things are about that here, though. Should we warn until 0.11.0? Maybe use #1270 to ease things? But I see the point that this is a "bug" in some sense as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is throwing an error on malformed input which was allowed through before (leading to more problems). To me, that's much more fixing of a bug than changing behaviour.
Co-authored-by: Isaac Virshup <[email protected]>
AnnData
object with duplicated.obs
column names from.h5ad
#884