Error on shared column names when writing #1335

ivirshup · 2024-01-25T15:52:54Z

Closes #884 (Loss of data when loading AnnData object with duplicated .obs column names from .h5ad)
Tests added
Release note added (or unnecessary)

codecov · 2024-01-25T16:04:37Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (299ca97) 85.65% compared to head (470f255) 83.47%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1335      +/-   ##
==========================================
- Coverage   85.65%   83.47%   -2.19%     
==========================================
  Files          34       34              
  Lines        5460     5465       +5     
==========================================
- Hits         4677     4562     -115     
- Misses        783      903     +120

Flag	Coverage Δ
gpu-tests	`?`

Files	Coverage Δ
anndata/_io/specs/methods.py	`88.06% <100.00%> (-0.10%)`	⬇️

... and 7 files with indirect coverage changes

ilan-gold · 2024-01-29T08:56:40Z

anndata/_io/specs/methods.py

+            df.index, index=df.index
+        ).equals(df[df.index.name]):
+            raise ValueError(
+                f"DataFrame.index.name ({df.index.name!r}) is also used by a column "


Suggested change

-                f"DataFrame.index.name ({df.index.name!r}) is also used by a column "
+                f"DataFrame.index.name ({repr(df.index.name)}) is also used by a column "

https://peps.python.org/pep-0498/#s-r-and-a-are-redundant I had to google this tbh.

I don't think that PEP is saying that we shouldn't use !r; it's just saying that there is another way to do it. Plus, it's just a point in the discussion.

For this kind of thing, I would say if our formatter doesn't care then it's fine. If we want to enforce style rules, then we'll do it with the formatter.

Yeah, !r is the way.
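
For readers following the thread, here is a minimal sketch of why the two spellings are interchangeable. The variable names and the sample value are made up for illustration and do not come from the PR.

```python
# Hypothetical index name, used only for illustration.
name = "obs_names"

# The !r conversion applies repr() to the value before formatting.
with_conversion = f"DataFrame.index.name ({name!r}) is also used by a column"

# Calling repr() explicitly inside the replacement field gives the same text.
with_repr_call = f"DataFrame.index.name ({repr(name)}) is also used by a column"

assert with_conversion == with_repr_call
```

Both forms produce the same message, so the choice is a style question, which is the point made above.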

ilan-gold · 2024-01-29T08:59:24Z

anndata/_io/specs/methods.py

@@ -663,10 +663,23 @@ def write_dataframe(f, key, df, _writer, dataset_kwargs=MappingProxyType({})):
        if reserved in df.columns:
            raise ValueError(f"{reserved!r} is a reserved name for dataframe columns.")
    group = f.require_group(key)
+    if not df.columns.is_unique:
+        duplicates = list(df.columns[df.columns.duplicated()])
+        raise ValueError(


A nit, but this seems like a "breaking change" since the behavior is changing. I'm not sure how strict things are about that here, though. Should we warn until 0.11.0? Maybe use #1270 to ease things? But I see the point that this is a "bug" in some sense as well.

I think this PR is throwing an error on malformed input which was allowed through before (leading to more problems). To me, that's much more a bug fix than a change in behaviour.
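
As a rough illustration of the kind of validation under discussion, the sketch below rejects a dataframe with repeated column names, or with a column that shares the index name but holds different values. It is not the code from the PR: `check_column_names` is a hypothetical standalone helper, the error messages are invented, and the `pd.Series(...)` wrapper around the index comparison is an assumption reconstructed from the partial diff hunk.

```python
import pandas as pd


def check_column_names(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError on ambiguous column names."""
    # Repeated column names cannot be stored as distinct keys.
    if not df.columns.is_unique:
        duplicates = list(df.columns[df.columns.duplicated()])
        raise ValueError(f"Found repeated column names: {duplicates}")

    # A column may share the index name only if it holds the same values
    # as the index (assumed reading of the diff hunk above).
    index_name = df.index.name
    if index_name is not None and index_name in df.columns:
        index_as_series = pd.Series(df.index, index=df.index)
        if not index_as_series.equals(df[index_name]):
            raise ValueError(
                f"DataFrame.index.name ({index_name!r}) is also used by a column "
                "whose values differ from the index."
            )


# Usage: a frame with two columns named "a" is rejected.
bad = pd.DataFrame([[1, 2]], columns=["a", "a"])
try:
    check_column_names(bad)
except ValueError as err:
    print(err)
```

Raising early at this point means the malformed input never reaches the file, which matches the "bug fix" framing in the reply above.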

Co-authored-by: Isaac Virshup <[email protected]>

Error on shared column names when writing

6eb5486

ivirshup added topic: io type: dataframe 🧮 labels Jan 25, 2024

add test

b2deb2a

ivirshup added this to the 0.10.6 milestone Jan 25, 2024

ivirshup requested a review from flying-sheep January 26, 2024 13:19

ivirshup added the skip-gpu-ci label Jan 26, 2024

ivirshup added 2 commits January 26, 2024 15:03

Merge branch 'main' into repeated-cols

bebace0

Release note

470f255

ivirshup requested a review from ilan-gold January 26, 2024 15:06

ilan-gold approved these changes Jan 29, 2024

flying-sheep approved these changes Jan 29, 2024

ivirshup merged commit d07306f into scverse:main Jan 29, 2024
14 checks passed

meeseeksmachine pushed a commit to meeseeksmachine/anndata that referenced this pull request Jan 29, 2024

Backport PR scverse#1335: Error on shared column names when writing

6c28792

meeseeksmachine mentioned this pull request Jan 29, 2024

Backport PR #1335 on branch 0.10.x (Error on shared column names when writing) #1351

Merged

ivirshup added a commit that referenced this pull request Jan 29, 2024

Backport PR #1335: Error on shared column names when writing (#1351)

69a2a2e

Co-authored-by: Isaac Virshup <[email protected]>
