Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accessor Functionality for AnnData #1870

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

srivarra
Copy link

@srivarra srivarra commented Feb 21, 2025

Added Accessor / Extension functionality based off the work in Polars. Added tests as well.

Added _accessors to anndata._core.anndata.AnnData to house registered accessor names to avoid name conflicts.

You can create an accessor by decorating a class with methods with the register_anndata_namespace function.

Here's an example of how to use it:

import anndata as ad

@ad.register_anndata_namespace("my_accessor")
class Greetings:
	def __init__(self, adata: ad.AnnData):
		self._adata = adata
	
	def greet():
		return "hi"

# and to use

adata.my_accessor.greet()

Not sure if this needs a dedicated tutorial notebook, or specific documentation page beyond the docstring.

Copy link

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 95.29412% with 4 lines in your changes missing coverage. Please review.

Project coverage is 83.98%. Comparing base (3d56a38) to head (db71c52).

Files with missing lines Patch % Lines
src/anndata/_core/extensions.py 96.15% 3 Missing ⚠️
src/anndata/_types.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1870      +/-   ##
==========================================
- Coverage   86.11%   83.98%   -2.13%     
==========================================
  Files          40       41       +1     
  Lines        6242     6326      +84     
==========================================
- Hits         5375     5313      -62     
- Misses        867     1013     +146     
Files with missing lines Coverage Δ
src/anndata/__init__.py 100.00% <100.00%> (ø)
src/anndata/_core/anndata.py 82.63% <100.00%> (+0.02%) ⬆️
src/anndata/_types.py 84.21% <80.00%> (-1.09%) ⬇️
src/anndata/_core/extensions.py 96.15% <96.15%> (ø)

... and 9 files with indirect coverage changes

@ilan-gold ilan-gold added this to the 0.12.0 milestone Feb 24, 2025
Copy link
Contributor

@ilan-gold ilan-gold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Some preliminary comments/changes

Taken from:
https://github.com/pandas-dev/pandas/blob/ab89c53f48df67709a533b6a95ce3d911871a0a8/pandas/util/_exceptions.py#L30-L51
"""
import anndata as ad
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why an internal-to-the-function import?

Copy link
Author

@srivarra srivarra Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason in particular, it was just what Polars used. As long as the function gets the location of the anndata/__init__.py file it'll work. Moving it outside the function should be fine since there are not any circular imports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why resolved without change? There was a circular import?

msg = f"cannot override reserved namespace {name!r}"
raise AttributeError(msg)

elif hasattr(cls, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think we want to disallow this behavior in some form. I wouldn't want someone overriding X. Maybe I'm misunderstanding, but I also don't see a test for this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, just tested it out and it absolutely overrides X. What attributes should be protected, should I just do all of them with dir(AnnData)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless there's a reason to allow overriding already existing attributes, why not just throw an attribute error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this was resolved but it's still just a warning?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using Jupyter notebooks, raising an AttributeError for overriding custom namespaces can be disruptive. When you modify and re-register a namespace, the user has to frequently restart their kernel to reset the accessor namespaces, because the namespace already exists. While protecting against conflicts with core AnnData attributes (like X, obs_names, etc...) is good, being able to overwrite existing custom namespaces would be nice for iterative development in notebooks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But as things stand, won't this function simply warn if you try to override X i.e., elif hasattr(cls, name)?

@srivarra
Copy link
Author

@ilan-gold

Made the requested changes:

  • Added ExtensionNamespace Protocol:
    • Provides a standardized interface for static type checking with mypy (can't really enforce implementation details when a user builds an accessor with a Protocol)
    • Defines the expected method signatures for custom accessors
  • Implemented Runtime Validation via _check_namespace_signature:
    • Verifies at registration time that custom accessors have the correct method signatures
    • Ensures __init__ properly accepts self and an adata parameter of type anndata.AnnData
    • Prevents namespace collisions with existing AnnData attributes (e.g., X, obs, var_names)
    • Raises error messages when validation fails

Added tests for (probably) all validation scenarios (missing parameters, wrong parameter names, incorrect type annotations) and tests for namespace collision detection.

A couple of issues I'm having:

I'm not too sure what's going on with the readthedocs build, since I'm unable to replicate it locally, instead I have a different error:

Here's my error log for `hatch run docs:build`
Running Sphinx v8.2.1
loading translations [en]... done
matplotlib is not installed, social cards will not be generated
myst v4.0.1: MdParserConfig(commonmark_only=False, gfm_only=False, enable_extensions={'html_image'}, disable_syntax=[], all_links_external=False, links_external_new_tab=False, url_schemes=('http', 'https', 'mailto', 'ftp'), ref_domains=None, fence_as_directive=set(), number_code_blocks=[], title_to_header=False, heading_anchors=3, heading_slug_func=None, html_meta={}, footnote_sort=True, footnote_transition=True, words_per_minute=200, substitutions={}, linkify_fuzzy_links=True, dmath_allow_labels=True, dmath_allow_space=True, dmath_allow_digits=True, dmath_double_inline=False, update_mathjax=True, mathjax_classes='tex2jax_process|mathjax_process|math|output_area', enable_checkboxes=False, suppress_warnings=[], highlight_code_blocks=True)
loading intersphinx inventory 'awkward' from https://awkward-array.org/doc/stable/objects.inv ...
loading intersphinx inventory 'cupy' from https://docs.cupy.dev/en/stable/objects.inv ...
loading intersphinx inventory 'dask' from https://docs.dask.org/en/stable/objects.inv ...
loading intersphinx inventory 'h5py' from https://docs.h5py.org/en/latest/objects.inv ...
loading intersphinx inventory 'hdf5plugin' from https://hdf5plugin.readthedocs.io/en/latest/objects.inv ...
loading intersphinx inventory 'loompy' from https://linnarssonlab.org/loompy/objects.inv ...
loading intersphinx inventory 'numpy' from https://numpy.org/doc/stable/objects.inv ...
loading intersphinx inventory 'pandas' from https://pandas.pydata.org/pandas-docs/stable/objects.inv ...
loading intersphinx inventory 'python' from https://docs.python.org/3/objects.inv ...
loading intersphinx inventory 'scipy' from https://docs.scipy.org/doc/scipy/objects.inv ...
loading intersphinx inventory 'sklearn' from https://scikit-learn.org/stable/objects.inv ...
loading intersphinx inventory 'xarray' from https://docs.xarray.dev/en/stable/objects.inv ...
loading intersphinx inventory 'zarr' from https://zarr.readthedocs.io/en/v2.18.4/objects.inv ...
[autosummary] generating autosummary for: _key_contributors.rst, api.md, benchmark-read-write.ipynb, benchmarks.md, concatenation.rst, contributing.md, fileformat-prose.md, generated/anndata.AnnData.T.rst, generated/anndata.AnnData.X.rst, generated/anndata.AnnData.chunk_X.rst, ..., generated/anndata.settings.override.rst, generated/anndata.settings.rst, generated/anndata.typing.AxisStorable.rst, generated/anndata.typing.Index.rst, generated/anndata.typing.RWAble.rst, index.md, interoperability.md, references.rst, release-notes/index.md, tutorials/index.md
Writing evaluated template result to /Users/srivarra/Dev/Projects/Open Source Contributions/scverse/anndata/docs/_build/html/_static/nbsphinx-code-cells.css
building [mo]: targets for 0 po files that are out of date
writing output... 
building [html]: targets for 114 source files that are out of date
updating environment: [new config] 114 added, 0 changed, 0 removed
reading sources... [  1%] _key_contributors
reading sources... [  2%] api
reading sources... [  3%] benchmark-read-write
reading sources... [  4%] benchmarks
reading sources... [  4%] concatenation
reading sources... [  5%] contributing
reading sources... [  6%] fileformat-prose
reading sources... [  7%] generated/anndata.AnnData
reading sources... [  8%] generated/anndata.AnnData.T
reading sources... [  9%] generated/anndata.AnnData.X
reading sources... [ 10%] generated/anndata.AnnData.chunk_X
reading sources... [ 11%] generated/anndata.AnnData.chunked_X
reading sources... [ 11%] generated/anndata.AnnData.concatenate
reading sources... [ 12%] generated/anndata.AnnData.copy
reading sources... [ 13%] generated/anndata.AnnData.filename
reading sources... [ 14%] generated/anndata.AnnData.is_view
reading sources... [ 15%] generated/anndata.AnnData.isbacked
reading sources... [ 16%] generated/anndata.AnnData.layers
reading sources... [ 17%] generated/anndata.AnnData.n_obs
reading sources... [ 18%] generated/anndata.AnnData.n_vars
reading sources... [ 18%] generated/anndata.AnnData.obs
reading sources... [ 19%] generated/anndata.AnnData.obs_keys
reading sources... [ 20%] generated/anndata.AnnData.obs_names
reading sources... [ 21%] generated/anndata.AnnData.obs_names_make_unique
reading sources... [ 22%] generated/anndata.AnnData.obs_vector
reading sources... [ 23%] generated/anndata.AnnData.obsm
reading sources... [ 24%] generated/anndata.AnnData.obsm_keys
reading sources... [ 25%] generated/anndata.AnnData.obsp
reading sources... [ 25%] generated/anndata.AnnData.raw
reading sources... [ 26%] generated/anndata.AnnData.rename_categories
reading sources... [ 27%] generated/anndata.AnnData.shape
reading sources... [ 28%] generated/anndata.AnnData.strings_to_categoricals
reading sources... [ 29%] generated/anndata.AnnData.to_df
reading sources... [ 30%] generated/anndata.AnnData.to_memory
reading sources... [ 31%] generated/anndata.AnnData.transpose
reading sources... [ 32%] generated/anndata.AnnData.uns
reading sources... [ 32%] generated/anndata.AnnData.uns_keys
reading sources... [ 33%] generated/anndata.AnnData.var
reading sources... [ 34%] generated/anndata.AnnData.var_keys
reading sources... [ 35%] generated/anndata.AnnData.var_names
reading sources... [ 36%] generated/anndata.AnnData.var_names_make_unique
reading sources... [ 37%] generated/anndata.AnnData.var_vector
reading sources... [ 38%] generated/anndata.AnnData.varm
reading sources... [ 39%] generated/anndata.AnnData.varm_keys
reading sources... [ 39%] generated/anndata.AnnData.varp
reading sources... [ 40%] generated/anndata.AnnData.write
reading sources... [ 41%] generated/anndata.AnnData.write_csvs
reading sources... [ 42%] generated/anndata.AnnData.write_h5ad
reading sources... [ 43%] generated/anndata.AnnData.write_loom
reading sources... [ 44%] generated/anndata.AnnData.write_zarr
reading sources... [ 45%] generated/anndata.ImplicitModificationWarning
reading sources... [ 46%] generated/anndata.abc.CSCDataset
reading sources... [ 46%] generated/anndata.abc.CSCDataset.__getitem__
reading sources... [ 47%] generated/anndata.abc.CSCDataset.backend


Versions
========

* Platform:         darwin; (macOS-15.3.1-arm64-arm-64bit)
* Python version:   3.12.5 (CPython)
* Sphinx version:   8.2.1
* Docutils version: 0.21.2
* Jinja2 version:   3.1.5
* Pygments version: 2.19.1

Last Messages
=============

    generated/anndata.abc.CSCDataset


    reading sources... [ 46%]
    generated/anndata.abc.CSCDataset.__getitem__


    reading sources... [ 47%]
    generated/anndata.abc.CSCDataset.backend

Loaded Extensions
=================

* sphinx.ext.mathjax (8.2.1)
* alabaster (1.0.0)
* sphinxcontrib.applehelp (2.0.0)
* sphinxcontrib.devhelp (2.0.0)
* sphinxcontrib.htmlhelp (2.1.0)
* sphinxcontrib.serializinghtml (2.0.0)
* sphinxcontrib.qthelp (2.0.0)
* myst_parser (4.0.1)
* sphinx_copybutton (0.5.2)
* sphinx.ext.autodoc.preserve_defaults (8.2.1)
* sphinx.ext.autodoc.type_comment (8.2.1)
* sphinx.ext.autodoc.typehints (8.2.1)
* sphinx.ext.autodoc (8.2.1)
* sphinx.ext.intersphinx (8.2.1)
* sphinx.ext.doctest (8.2.1)
* sphinx.ext.coverage (8.2.1)
* sphinx.ext.napoleon (8.2.1)
* sphinx.ext.autosummary (8.2.1)
* sphinx_autodoc_typehints (unknown version)
* sphinx_issues (5.0.0)
* sphinx_design (0.6.1)
* sphinx_search.extension (0.3.2)
* sphinxext.opengraph (unknown version)
* scanpydoc.definition_list_typed_field (0.15.2)
* scanpydoc.elegant_typehints (0.15.2)
* scanpydoc.rtd_github_links (0.15.2)
* scanpydoc.theme (unknown version)
* scanpydoc.release_notes (0.15.2)
* scanpydoc (0.15.2)
* sphinx.ext.linkcode (8.2.1)
* nbsphinx (0.9.6)
* IPython.sphinxext.ipython_console_highlighting (unknown version)
* sphinx_toolbox.more_autodoc.autoprotocol (3.8.3)
* no_skip_abc_members (unknown version)
* patch_myst_cite (unknown version)
* sphinx_book_theme (unknown version)
* pydata_sphinx_theme (unknown version)

Traceback
=========

      File "/Users/srivarra/Library/Application Support/hatch/env/virtual/anndata/_5ujvw4a/docs/lib/python3.12/site-packages/sphinx/ext/autodoc/__init__.py", line 2980, in add_directive_header
        objrepr = stringify_annotation(
                  ^^^^^^^^^^^^^^^^^^^^^
    TypeError: _stringify_annotation() got an unexpected keyword argument 'short_literals'


The full traceback has been saved in:
/var/folders/fy/q2szypn9325d_0g0nq049k300000gq/T/sphinx-err-k20ijzbv.log

To report this error to the developers, please open an issue at <https://github.com/sphinx-doc/sphinx/issues/>. Thanks!
Please also report this if it was a user error, so that a better error message can be provided next time.

I made sure to merge main into this branch, but idk.

In addition I can't quite comprehend this single test which didn't pass MEMORY PROBLEMS tests/test_dask_view_mem.py::test_size_of_view[layers-True] . It did pass locally on my mac however.

Thank you!

@ilan-gold
Copy link
Contributor

ilan-gold commented Feb 26, 2025

I'm not too sure what's going on with the readthedocs build, since I'm unable to replicate it locally, instead I have a different error:

On CI, the issue is that you're exporting a new function whose typing depends on anndata._core.extensions.ExtensionNamespace so you'll need to export ExtensionNamespace as well in the docs.

If you're running the docs using hatch you can prune your environment - if not, would highly recommend using hatch! Check out our hatch.toml

In addition I can't quite comprehend this single test which didn't pass MEMORY PROBLEMS tests/test_dask_view_mem.py::test_size_of_view[layers-True] . It did pass locally on my mac however.

That one is just a hair flaky, we see this every so often, but generally passes (as it does now)

@srivarra srivarra requested a review from ilan-gold February 26, 2025 22:13
@@ -68,11 +69,13 @@ def __getattr__(attr_name: str) -> Any:
# Classes
"AnnData",
"Raw",
"ExtensionNamespace",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a typing.py file so you don't have to reexport

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this resolved?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to remove it from there. Should the protocol go in src/anndata/_types.py or src/anndata/typing.py?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@flying-sheep can answer this best, but I think _types.py actually

raise TypeError(msg)


def _create_namespace(name: str, cls: type[AnnData]) -> Callable[[type], type]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more type usage that shoudl be ExtensionNamespace (there are others as well)



@runtime_checkable
class ExtensionNamespace(Protocol[NS]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the NS generic?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thought it was necessary with the protocol, but looks like it doesn't really do anything.

try:
type_hints = get_type_hints(ns_class.__init__)
resolved_type = type_hints.get(param.name, param.annotation)
except Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the blanket except here with something more precise

msg = f"cannot override reserved namespace {name!r}"
raise AttributeError(msg)

elif hasattr(cls, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this was resolved but it's still just a warning?

@ilan-gold ilan-gold requested a review from flying-sheep March 3, 2025 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accessors for AnnData
2 participants