
Xarray dumpers / loaders #16

Open
wants to merge 39 commits into base: main
Conversation

melonora

Adding xarray dumpers and loaders

```python
# If values are length 1 we are dealing with coords like date
v = v.values
try:
    node_dict["coords"][k] = v
```
Collaborator

Suggested change
```diff
-    node_dict["coords"][k] = v
+    node_dict["coords"][k] = {"data": v, "dims": k}
```

This would be consistent with the lines below. I'm not sure whether dims is necessary, but I think it is.
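For context, a minimal sketch of why the suggested shape matters: xarray's `Dataset.from_dict` expects each coords entry as a mapping with `dims` and `data` keys, matching the shape of the data_vars entries. The helper below is hypothetical (not from the PR) and just builds that nested form:

```python
# Hypothetical helper (not from the PR): store a coord in the nested
# {"data": ..., "dims": ...} form so coords and data_vars entries match.
def add_coord(node_dict, k, v):
    node_dict.setdefault("coords", {})[k] = {"data": v, "dims": k}
    return node_dict

node = add_coord({}, "date", ["2024-09-21"])
# node["coords"]["date"] == {"data": ["2024-09-21"], "dims": "date"}
```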



```python
def _create_node(model, schemaview):
    """Create datatree from temperature dataset"""
```
Collaborator

Add a comment with a pointer to the data model of a Dataset's attrs/coords/data_vars/dims.

```python
    node_dict["coords"][k].update({"attrs": {key: value}})
else:
    # conversion factor
    node_dict["data_vars"] = {k: {"attrs": {key: value}}}
```
Collaborator

What if there is more than one? Here the attrs dict is set to a dictionary containing just this key-value pair, overwriting any earlier data_vars entries. I think this should call _create_node recursively.
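To illustrate the overwrite problem: the assignment rebuilds the whole data_vars dict each time, so only the last attribute survives. A hedged sketch of an accumulating alternative (the names mirror the snippet above; the merge logic is an assumption, not the PR's actual code):

```python
# Hypothetical sketch: merge each attribute into the existing entry
# instead of rebuilding data_vars from scratch.
def set_var_attr(node_dict, k, key, value):
    var = node_dict.setdefault("data_vars", {}).setdefault(k, {})
    var.setdefault("attrs", {})[key] = value
    return node_dict

d = {}
set_var_attr(d, "temperature", "units", "K")
set_var_attr(d, "temperature", "conversion_factor", 1.0)
# both attributes are now present under d["data_vars"]["temperature"]["attrs"]
```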

```python
if isinstance(v, BaseModel):
    # create a subgroup and recurse
    if "values" in v.__dir__():
        dims = ["y", "x"]
```
Collaborator

TODO: see if we can get this from the schemaview.
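A hedged sketch of what "get this from the schemaview" might look like: `slot_names` stands in for whatever the SchemaView's induced slots would return for the array class, and both the filter and the axis-slot names are assumptions, not the actual schema.

```python
# Hypothetical: derive dims from slot metadata instead of hard-coding
# ["y", "x"]. `slot_names` stands in for SchemaView-derived slot names.
def dims_from_slots(slot_names, axis_slots=("y", "x")):
    # keep only slots that name an axis, preserving schema order
    return [s for s in slot_names if s in axis_slots]

dims = dims_from_slots(["name", "y", "x", "values"])
```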

```python
    return output_file_path


class YamlXarrayZarrDumper(YamlArrayFileDumper):
```


I'm sorta having a tough time telling what's going on in this PR (sorry for missing the meeting, I never know when they are), but I'm doing this right now and you will definitely want to have a model for this, otherwise all the ser/des logic gets packed into a huge nested dumper/loader and you lose the declarative relationship between the in-memory and dumped forms.

@sneakers-the-rat

As just a general comment: when working with linkml models, which can contain a ton of references between objects, and where we can't/shouldn't assume hierarchical structure, I have found that recursive patterns become a liability. It's one reason I restructured the nwb-linkml stuff into a system of bidirectionally passing results objects up and down for model generation, because the decision about what to do at any given node depends on at least the parent/child (I see some switches in here for deciding whether we're a dataset or an attr, for example, and also some comments about "what if there is more than one of this").

For loading/dumping, I think you will probably want to structure it like a graph that models dependencies between the nodes. My still-pretty-crappy-and-repetitive version of this for nwb is over here: https://github.com/p2p-ld/nwb-linkml/blob/main/nwb_linkml/src/nwb_linkml/io/hdf5.py
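The graph-structured approach can be sketched with the standard library alone: model which nodes depend on which, then dump in topological order so every dependency is serialized before its dependents. The node names and edges below are illustrative, not taken from nwb-linkml:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Illustrative dependency graph: each node maps to the set of nodes
# that must be dumped before it.
deps = {
    "dataset": {"coords", "data_vars"},
    "data_vars": {"attrs"},
    "coords": set(),
    "attrs": set(),
}
order = list(TopologicalSorter(deps).static_order())
# every node appears after all of its dependencies in `order`
```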

@sneakers-the-rat commented Sep 21, 2024

Also maybe relevant in general: I'm about to do some stuff like this:
https://numpydantic.readthedocs.io/en/dump_json/serialization.html

p2p-ld/numpydantic#20

```python
class MyModel(BaseModel):
    array: NDArray

model = MyModel(array="data/test.avi")
print_json(model.model_dump_json(round_trip=True))
```
```json
{"array": {"type": "video", "file": "data/test.avi"}}
```

```python
model.model_dump_json(round_trip=True)
```
```json
{
  "array": {
    "type": "dask",
    "name": "array-2a39187fc9fcee3f4cdbc1f2911b4b92",
    "chunks": [[2], [2]],
    "dtype": "int64",
    "array": [[1, 2], [3, 4]]
  }
}
```

```python
model.model_dump_json(
    round_trip=True,
    context={"mark_interface": True},
)
```
```json
{
  "array": {
    "interface": {
      "module": "numpydantic.interface.hdf5",
      "cls": "H5Interface",
      "version": "1.5.3"
    },
    "value": {
      "type": "hdf5",
      "file": "data/test.h5",
      "path": "/data",
      "field": null
    }
  }
}
```

and so on.

And the structure of that is all controllable by the interface class, so you can make your own serialization no sweat.

eg: https://github.com/p2p-ld/numpydantic/blob/135b74aa2e4d6e5a755c27d3bb7fe170650058d3/src/numpydantic/interface/hdf5.py#L80-L91
and: https://github.com/p2p-ld/numpydantic/blob/135b74aa2e4d6e5a755c27d3bb7fe170650058d3/src/numpydantic/interface/hdf5.py#L386-L412

@melonora (Author)

Sorry for the late comment, but you are absolutely right @sneakers-the-rat. When implementing it this way, I indeed ran into the problem that the recursiveness makes it difficult to traverse up and down the graph when required, since a single level within the schema view might not give enough information on its own about how exactly to deal with the incoming information.

@melonora (Author)

@rly I think this would be good to discuss this evening
