Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: export to geoarrow output #53

Merged
merged 19 commits into from
Nov 15, 2024

Conversation

jorisvandenbossche
Copy link
Collaborator

@jorisvandenbossche jorisvandenbossche commented Oct 9, 2024

Adding geoarrow export (after import was added in #49)

This depends on paleolimbot/s2geography#30 (merged now, but not yet released) and paleolimbot/s2geography#37 (for tesseltation/projection)

@jorisvandenbossche jorisvandenbossche marked this pull request as draft October 9, 2024 08:43
@jorisvandenbossche jorisvandenbossche changed the title Geoarrow output ENH: export to geoarrow output Oct 9, 2024
paleolimbot pushed a commit to paleolimbot/s2geography that referenced this pull request Oct 29, 2024
Similar to #28, just
opening to be able to add some comments / discuss the design.

For now it just can process points, without any options (but the points
do work, as being tested with
benbovy/spherely#53
Comment on lines +74 to +196
ArrowArrayHolder(const ArrowArrayHolder& rhs) = delete;

/// \brief Get a pointer to the data owned by this object
ArrowArray* array() noexcept {
return &array_;
}
const ArrowArray* array() const noexcept {
return &array_;
}

ArrowSchema* schema() noexcept {
return &schema_;
}
const ArrowSchema* schema() const noexcept {
return &schema_;
}

void set_schema(ArrowSchema* schema_src) {
memcpy(&schema_, schema_src, sizeof(struct ArrowSchema));
schema_src->release = nullptr;
}

py::tuple return_capsules(py::args args, const py::kwargs& kwargs) {
if ((args.size() > 0) && (!args[0].is_none())) {
throw std::invalid_argument(
"Additional arguments (such as requested_schema) with a non-default value are not "
"supported");
}
if (kwargs) {
for (auto& item : kwargs) {
if (!item.second.is_none()) {
throw std::invalid_argument(
"Additional arguments (such as requested_schema) with a non-default value "
"are not supported");
}
}
}

ArrowArray* c_array = static_cast<ArrowArray*>(malloc(sizeof(ArrowArray)));
ArrowSchema* c_schema = static_cast<ArrowSchema*>(malloc(sizeof(ArrowSchema)));
move(&array_, &schema_, c_array, c_schema);

constexpr auto array_cleanup = [](void* ptr) noexcept {
auto array = static_cast<ArrowArray*>(ptr);
if (array->release != nullptr) {
array->release(array);
}
free(array);
};
py::capsule array_capsule{c_array, "arrow_array", array_cleanup};

constexpr auto schema_cleanup = [](void* ptr) noexcept {
auto schema = static_cast<ArrowSchema*>(ptr);
if (schema->release != nullptr) {
schema->release(schema);
}
free(schema);
};
py::capsule schema_capsule{c_schema, "arrow_schema", schema_cleanup};

return py::make_tuple(schema_capsule, array_capsule);
}

void move(ArrowArray* array_src,
ArrowSchema* schema_src,
ArrowArray* array_dst,
ArrowSchema* schema_dst) {
memcpy(array_dst, array_src, sizeof(struct ArrowArray));
array_src->release = nullptr;

memcpy(schema_dst, schema_src, sizeof(struct ArrowSchema));
schema_src->release = nullptr;
}

/// \brief Call data's release callback if valid
void reset() {
if (array_.release != nullptr) {
array_.release(&array_);
}

if (schema_.release != nullptr) {
schema_.release(&schema_);
}
}

/// \brief Call data's release callback if valid and move ownership of the data
/// pointed to by data
void reset(ArrowArray* array_src, ArrowSchema* schema_src) {
reset();
move(array_src, schema_src, &array_, &schema_);
}

~ArrowArrayHolder() {
reset();
}

protected:
ArrowArray array_;
ArrowSchema schema_;
};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paleolimbot could you take a quick look at this ArrowArrayHolder class if that seems to make sense?
It's mostly inspired on the UniqueArray/Schema from nanoarrow.hpp (but then combined in a single class, making it easier to expose in python), but since I only need this (for now), it didn't feel worth vendoring the entire of nanoarrow for the 120 lines of code above.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any issues here!

It's also possible to use a std::unique_ptr<> if you don't feel like maintaining this:

https://github.com/rapidsai/cudf/blob/branch-24.12/cpp/include/cudf/interop.hpp#L121

https://github.com/rapidsai/cudf/blob/branch-24.12/cpp/src/interop/to_arrow_schema.cpp#L227-L230

...then what you have here basically collapses to std::pair<unique_schema_t, unique_array_t>.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Now, I still need to implement the "move" logic for the method to create and return the capsules (and the return_capsules method itself), and that together is already more than half of the code here. So not sure if using std::unique_ptr would end up that much simpler (and I also need a class to expose that as an object in python)

@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review November 6, 2024 09:52
Comment on lines +211 to +215
if (planar) {
auto tol = S1Angle::Radians(tessellate_tolerance / EARTH_RADIUS_METERS);
options.set_tessellate_tolerance(tol);
// TODO make this configurable
// options.set_projection(s2geog::geoarrow::mercator());
Copy link
Collaborator Author

@jorisvandenbossche jorisvandenbossche Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benbovy we want to make this configurable, so the user can specify one of the pre-defined projections in s2geography (see paleolimbot/s2geography#39 for the options in s2geograhy), but any thoughts on the user API to do this?

We could have a very shallow Projection class that essentially just holds a S2::Projection, and then some creation functions for the different projections.

For example something with class methods like spherely.Projection.pseudo_mercator() ? Or just functions like spherely.pseudo_mercator_projection()? (or spherely.projection_pseudo_mercator())

(this can also be left for a follow-up PR)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and R s2 has s2_projection_plate_carree(), s2_projection_mercator(), s2_projection_orthographic(centre=(0,0)) , listed on https://r-spatial.github.io/s2/reference/wk_handle.s2_geography.html)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the shallow Projection class with class methods.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer keeping proper projection support for a follow-up PR, so for now this one should be ready I think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a follow-up PR exposing projections using a shallow spherely.Projection class -> #69

@benbovy benbovy added this to the 0.1.0 milestone Nov 14, 2024
Copy link
Owner

@benbovy benbovy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just left a few minor comments.

src/geoarrow.cpp Outdated Show resolved Hide resolved
src/geoarrow.cpp Show resolved Hide resolved
src/geoarrow.cpp Show resolved Hide resolved
src/geoarrow.cpp Outdated Show resolved Hide resolved
src/geoarrow.cpp Outdated Show resolved Hide resolved
src/geoarrow.cpp Outdated Show resolved Hide resolved
tests/test_geoarrow.py Show resolved Hide resolved
Copy link
Owner

@benbovy benbovy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Joris!

@jorisvandenbossche jorisvandenbossche merged commit 1217af5 into benbovy:main Nov 15, 2024
15 checks passed
@jorisvandenbossche jorisvandenbossche deleted the geoarrow-output branch November 15, 2024 10:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants