-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: export to geoarrow output #53
ENH: export to geoarrow output #53
Conversation
Similar to #28, just opening to be able to add some comments / discuss the design. For now it just can process points, without any options (but the points do work, as being tested with benbovy/spherely#53
cfa4e4b
to
b9aba2d
Compare
ArrowArrayHolder(const ArrowArrayHolder& rhs) = delete; | ||
|
||
/// \brief Get a pointer to the data owned by this object | ||
ArrowArray* array() noexcept { | ||
return &array_; | ||
} | ||
const ArrowArray* array() const noexcept { | ||
return &array_; | ||
} | ||
|
||
ArrowSchema* schema() noexcept { | ||
return &schema_; | ||
} | ||
const ArrowSchema* schema() const noexcept { | ||
return &schema_; | ||
} | ||
|
||
void set_schema(ArrowSchema* schema_src) { | ||
memcpy(&schema_, schema_src, sizeof(struct ArrowSchema)); | ||
schema_src->release = nullptr; | ||
} | ||
|
||
py::tuple return_capsules(py::args args, const py::kwargs& kwargs) { | ||
if ((args.size() > 0) && (!args[0].is_none())) { | ||
throw std::invalid_argument( | ||
"Additional arguments (such as requested_schema) with a non-default value are not " | ||
"supported"); | ||
} | ||
if (kwargs) { | ||
for (auto& item : kwargs) { | ||
if (!item.second.is_none()) { | ||
throw std::invalid_argument( | ||
"Additional arguments (such as requested_schema) with a non-default value " | ||
"are not supported"); | ||
} | ||
} | ||
} | ||
|
||
ArrowArray* c_array = static_cast<ArrowArray*>(malloc(sizeof(ArrowArray))); | ||
ArrowSchema* c_schema = static_cast<ArrowSchema*>(malloc(sizeof(ArrowSchema))); | ||
move(&array_, &schema_, c_array, c_schema); | ||
|
||
constexpr auto array_cleanup = [](void* ptr) noexcept { | ||
auto array = static_cast<ArrowArray*>(ptr); | ||
if (array->release != nullptr) { | ||
array->release(array); | ||
} | ||
free(array); | ||
}; | ||
py::capsule array_capsule{c_array, "arrow_array", array_cleanup}; | ||
|
||
constexpr auto schema_cleanup = [](void* ptr) noexcept { | ||
auto schema = static_cast<ArrowSchema*>(ptr); | ||
if (schema->release != nullptr) { | ||
schema->release(schema); | ||
} | ||
free(schema); | ||
}; | ||
py::capsule schema_capsule{c_schema, "arrow_schema", schema_cleanup}; | ||
|
||
return py::make_tuple(schema_capsule, array_capsule); | ||
} | ||
|
||
void move(ArrowArray* array_src, | ||
ArrowSchema* schema_src, | ||
ArrowArray* array_dst, | ||
ArrowSchema* schema_dst) { | ||
memcpy(array_dst, array_src, sizeof(struct ArrowArray)); | ||
array_src->release = nullptr; | ||
|
||
memcpy(schema_dst, schema_src, sizeof(struct ArrowSchema)); | ||
schema_src->release = nullptr; | ||
} | ||
|
||
/// \brief Call data's release callback if valid | ||
void reset() { | ||
if (array_.release != nullptr) { | ||
array_.release(&array_); | ||
} | ||
|
||
if (schema_.release != nullptr) { | ||
schema_.release(&schema_); | ||
} | ||
} | ||
|
||
/// \brief Call data's release callback if valid and move ownership of the data | ||
/// pointed to by data | ||
void reset(ArrowArray* array_src, ArrowSchema* schema_src) { | ||
reset(); | ||
move(array_src, schema_src, &array_, &schema_); | ||
} | ||
|
||
~ArrowArrayHolder() { | ||
reset(); | ||
} | ||
|
||
protected: | ||
ArrowArray array_; | ||
ArrowSchema schema_; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paleolimbot could you take a quick look at this ArrowArrayHolder
class if that seems to make sense?
It's mostly inspired on the UniqueArray/Schema from nanoarrow.hpp (but then combined in a single class, making it easier to expose in python), but since I only need this (for now), it didn't feel worth vendoring the entire of nanoarrow for the 120 lines of code above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any issues here!
It's also possible to use a std::unique_ptr<>
if you don't feel like maintaining this:
https://github.com/rapidsai/cudf/blob/branch-24.12/cpp/include/cudf/interop.hpp#L121
https://github.com/rapidsai/cudf/blob/branch-24.12/cpp/src/interop/to_arrow_schema.cpp#L227-L230
...then what you have here basically collapses to std::pair<unique_schema_t, unique_array_t>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Now, I still need to implement the "move" logic for the method to create and return the capsules (and the return_capsules
method itself), and that together is already more than half of the code here. So not sure if using std::unique_ptr
would end up that much simpler (and I also need a class to expose that as an object in python)
if (planar) { | ||
auto tol = S1Angle::Radians(tessellate_tolerance / EARTH_RADIUS_METERS); | ||
options.set_tessellate_tolerance(tol); | ||
// TODO make this configurable | ||
// options.set_projection(s2geog::geoarrow::mercator()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbovy we want to make this configurable, so the user can specify one of the pre-defined projections in s2geography (see paleolimbot/s2geography#39 for the options in s2geograhy), but any thoughts on the user API to do this?
We could have a very shallow Projection
class that essentially just holds a S2::Projection, and then some creation functions for the different projections.
For example something with class methods like spherely.Projection.pseudo_mercator()
? Or just functions like spherely.pseudo_mercator_projection()
? (or spherely.projection_pseudo_mercator()
)
(this can also be left for a follow-up PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and R s2
has s2_projection_plate_carree()
, s2_projection_mercator()
, s2_projection_orthographic(centre=(0,0))
, listed on https://r-spatial.github.io/s2/reference/wk_handle.s2_geography.html)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the shallow Projection
class with class methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would prefer keeping proper projection support for a follow-up PR, so for now this one should be ready I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a follow-up PR exposing projections using a shallow spherely.Projection
class -> #69
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just left a few minor comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Joris!
Adding geoarrow export (after import was added in #49)
This depends on paleolimbot/s2geography#30 (merged now, but not yet released) and paleolimbot/s2geography#37 (for tesseltation/projection)