Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Finish the Attribute API wrapping #6

Merged
merged 3 commits into from
Mar 19, 2024
Merged

Finish the Attribute API wrapping #6

merged 3 commits into from
Mar 19, 2024

Conversation

davisp
Copy link
Collaborator

@davisp davisp commented Mar 18, 2024

The major change here is to reimplement the DomainType as a new CAPIConverter trait along with a generic method on tiledb_sys::Datatype.

@davisp davisp marked this pull request as draft March 18, 2024 21:47
@davisp davisp requested a review from rroelke March 18, 2024 21:47
@davisp davisp force-pushed the pd/finish-attribute-api branch 3 times, most recently from f6539b0 to 37d6222 Compare March 18, 2024 21:55
Copy link
Member

@rroelke rroelke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my comments are not imminently actionable so I'm choosing to approve rather than request changes. But I think we definitely want the test improvement sooner rather than later.


pub fn get_fill_value<Conv: CAPIConverter>(
&self,
ctx: &Context,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI I have a branch which is stylizing this to be like the other things, i.e. &'ctx Context is a member of the struct. Let's merge this one first.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. I'm gonna write that quick test to see what can be done with TypeId for slices to pick a replacement name for Datatype::is_valid method and then will update and merge once that's done.

let mut c_size: u64 = 0;

let res = unsafe {
ffi::tiledb_attribute_get_fill_value(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Philosophical question time. Here's the implementation that I see of tiledb_attribute_get_fill_value:

  ensure_attribute_is_valid(attr);
  api::ensure_output_pointer_is_valid(value);
  api::ensure_output_pointer_is_valid(size);
  attr->get_fill_value(value, size);
  return TILEDB_OK;

Looks like get_fill_value throws an exception if the size is zero, the value is null, or the attribute is not nullable.

The first two of these are effectively prevented by using Rust. The third one seems to me like it should really be an option type, not an error.

Anyway what I'm kind of postulating is whether getter functions like this should really return Result since safe Rust should really be preventing the usages where this would be an error (and maybe for this specific scenario of a nullable attribute it should be Option<u64>).

It would be a bit more user-friendly... but I guess this is probably being called alongside a bunch of other Result code anyway. And the C implementation could always evolve. So I've answered my own question. There are some other functions we should probably fix up to match.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to go too far Rust-ifying our interface until more of the library is wrapped so that we have a better feel for common patterns, but I wonder if we couldn't do something along the lines of having a specific type for the value would be the way to go. Something along the lines of:

enum FillValue {
    NonNullable(T),
    Nullable(T, B)
}

fun set_fill_value(value: FillValue);
fun get_fill_value() -> FillValule;

@@ -0,0 +1,42 @@
pub trait CAPIConverter {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure this trait gets it right. I think we need to play with strings to find a trait that does. Which is to say, this isn't really any different than what was here before, and I believe it will have to evolve again in the near future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I said this in slack already but we will need to distinguish between types which are trivially convertible (as each of this examples is) and types which are not (likely string). For the trivially convertible cases, we don't want any functions at all, just a marker - and then for the non-trivial cases we will end up with implementations that are more complicated than this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion stuff isn't any different for sure. The issue was I focusing on was removing the Datatype dependence of the conversion. With get_fill_value I ran smack into the issue when I realized that DomainType was wrong because u64 would only have one associated Datatype via DomainType::DATATTYPE which was wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that makes sense. Yeah multiple datatypes backed by u64. Got it.

@@ -17,6 +18,7 @@ extern "C" {
pub fn tiledb_datatype_size(type_: u32) -> u64;
}

#[allow(non_snake_case)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bigger change but I kind of think this should change to be more similar to the smaller enums I've worked through - i.e. the sys library just has the constants and then we put the enum in the main crate. We shouldn't need this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I'm not even sure why it started complaining when it was there before the PR. The linter disable was purely to tell clippy to quiet up so I didn't have to include tugging on this loose end.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does clippy run on all directories? If not maybe that's why it didn't catch this.

@@ -207,6 +209,96 @@ impl Datatype {
_ => None,
}
}

pub fn is_valid<T: 'static>(&self, _: &T) -> bool {
if TypeId::of::<T>() == TypeId::of::<f32>() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a tuple-type match (self, TypeId::of::<T>()) could probably neaten this a bit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean here. Self would be the enum value in this case which is just a variant of an enum rather than the data being converted.

@@ -207,6 +209,96 @@ impl Datatype {
_ => None,
}
}

pub fn is_valid<T: 'static>(&self, _: &T) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"valid" meaning "T is a physical type which implements this logical type"?
I'd probably name it is_physical_type or at the least give put the "physical type" terminology in a doc comment

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some vague ideas on how this would work with slices that we'll need for strings and multi-value cells. I'll spend a bit this morning setting up an example to see if I can't decide whether we'll be able to use this same pattern for everything or if we'll have to get more complicated.

Which is to say, I'm not 100% that this will only work for physical types. Though I could see renaming to is_compatible_type or something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experimental results show that we should be able to apply this same technique to non-physical types as well. According to the TypeId docs, it currently only works for types that are 'static which I don't even know how it a type wouldn't be static yet. I'll go with is_compatible_type for now unless you throw something at me before I finish updating and merging.

format!("{}", dt),
"<UNKNOWN DATA TYPE>".to_string()
);
assert!(check_valid(&dt));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the assertion that only one physical type is valid for each logical type but the way this is written leaves a lot of room for failure if the wrong one is chosen. I think it's probably worth the map from logical to physical type (which we probably want somewhere anyway?) and then having the check_valid instead return a vector representing which types were valid.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sure. The only guarantee that we've picked the right type is to have the second map but I didn't have the brain power to figure out how best to do that in Rust. I'm more than open to adding it if you've got any idea on how to write it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to check out this branch and give it a whirl, or do so after you've merged

}

#[test]
fn attribute_test_set_fill_value() -> TileDBResult<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will definitely want string tests soon but I think we can skip for now - I left a related comment in convert.rs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, strings and multi-value cells are a looming monster on the horizon that I've mostly been ignoring while dealing with trying to cover quite a bit of the API to get a feel for anything that might be a problem. I've just not wanted to sink the time into figuring out that particular fight while I try and make sure that the more mundane things are all going to work out reasonably well.

Although, theoretically, we'll only need to solve that issue once and then re-uses it ~everywhere that we've got the whole void*/uint64_t* pattern in the C API.

davisp added 2 commits March 19, 2024 10:48
This implements a different approach to converting between types and
Datatype dependent type checks.

The main idea here that's different than previously is that the
DomainType enforced a 1:1 correspondence between TileDB's Datatype
enumeration and the Rust type system. Unfortunately, multiple TileDB
Datatypes map to the same Rust type. The idea here is to separate the
conversion of types and the validation that a given Rust type is valid
for a given TileDB Datatype.

The basic idea is to move type conversion between Rust and FFI types
(i.e., i32 -> uint32_t) to be a separate concern from "Is an i32 valid
as a Datatype::Blob". The type conversions basically map directly to how
DomainType worked. The validation is moved to a generic method on
Datatype in tiledb-sys that makes heavy use of `std::any::TypeId` to
restrict the generic type paramter to specific Datatype enum variants.
@davisp davisp force-pushed the pd/finish-attribute-api branch from 4804ae0 to d083dae Compare March 19, 2024 15:48
@davisp davisp marked this pull request as ready for review March 19, 2024 15:49
@davisp davisp merged commit eec14d8 into main Mar 19, 2024
1 check passed
@davisp davisp deleted the pd/finish-attribute-api branch March 19, 2024 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants