Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(logical-types): add NativeType and LogicalType #12853
feat(logical-types): add NativeType and LogicalType #12853
Changes from 2 commits
3827974
17a70d8
3dba963
4217970
01f0089
5b5f4c1
88e1b3c
ab16a2d
0b2ed2d
6150ea9
7ed7891
bdd0155
056a8f1
117f23b
e0923e2
f1ff963
7942f91
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I previously propose
can_decode_to(DataType) -> bool
, so given logical type and DataType, we can know whether they are paired.How can we do the equivalent check by the current design?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given say arrow Int64 data, i want to know whether these is numbers, timestamp, time, date or something else (eg user-defined enum). The fact that any of these hypothetical logical types could be stored as Int64 doesn't help me know. Asking logical type "could you please decode this arrow type?" doesn't help me know.
Thus, going from arrow type to logical type is not an option. We simply need to know what logical type this should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea is that we have LogicalType already. In logical level, they are either LogicalNumber, LogicalTimestamp or LogicalDate, and we can differ them in logical level. They can also decode as i64, i32 in physical level. So asking logical type "could you please decode this arrow type?" is to tell the relationship between logical type and physical type. We don't need to know whether the arrow i64 is number or timestamp, because we already know that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I can follow. @jayzhan211 -- can you write a small practical example? I want to make sure I understand the use case. Thanks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impl From<DataType> for NativeType
is enough for native type since we can know whether the ArrayRef matches the LogicalType we have. But for LogicalType::UserDefined, I think we need to define what kind ofDataType
it could be decoded to.We can figure this out if we meet any practical usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For any user defined logical type you still know the backing
native
type (via thenative()
method), so you should be able to use the same logic to know if your DataType can represent that logical type.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we implement these traits manually or should we leave to the implementors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. It all comes form Eq for me.
If we implement Eq by name (or name + native), then we disallow implementations from having any attributes that could affect their semantics (ie an implementation may have a field caching something, but it should effectively be a function of the name). And then we can implement other traits for the type as well.
And in fact this makes sense. We need a way to identify a type. We will store this information in the field metadata, so either type needs to be serializable, or it's name needs to be serializable and resolvable. The latter sounds like the way to go, especially given extension types (#12644), which won't be known at compile time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NativeType name might be confusing or we need more docs? I'm reading the code and first comes to my mind Arrow Types or maybe rust native types? I mean the name should be sufficient to understand what kind of native it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think of it as DataFusion’s type, or its built-in LogicalType. The term NativeType aligns with the concept of native types in Rust, so I believe it’s the preferred name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read the structure doc
its is not a Rust native types its more related to Arrow Types which eventually come to rust types. I feel we need to clarify this a little bit, but I cannot come up with the better phrase right now, lets leave it for follow up PR