Internal error : Unsupported data type in hasher #821

Closed
NGA-TRAN opened this issue Aug 3, 2021 · 3 comments · Fixed by #812 or #823

NGA-TRAN commented Aug 3, 2021

Describe the bug

While running the SQL below, I hit this internal error:

Error running remote query: status: Internal, message: "Internal error reading points from database 844910ece80be8bc_3c0bd4c89186ca89: Internal error executing plan: Arrow error: External error: Arrow error: External error: Arrow error: External error: Execution error: Internal error: Unsupported data type in hasher. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Tue, 03 Aug 2021 18:55:08 GMT"} }

To Reproduce
SQL:

SELECT Row_number()
         OVER (
           partition BY Date_trunc('minute', time), bucket_id, partition_id
           ORDER BY time DESC) AS rn,
       gauge,
       Date_trunc('minute', time) AS time_bucket,
       env
FROM   storage_usage_bucket_cardinality
WHERE  Cast(time AS BIGINT) > Cast(Now() AS BIGINT) - 5 * 60 * 1000000000
       AND time <= Now();

Description of table storage_usage_bucket_cardinality

+---------------+--------------+----------------------------------+--------------+-----------------------------+-------------+
| table_catalog | table_schema | table_name                       | column_name  | data_type                   | is_nullable |
+---------------+--------------+----------------------------------+--------------+-----------------------------+-------------+
| public        | iox          | storage_usage_bucket_cardinality | bucket_id    | Dictionary(Int32, Utf8)     | YES         |
| public        | iox          | storage_usage_bucket_cardinality | env          | Dictionary(Int32, Utf8)     | YES         |
| public        | iox          | storage_usage_bucket_cardinality | gauge        | Float64                     | YES         |
| public        | iox          | storage_usage_bucket_cardinality | host         | Dictionary(Int32, Utf8)     | YES         |
| public        | iox          | storage_usage_bucket_cardinality | hostname     | Dictionary(Int32, Utf8)     | YES         |
| public        | iox          | storage_usage_bucket_cardinality | node_id      | Dictionary(Int32, Utf8)     | YES         |
| public        | iox          | storage_usage_bucket_cardinality | nodename     | Dictionary(Int32, Utf8)     | YES         |
| public        | iox          | storage_usage_bucket_cardinality | org_id       | Dictionary(Int32, Utf8)     | YES         |
| public        | iox          | storage_usage_bucket_cardinality | partition_id | Dictionary(Int32, Utf8)     | YES         |
| public        | iox          | storage_usage_bucket_cardinality | role         | Dictionary(Int32, Utf8)     | YES         |
| public        | iox          | storage_usage_bucket_cardinality | time         | Timestamp(Nanosecond, None) | NO          |
| public        | iox          | storage_usage_bucket_cardinality | url          | Dictionary(Int32, Utf8)     | YES         |
+---------------+--------------+----------------------------------+--------------+-----------------------------+-------------+


@NGA-TRAN NGA-TRAN added the bug Something isn't working label Aug 3, 2021
@Dandandan

The error message could be improved.

I think it's trying to hash the dictionary columns bucket_id and partition_id, and the hasher is missing an implementation for dictionary types.
There is a PR from @alamb to support this: #812
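
For readers unfamiliar with the problem, here is a rough sketch of what hashing a dictionary-encoded column involves: hash each distinct dictionary value once, then map every row's key to the precomputed hash. This is not the code from #812 or DataFusion's actual hasher. `DictColumn`, `hash_one`, `hash_dictionary`, and `partition_of` are hypothetical names using only the standard library, and mapping NULL rows to hash 0 is an assumption made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a Dictionary(Int32, Utf8) column:
/// `keys[i]` indexes into `values`; `None` marks a NULL row.
struct DictColumn {
    keys: Vec<Option<i32>>,
    values: Vec<String>,
}

/// Hash a single value with the standard library's default hasher.
fn hash_one<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hash every distinct dictionary value once, then look up each row's
/// hash through its key. NULL rows get hash 0 (an assumption of this sketch).
fn hash_dictionary(col: &DictColumn) -> Vec<u64> {
    let value_hashes: Vec<u64> = col.values.iter().map(|v| hash_one(v)).collect();
    col.keys
        .iter()
        .map(|key| match key {
            Some(k) => value_hashes[*k as usize],
            None => 0,
        })
        .collect()
}

/// Route each row to one of `n` partitions, as a hash repartition would.
fn partition_of(hashes: &[u64], n: u64) -> Vec<u64> {
    hashes.iter().map(|h| h % n).collect()
}

fn main() {
    let col = DictColumn {
        keys: vec![Some(0), None, Some(1), Some(0)],
        values: vec!["one".to_string(), "three".to_string()],
    };
    let hashes = hash_dictionary(&col);
    // Rows 0 and 3 share the key for "one", so they land in the same partition.
    println!("{:?}", partition_of(&hashes, 16));
}
```

The point of the sketch is that rows with equal dictionary values must produce equal hashes so that `PARTITION BY` sends them to the same partition. Hashing the raw keys alone would not be safe if different batches used different dictionaries.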

@alamb alamb self-assigned this Aug 4, 2021
alamb commented Aug 4, 2021

I will figure this one out

alamb commented Aug 4, 2021

Here is a reproducer:

diff --git a/datafusion/tests/sql.rs b/datafusion/tests/sql.rs
index 379cad623..4b8f75191 100644
--- a/datafusion/tests/sql.rs
+++ b/datafusion/tests/sql.rs
@@ -3219,9 +3219,17 @@ async fn query_on_string_dictionary() -> Result<()> {
     let expected = vec![vec!["NULL", "1"], vec!["one", "1"], vec!["three", "1"]];
     assert_eq!(expected, actual);
 
+    // window functions
+    let sql = "SELECT d1, row_number() OVER (partition by d1) FROM test";
+    let mut actual = execute(&mut ctx, sql).await;
+    actual.sort();
+    let expected = vec![vec!["NULL", "1"], vec!["one", "1"], vec!["three", "1"]];
+    assert_eq!(expected, actual);
+
     Ok(())
 }
 
+
 #[tokio::test]
 async fn query_without_from() -> Result<()> {
     // Test for SELECT <expression> without FROM.

Fails with:

thread 'query_on_string_dictionary' panicked at 'Executing physical plan for 'SELECT d1, row_number() OVER (partition by d1) FROM test': ProjectionExec { expr: [(Column { name: "d1", index: 1 }, "d1"), (Column { name: "ROW_NUMBER() PARTITION BY [#test.d1]", index: 0 }, "ROW_NUMBER()")], schema: Schema { fields: [Field { name: "d1", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "ROW_NUMBER()", data_type: UInt64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input: WindowAggExec { input: RepartitionExec { input: SortExec { input: CoalescePartitionsExec { input: CoalesceBatchesExec { input: RepartitionExec { input: RepartitionExec { input: partitions: [...]schema: Schema { fields: [Field { name: "d1", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }projection: Some([0]), partitioning: RoundRobinBatch(16), channels: Mutex { data: {} }, metrics: RepartitionMetrics { fetch_nanos: SQLMetric { value: 0, metric_type: TimeNanos }, repart_nanos: SQLMetric { value: 0, metric_type: TimeNanos }, send_nanos: SQLMetric { value: 0, metric_type: TimeNanos } } }, partitioning: Hash([Column { name: "d1", index: 0 }], 16), channels: Mutex { data: {} }, metrics: RepartitionMetrics { fetch_nanos: SQLMetric { value: 0, metric_type: TimeNanos }, repart_nanos: SQLMetric { value: 0, metric_type: TimeNanos }, send_nanos: SQLMetric { value: 0, metric_type: TimeNanos } } }, target_batch_size: 4096 } }, expr: [PhysicalSortExpr { expr: Column { name: "d1", index: 0 }, options: SortOptions { descending: false, nulls_first: true } }], output_rows: SQLMetric { value: 0, metric_type: Counter }, sort_time_nanos: SQLMetric { value: 0, metric_type: TimeNanos }, preserve_partitioning: false }, partitioning: RoundRobinBatch(16), channels: Mutex { data: {} }, metrics: RepartitionMetrics { fetch_nanos: SQLMetric { value: 
0, metric_type: TimeNanos }, repart_nanos: SQLMetric { value: 0, metric_type: TimeNanos }, send_nanos: SQLMetric { value: 0, metric_type: TimeNanos } } }, window_expr: [BuiltInWindowExpr { fun: RowNumber, expr: RowNumber { name: "ROW_NUMBER()" }, partition_by: [Column { name: "d1", index: 0 }], order_by: [], window_frame: None }], schema: Schema { fields: [Field { name: "ROW_NUMBER()", data_type: UInt64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }, Field { name: "d1", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} }, input_schema: Schema { fields: [Field { name: "d1", data_type: Dictionary(Int32, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }], metadata: {} } } }: ArrowError(ExternalError(ArrowError(ExternalError(ArrowError(ExternalError(Execution("Internal error: Unsupported data type in hasher. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker")))))))', datafusion/tests/sql.rs:2642:39

I confirmed that the test passes with the changes in #812. I will merge #812 as is and then add a specific test in a follow-on PR.
