Integer overflow in some case #177
Spark plan:

```
:- Project [id_dt#0L, tp_bigint#8L]
:  +- Filter ((isnotnull(id_dt#0L) && (id_dt#0L > (tp_bigint#8L * 16))) && isnotnull(tp_bigint#8L))
```

I think the Spark plan generated here may not be appropriate; a plan like the one vanilla Spark generates would be:

```
:- Project [id_dt#0L, tp_bigint#8L]
:  +- Filter (((cast(id_dt#0L as decimal(24,2)) > CheckOverflow((cast(cast(tp_bigint#8L as decimal(20,0)) as decimal(22,2)) * 2.22), DecimalType(24,2))) && isnotnull(id_dt#0L)) && isnotnull(tp_bigint#8L))
```

Related SQL:

```sql
select A.tp_bigint, B.id_dt
from full_data_type_table A
join full_data_type_table B on (A.id_dt > B.id_dt * 12.6)
where A.tp_bigint = B.id_dt
order by A.id_dt
```

tispark: (screenshot not captured)

spark: (screenshot not captured)

we missed …
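The vanilla Spark plan above avoids 64-bit overflow by promoting the bigint operand to a decimal before multiplying (the `cast(... as decimal(20,0))` and `CheckOverflow` steps). A minimal sketch of why that promotion is safe, in Java terms, using `BigDecimal` arithmetic (the values here are illustrative, not taken from the table):

```java
import java.math.BigDecimal;

// Sketch of the decimal promotion: multiplying even the largest bigint
// value by the literal 12.6 is exact in decimal arithmetic, whereas a raw
// 64-bit multiply would wrap.
public class DecimalPromotion {
    public static void main(String[] args) {
        long idDt = Long.MAX_VALUE;                  // worst-case bigint value
        BigDecimal promoted = BigDecimal.valueOf(idDt)   // cast(... as decimal(20,0))
                .multiply(new BigDecimal("12.6"));       // exact decimal multiply
        System.out.println(promoted); // prints 116214487664370175168.2
    }
}
```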
Pushing it back to Spark might solve the problem, or we could promote it to a larger type and then push it down. But this implicit conversion is likely not supported in the old TiKV interface. In any case, we need a check before pushdown, and a fallback if the predicates are not valid. We talked this through this afternoon. @birdstorm
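The "check before push, fall back if not valid" idea can be sketched roughly as follows. This is a hypothetical helper, not the actual TiSpark API; it assumes we know (or conservatively bound) the value range of the column being multiplied:

```java
// Sketch of a pre-pushdown safety check: only push a `col * factor`
// predicate to TiKV if the multiplication is guaranteed not to overflow
// the signed 64-bit range TiKV computes in; otherwise keep it in Spark.
public class PushdownCheck {

    // Returns true only if multiplying any value in [min, max] by the
    // constant factor stays within long range.
    static boolean safeToPush(long min, long max, long factor) {
        try {
            Math.multiplyExact(min, factor);
            Math.multiplyExact(max, factor);
            return true;
        } catch (ArithmeticException e) {
            return false; // would overflow inside TiKV: evaluate in Spark instead
        }
    }

    public static void main(String[] args) {
        // id_dt * 16 over the full bigint range can overflow, so don't push.
        System.out.println(safeToPush(Long.MIN_VALUE, Long.MAX_VALUE, 16)); // false
        // A narrow, known-safe column range could still be pushed.
        System.out.println(safeToPush(-1000, 1000, 16)); // true
    }
}
```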
Need to fix after the DAG interface.
Another case:

```sql
select A.id_dt, A.tp_bigint, B.id_dt
from full_data_type_table A
join full_data_type_table B on A.id_dt > B.id_dt * 16
where A.tp_bigint = B.id_dt
order by A.id_dt, B.id_dt
```

Exception:

```
Caused by: com.pingcap.tikv.exception.SelectException: unknown error Overflow
	at com.pingcap.tikv.region.RegionStoreClient.coprocessorHelper(RegionStoreClient.java:266)
```
This issue is caused by a bigint overflow in the TiKV computation stage. To prevent this from happening, we could keep the bigint calculation in Spark and not push it down to TiKV. However, the same issue occurs in TiDB and MySQL:

```sql
select tp_int from full_data_type_table where tp_bigint * 20 > 0
```

TiDB: (screenshot not captured)

MySQL: (screenshot not captured)

It seems that neither of them has a fallback path to handle this scenario. But in Spark with JDBC, a calculation that could overflow is not pushed down:

```
== Physical Plan ==
*Project [tp_int#84]
+- *Filter ((tp_bigint#80L * 20) > 0)
   +- *Scan JDBCRelation(tispark_test.full_data_type_table) [numPartitions=1] [tp_int#84,tp_bigint#80L] PushedFilters: [*IsNotNull(tp_bigint)], ReadSchema: struct<tp_int:int>
```

So here's the question: should we make our behavior consistent with TiDB/MySQL, or with Spark over JDBC? 🤥
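To make the failure mode concrete, here is a small illustration (not project code) of what happens to `tp_bigint * 20` for a large bigint value: unchecked 64-bit arithmetic silently wraps to a wrong result, while checked arithmetic raises an error, which corresponds to the "unknown error Overflow" the coprocessor reports:

```java
// Demonstrates why `tp_bigint * 20 > 0` cannot be evaluated reliably in
// 64-bit integer arithmetic for all bigint values.
public class OverflowDemo {
    public static void main(String[] args) {
        long tpBigint = Long.MAX_VALUE / 2; // a large but valid bigint value

        // Unchecked Java arithmetic silently wraps to a negative number...
        long wrapped = tpBigint * 20;
        System.out.println(wrapped > 0); // false: the predicate result is wrong

        // ...while checked arithmetic throws instead, analogous to the
        // Overflow error surfaced by TiKV.
        try {
            Math.multiplyExact(tpBigint, 20L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```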
Wed Nov 15 07:27:17 2017
Originally opened as pingcap/tikv-client-lib-java#142
SQL:
Throws:
Seems there's an overflow issue here.

Note that if we remove `* 16` from the SQL, the above exception won't be thrown.