
Integer overflow in some case #177

Closed
birdstorm opened this issue Dec 29, 2017 · 9 comments

@birdstorm

Issue by Novemser
Wed Nov 15 07:27:17 2017
Originally opened as pingcap/tikv-client-lib-java#142


SQL:

select A.tp_bigint,B.id_dt from full_data_type_table A join full_data_type_table B on A.id_dt > B.id_dt * 16 where A.tp_bigint = B.id_dt order by A.id_dt

Throws:

Caused by: com.pingcap.tikv.exception.TiClientInternalException: Error reading region
  at com.pingcap.tikv.operation.SelectIterator.readNextRegion(SelectIterator.java:148)
  at com.pingcap.tikv.operation.SelectIterator.hasNext(SelectIterator.java:161)
  at org.apache.spark.sql.tispark.TiRDD$$anon$2.hasNext(TiRDD.scala:75)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: com.pingcap.tikv.exception.SelectException: unknown error Codec(Other(StringError("I64(4355836469450447576) * I64(16) overflow")))
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at com.pingcap.tikv.operation.SelectIterator.readNextRegion(SelectIterator.java:145)
  ... 13 more
Caused by: com.pingcap.tikv.exception.SelectException: unknown error Codec(Other(StringError("I64(4355836469450447576) * I64(16) overflow")))
  at com.pingcap.tikv.region.RegionStoreClient.coprocessorHelper(RegionStoreClient.java:192)
  at com.pingcap.tikv.region.RegionStoreClient.coprocess(RegionStoreClient.java:185)
  at com.pingcap.tikv.operation.SelectIterator.createClientAndSendReq(SelectIterator.java:130)
  at com.pingcap.tikv.operation.SelectIterator.lambda$submitTasks$2(SelectIterator.java:113)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  ... 3 more

There seems to be an overflow issue here.

Note that if we remove * 16 from the SQL, the exception above is not thrown.
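The overflow is easy to reproduce outside TiKV: the value from the error message exceeds the signed 64-bit range once multiplied by 16. A minimal Java sketch (Math.multiplyExact performs the same kind of checked i64 multiply that TiKV's coprocessor reports failing):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // the I64 value from the error message in the stack trace above
        long v = 4355836469450447576L;
        try {
            long r = Math.multiplyExact(v, 16L);
            System.out.println("no overflow: " + r);
        } catch (ArithmeticException e) {
            // 4355836469450447576 * 16 > Long.MAX_VALUE, so this branch is taken
            System.out.println("overflow");
        }
    }
}
```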

@birdstorm

Comment by Novemser
Wed Nov 15 11:38:35 2017


Spark plan:

   :- Project [id_dt#0L, tp_bigint#8L]
   :  +- Filter ((isnotnull(id_dt#0L) && (id_dt#0L > (tp_bigint#8L * 16))) && isnotnull(tp_bigint#8L))

tp_bigint#8L * 16 can certainly overflow, but we didn't validate this filter before pushing it down to TiKV, which caused the problem above.

@birdstorm

Comment by Novemser
Wed Nov 15 12:16:18 2017


I think the Spark plan generated here may not be appropriate; a CheckOverflow should have been added to the filter above, as in the following plan:

   :- Project [id_dt#0L, tp_bigint#8L]
   :  +- Filter (((cast(id_dt#0L as decimal(24,2)) > CheckOverflow((cast(cast(tp_bigint#8L as decimal(20,0)) as decimal(22,2)) * 2.22), DecimalType(24,2))) && isnotnull(id_dt#0L)) && isnotnull(tp_bigint#8L))

Related SQL:

select A.tp_bigint,B.id_dt from full_data_type_table A join full_data_type_table B on (A.id_dt > B.id_dt * 12.6) where A.tp_bigint = B.id_dt order by A.id_dt
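For illustration, promoting the bigint operand to a decimal type (which is what Spark's CheckOverflow plan above does) gives the multiply enough headroom. A minimal Java sketch using BigDecimal, intended only as an analogy for Spark's decimal arithmetic, not its actual implementation:

```java
import java.math.BigDecimal;

public class DecimalPromotion {
    public static void main(String[] args) {
        // the same value that overflowed the raw I64 multiply
        BigDecimal v = new BigDecimal("4355836469450447576");
        // decimal arithmetic is not bounded by 64 bits, so the product
        // that would overflow a bigint is computed exactly
        System.out.println(v.multiply(new BigDecimal("12.6")));
    }
}
```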

@birdstorm

Comment by birdstorm
Wed Nov 15 12:58:31 2017


tispark:

scala> testsql.explain
== Physical Plan ==
*Project [id_bigint#1L, id_int#26L]
+- *Sort [id_int#0L ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(id_int#0L ASC NULLS FIRST, 200)
      +- *Project [id_bigint#1L, id_int#26L, id_int#0L]
         +- *SortMergeJoin [id_bigint#1L], [id_int#26L], Inner, (id_int#0L > (id_int#26L * 2))
            :- *Sort [id_bigint#1L ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(id_bigint#1L, 200)
            :     +- TiDB CoprocessorRDD{
 Table: a
 Ranges: Start:[-9223372036854775808], End: [9223372036854775807]
 Columns: [id_int], [id_bigint]
 Filter: Not(IsNull([id_int])), Not(IsNull([id_bigint])), ([id_int] > ([id_bigint] Multiply 2))
}
            +- *Sort [id_int#26L ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(id_int#26L, 200)
                  +- TiDB CoprocessorRDD{
 Table: a
 Ranges: Start:[-9223372036854775808], End: [9223372036854775807]
 Columns: [id_int]
 Filter: Not(IsNull([id_int]))
}

spark:

scala> testsql.explain
== Physical Plan ==
*Project [id_bigint#1L, id_int#50]
+- *Sort [id_int#0 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(id_int#0 ASC NULLS FIRST, 200)
      +- *Project [id_bigint#1L, id_int#50, id_int#0]
         +- *SortMergeJoin [id_bigint#1L], [cast(id_int#50 as bigint)], Inner, (id_int#0 > (id_int#50 * 2))
            :- *Sort [id_bigint#1L ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(id_bigint#1L, 200)
            :     +- *Scan JDBCRelation(a) [numPartitions=1] [id_int#0,id_bigint#1L] PushedFilters: [*IsNotNull(id_int), *IsNotNull(id_bigint)], ReadSchema: struct<id_int:int,id_bigint:bigint>
            +- *Sort [cast(id_int#50 as bigint) ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(cast(id_int#50 as bigint), 200)
                  +- *Scan JDBCRelation(a) [numPartitions=1] [id_int#50] PushedFilters: [*IsNotNull(id_int)], ReadSchema: struct<id_int:int>

We are missing the cast(id_int#50 as bigint) inside SortMergeJoin, not a CheckOverflow(). @Novemser

@birdstorm

Comment by ilovesoup
Wed Nov 15 14:47:56 2017


Pushing it back to Spark might solve the problem, or we could promote it to a larger type and then push it down. But this implicit conversion is likely not supported in the old TiKV interface. In any case, we need a check before pushdown, with a fallback to Spark for predicates that are not valid. We talked this through this afternoon. @birdstorm
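A sketch of the check-before-push idea described above; all names here are hypothetical illustrations, not the actual tikv-client API:

```java
import java.util.Set;

// Hypothetical pushdown validator: before handing a predicate to TiKV,
// reject any expression whose arithmetic could overflow a signed 64-bit
// integer, and fall back to evaluating it in Spark instead.
public class PushdownValidator {
    // operators whose I64 result can exceed the 64-bit range
    private static final Set<String> RISKY_OPS = Set.of("Multiply", "Plus", "Minus");

    public static boolean isSafeToPushDown(String op, boolean operandsAreBigint) {
        // conservative rule: keep bigint arithmetic in Spark
        return !(RISKY_OPS.contains(op) && operandsAreBigint);
    }

    public static void main(String[] args) {
        // the filter from this issue: [id_int] > ([id_bigint] Multiply 2)
        System.out.println(isSafeToPushDown("Multiply", true));   // false -> fall back to Spark
        System.out.println(isSafeToPushDown("IsNotNull", false)); // true  -> push down
    }
}
```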

@birdstorm

Comment by ilovesoup
Tue Nov 21 17:57:59 2017


This needs to be fixed after the DAG interface is in place.

@birdstorm

Comment by Novemser
Fri Dec 1 04:31:02 2017


Another case:

select A.id_dt,A.tp_bigint,B.id_dt from full_data_type_table A join full_data_type_table B on A.id_dt > B.id_dt * 16 where A.tp_bigint = B.id_dt order by A.id_dt, B.id_dt 

Exception:

Caused by: com.pingcap.tikv.exception.SelectException: unknown error Overflow
	at com.pingcap.tikv.region.RegionStoreClient.coprocessorHelper(RegionStoreClient.java:266)

@birdstorm

Comment by Novemser
Fri Dec 8 08:08:05 2017


This issue is caused by a bigint overflow during the TiKV computation stage. To prevent it, we could keep bigint calculations in Spark and not push them down to TiKV.

However, the same issue occurs in TiDB and MySQL:
SQL:

select tp_int from full_data_type_table where tp_bigint * 20 > 0

TiDB:

ERROR 1105 (HY000): other error: unknown error Overflow

MySQL:

ERROR 1690 (22003): BIGINT value is out of range in '(`tispark_test`.`full_data_type_table`.`tp_bigint` * 20)'

It seems that neither of them has a fallback path to handle this scenario.

But Spark with JDBC does not push down operations that could overflow.
Like this:

== Physical Plan ==
*Project [tp_int#84]
+- *Filter ((tp_bigint#80L * 20) > 0)
   +- *Scan JDBCRelation(tispark_test.full_data_type_table) [numPartitions=1] [tp_int#84,tp_bigint#80L] PushedFilters: [*IsNotNull(tp_bigint)], ReadSchema: struct<tp_int:int>

So here's the question: should we make our behavior consistent with TiDB/MySQL, or with Spark over JDBC? 🤥

@github-actions

github-actions bot commented May 8, 2022

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label May 8, 2022
@github-actions

This issue was closed because it has been inactive for 14 days since being marked as stale.
