
Integer overflow in some case #177

Closed
birdstorm opened this issue Dec 29, 2017 · 9 comments

@birdstorm

Issue by Novemser
Wed Nov 15 07:27:17 2017
Originally opened as pingcap/tikv-client-lib-java#142


SQL:

select A.tp_bigint,B.id_dt from full_data_type_table A join full_data_type_table B on A.id_dt > B.id_dt * 16 where A.tp_bigint = B.id_dt order by A.id_dt

Throws:

Caused by: com.pingcap.tikv.exception.TiClientInternalException: Error reading region
  at com.pingcap.tikv.operation.SelectIterator.readNextRegion(SelectIterator.java:148)
  at com.pingcap.tikv.operation.SelectIterator.hasNext(SelectIterator.java:161)
  at org.apache.spark.sql.tispark.TiRDD$$anon$2.hasNext(TiRDD.scala:75)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: com.pingcap.tikv.exception.SelectException: unknown error Codec(Other(StringError("I64(4355836469450447576) * I64(16) overflow")))
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at com.pingcap.tikv.operation.SelectIterator.readNextRegion(SelectIterator.java:145)
  ... 13 more
Caused by: com.pingcap.tikv.exception.SelectException: unknown error Codec(Other(StringError("I64(4355836469450447576) * I64(16) overflow")))
  at com.pingcap.tikv.region.RegionStoreClient.coprocessorHelper(RegionStoreClient.java:192)
  at com.pingcap.tikv.region.RegionStoreClient.coprocess(RegionStoreClient.java:185)
  at com.pingcap.tikv.operation.SelectIterator.createClientAndSendReq(SelectIterator.java:130)
  at com.pingcap.tikv.operation.SelectIterator.lambda$submitTasks$2(SelectIterator.java:113)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  ... 3 more

There seems to be an overflow issue here.

Note that if we remove * 16 from the SQL, the exception above is not thrown.
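The overflow is easy to reproduce outside TiKV: the value from the error message exceeds the signed 64-bit range once multiplied by 16. A minimal Java sketch (Math.multiplyExact performs the same kind of checked i64 multiply that TiKV's coprocessor reports failing):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // the I64 value from the error message in the stack trace above
        long v = 4355836469450447576L;
        try {
            long r = Math.multiplyExact(v, 16L);
            System.out.println("no overflow: " + r);
        } catch (ArithmeticException e) {
            // 4355836469450447576 * 16 > Long.MAX_VALUE, so this branch is taken
            System.out.println("overflow");
        }
    }
}
```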

@birdstorm

Comment by Novemser
Wed Nov 15 11:38:35 2017


Spark plan:

   :- Project [id_dt#0L, tp_bigint#8L]
   :  +- Filter ((isnotnull(id_dt#0L) && (id_dt#0L > (tp_bigint#8L * 16))) && isnotnull(tp_bigint#8L))

tp_bigint#8L * 16 can certainly overflow, but we didn't validate this filter before pushing it down to TiKV, which caused the problem above.

@birdstorm

Comment by Novemser
Wed Nov 15 12:16:18 2017


I think the Spark plan generated here may not be appropriate; a CheckOverflow should have been added to the filter above, as in the following plan:

   :- Project [id_dt#0L, tp_bigint#8L]
   :  +- Filter (((cast(id_dt#0L as decimal(24,2)) > CheckOverflow((cast(cast(tp_bigint#8L as decimal(20,0)) as decimal(22,2)) * 2.22), DecimalType(24,2))) && isnotnull(id_dt#0L)) && isnotnull(tp_bigint#8L))

Related SQL:

select A.tp_bigint,B.id_dt from full_data_type_table A join full_data_type_table B on (A.id_dt > B.id_dt * 12.6) where A.tp_bigint = B.id_dt order by A.id_dt
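For illustration, promoting the bigint operand to a decimal type (which is what Spark's CheckOverflow plan above does) gives the multiply enough headroom. A minimal Java sketch using BigDecimal, intended only as an analogy for Spark's decimal arithmetic, not its actual implementation:

```java
import java.math.BigDecimal;

public class DecimalPromotion {
    public static void main(String[] args) {
        // the same value that overflowed the raw I64 multiply
        BigDecimal v = new BigDecimal("4355836469450447576");
        // decimal arithmetic is not bounded by 64 bits, so the product
        // that would overflow a bigint is computed exactly
        System.out.println(v.multiply(new BigDecimal("12.6")));
    }
}
```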

@birdstorm

Comment by birdstorm
Wed Nov 15 12:58:31 2017


tispark:

scala> testsql.explain
== Physical Plan ==
*Project [id_bigint#1L, id_int#26L]
+- *Sort [id_int#0L ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(id_int#0L ASC NULLS FIRST, 200)
      +- *Project [id_bigint#1L, id_int#26L, id_int#0L]
         +- *SortMergeJoin [id_bigint#1L], [id_int#26L], Inner, (id_int#0L > (id_int#26L * 2))
            :- *Sort [id_bigint#1L ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(id_bigint#1L, 200)
            :     +- TiDB CoprocessorRDD{
 Table: a
 Ranges: Start:[-9223372036854775808], End: [9223372036854775807]
 Columns: [id_int], [id_bigint]
 Filter: Not(IsNull([id_int])), Not(IsNull([id_bigint])), ([id_int] > ([id_bigint] Multiply 2))
}
            +- *Sort [id_int#26L ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(id_int#26L, 200)
                  +- TiDB CoprocessorRDD{
 Table: a
 Ranges: Start:[-9223372036854775808], End: [9223372036854775807]
 Columns: [id_int]
 Filter: Not(IsNull([id_int]))
}

spark:

scala> testsql.explain
== Physical Plan ==
*Project [id_bigint#1L, id_int#50]
+- *Sort [id_int#0 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(id_int#0 ASC NULLS FIRST, 200)
      +- *Project [id_bigint#1L, id_int#50, id_int#0]
         +- *SortMergeJoin [id_bigint#1L], [cast(id_int#50 as bigint)], Inner, (id_int#0 > (id_int#50 * 2))
            :- *Sort [id_bigint#1L ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(id_bigint#1L, 200)
            :     +- *Scan JDBCRelation(a) [numPartitions=1] [id_int#0,id_bigint#1L] PushedFilters: [*IsNotNull(id_int), *IsNotNull(id_bigint)], ReadSchema: struct<id_int:int,id_bigint:bigint>
            +- *Sort [cast(id_int#50 as bigint) ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(cast(id_int#50 as bigint), 200)
                  +- *Scan JDBCRelation(a) [numPartitions=1] [id_int#50] PushedFilters: [*IsNotNull(id_int)], ReadSchema: struct<id_int:int>

We are missing the cast(id_int#50 as bigint) inside SortMergeJoin, not a CheckOverflow(). @Novemser

@birdstorm

Comment by ilovesoup
Wed Nov 15 14:47:56 2017


Pushing it back to Spark might solve the problem, or we could promote it to a larger type and then push it down. But this implicit conversion is likely not supported in the old TiKV interface. In any case, we need a check before pushdown, with a fallback to Spark for predicates that are not valid. We talked this through this afternoon. @birdstorm
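A sketch of the check-before-push idea described above; all names here are hypothetical illustrations, not the actual tikv-client API:

```java
import java.util.Set;

// Hypothetical pushdown validator: before handing a predicate to TiKV,
// reject any expression whose arithmetic could overflow a signed 64-bit
// integer, and fall back to evaluating it in Spark instead.
public class PushdownValidator {
    // operators whose I64 result can exceed the 64-bit range
    private static final Set<String> RISKY_OPS = Set.of("Multiply", "Plus", "Minus");

    public static boolean isSafeToPushDown(String op, boolean operandsAreBigint) {
        // conservative rule: keep bigint arithmetic in Spark
        return !(RISKY_OPS.contains(op) && operandsAreBigint);
    }

    public static void main(String[] args) {
        // the filter from this issue: [id_int] > ([id_bigint] Multiply 2)
        System.out.println(isSafeToPushDown("Multiply", true));   // false -> fall back to Spark
        System.out.println(isSafeToPushDown("IsNotNull", false)); // true  -> push down
    }
}
```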

@birdstorm

Comment by ilovesoup
Tue Nov 21 17:57:59 2017


This needs to be fixed after the DAG interface is in place.

@birdstorm

Comment by Novemser
Fri Dec 1 04:31:02 2017


Another case:

select A.id_dt,A.tp_bigint,B.id_dt from full_data_type_table A join full_data_type_table B on A.id_dt > B.id_dt * 16 where A.tp_bigint = B.id_dt order by A.id_dt, B.id_dt 

Exception:

Caused by: com.pingcap.tikv.exception.SelectException: unknown error Overflow
	at com.pingcap.tikv.region.RegionStoreClient.coprocessorHelper(RegionStoreClient.java:266)

@birdstorm

Comment by Novemser
Fri Dec 8 08:08:05 2017


This issue is caused by a bigint overflow during the TiKV computation stage. To prevent it, we could keep bigint calculations in Spark and not push them down to TiKV.

However, the same issue occurs in TiDB and MySQL:
SQL:

select tp_int from full_data_type_table where tp_bigint * 20 > 0

TiDB:

ERROR 1105 (HY000): other error: unknown error Overflow

MySQL:

ERROR 1690 (22003): BIGINT value is out of range in '(`tispark_test`.`full_data_type_table`.`tp_bigint` * 20)'

It seems that neither of them has a fallback path to handle this scenario.

But Spark with JDBC does not push down operations that could overflow.
Like this:

== Physical Plan ==
*Project [tp_int#84]
+- *Filter ((tp_bigint#80L * 20) > 0)
   +- *Scan JDBCRelation(tispark_test.full_data_type_table) [numPartitions=1] [tp_int#84,tp_bigint#80L] PushedFilters: [*IsNotNull(tp_bigint)], ReadSchema: struct<tp_int:int>

So here's the question: should we make our behavior consistent with TiDB/MySQL, or with Spark over JDBC? 🤥

@github-actions

github-actions bot commented May 8, 2022

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label May 8, 2022
@github-actions

This issue was closed because it has been inactive for 14 days since being marked as stale.
