Index scan cannot read all data #189

birdstorm · 2017-12-29T05:50:54Z

Issue by Novemser
Sun Dec 17 03:15:50 2017
Originally opened as pingcap/tikv-client-lib-java#201

Seems there's some issue in IndexScanIterator.java.
The below sql

scala> spark.sql("select L_ORDERKEY from lineitem where L_ORDERKEY < 10000000 order by l_orderkey").show

should print result as

+----------+
|L_ORDERKEY|
+----------+
|         1|
|         1|
|         1|
|         1|
|         1|
|         1|
|         2|
|         3|
|         3|
|         3|
|         3|
|         3|
|         3|
|         4|
|         5|
|         5|
|         5|
|         6|
|         7|
|         7|
+----------+
only showing top 20 rows

But we got this:

+----------+
|L_ORDERKEY|
+----------+
|    499683|
|    499683|
|    499684|
|    499684|
|    499684|
|    499684|
|    499685|
|    499685|
|    499685|
|    499685|
|    499686|
|    499686|
|    499686|
|    499686|
|    499686|
|    499687|
|    499687|
|    499687|
|    499712|
|    499713|
+----------+
only showing top 20 rows

Plan:

spark.sql("select L_ORDERKEY from lineitem where L_ORDERKEY < 10000000 order by l_orderkey").explain

== Physical Plan ==
*Sort [l_orderkey#18L ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(l_orderkey#18L ASC NULLS FIRST, 200)
   +- TiDB CoprocessorRDD{[table: lineitem] [Index: primary] , Ranges: Start:[1], End: [1], Columns: [L_ORDERKEY], Filter: UnaryNot(IntIsNull([L_ORDERKEY]))}

Code in IndexScanIterator.java

  @Override
  public boolean hasNext() {
    try {
      if (rowIterator == null) {
        TiSession session = snapshot.getSession();
        while (handleIterator.hasNext()) {
          TLongArrayList handles = feedBatch();
          batchCount++;
          completionService.submit(() -> {
            List<RegionTask> tasks = RangeSplitter
                .newSplitter(session.getRegionManager())
                .splitHandlesByRegion(dagReq.getTableInfo().getId(), handles);
            return CoprocessIterator.getRowIterator(dagReq, tasks, session);
          });
        }
        while (batchCount > 0) {
          rowIterator = completionService.take().get();
          batchCount--;

          if (rowIterator.hasNext()) {
            return true;
          }
        }
      }
      if (rowIterator == null) {
        return false;
      }
    } catch (Exception e) {
      throw new TiClientInternalException("Error reading rows from handle", e);
    }
    return rowIterator.hasNext();
  }

Seems rowIterator cannot retrieve all the result from completionService since rowIterator = completionService.take().get(); may not execute when data in first not null iterator ended.

The text was updated successfully, but these errors were encountered:

birdstorm · 2017-12-29T05:50:56Z

Comment by Novemser
Sun Dec 17 03:32:00 2017

Solution maybe like

override def hasNext: Boolean = {
  def proceedNextBatchTask(): Boolean = {
    // For each batch fetch job, we get the first rowIterator with row data
    while (batchCount > 0) {
      rowIterator = completionService.take().get()
      batchCount -= 1

      // If current rowIterator has any data, return true
      if (rowIterator.hasNext) {
        return true
      }
    }
    // No rowIterator in any remaining batch fetch jobs contains data, return false
    false
  }
  // RowIter has not been initialized
  if (rowIterator == null) {
    proceedNextBatchTask()
  } else {
    if (rowIterator.hasNext) {
      return true
    }
    proceedNextBatchTask()
  }
}

Novemser · 2017-12-30T03:18:26Z

Fixed by #140

Novemser · 2017-12-30T03:19:54Z

Note that IndexScanIterato.java is not used any more in #140, but using IndexScanIterator.java along will still cause the problem.

birdstorm added the type/bug label Dec 29, 2017

Novemser closed this as completed Dec 30, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Index scan cannot read all data #189

Index scan cannot read all data #189

birdstorm commented Dec 29, 2017

birdstorm commented Dec 29, 2017

Novemser commented Dec 30, 2017

Novemser commented Dec 30, 2017

Index scan cannot read all data #189

Index scan cannot read all data #189

Comments

birdstorm commented Dec 29, 2017

birdstorm commented Dec 29, 2017

Novemser commented Dec 30, 2017

Novemser commented Dec 30, 2017