Index scan cannot read all data #189

Closed
birdstorm opened this issue Dec 29, 2017 · 3 comments

@birdstorm
Contributor

Issue by Novemser
Sun Dec 17 03:15:50 2017
Originally opened as pingcap/tikv-client-lib-java#201


There seems to be an issue in IndexScanIterator.java.
The following SQL

scala> spark.sql("select L_ORDERKEY from lineitem where L_ORDERKEY < 10000000 order by l_orderkey").show

should print this result:

+----------+
|L_ORDERKEY|
+----------+
|         1|
|         1|
|         1|
|         1|
|         1|
|         1|
|         2|
|         3|
|         3|
|         3|
|         3|
|         3|
|         3|
|         4|
|         5|
|         5|
|         5|
|         6|
|         7|
|         7|
+----------+
only showing top 20 rows

But we got this:

+----------+
|L_ORDERKEY|
+----------+
|    499683|
|    499683|
|    499684|
|    499684|
|    499684|
|    499684|
|    499685|
|    499685|
|    499685|
|    499685|
|    499686|
|    499686|
|    499686|
|    499686|
|    499686|
|    499687|
|    499687|
|    499687|
|    499712|
|    499713|
+----------+
only showing top 20 rows

Plan:

spark.sql("select L_ORDERKEY from lineitem where L_ORDERKEY < 10000000 order by l_orderkey").explain

== Physical Plan ==
*Sort [l_orderkey#18L ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(l_orderkey#18L ASC NULLS FIRST, 200)
   +- TiDB CoprocessorRDD{[table: lineitem] [Index: primary] , Ranges: Start:[1], End: [1], Columns: [L_ORDERKEY], Filter: UnaryNot(IntIsNull([L_ORDERKEY]))}

Code in IndexScanIterator.java

  @Override
  public boolean hasNext() {
    try {
      if (rowIterator == null) {
        TiSession session = snapshot.getSession();
        while (handleIterator.hasNext()) {
          TLongArrayList handles = feedBatch();
          batchCount++;
          completionService.submit(() -> {
            List<RegionTask> tasks = RangeSplitter
                .newSplitter(session.getRegionManager())
                .splitHandlesByRegion(dagReq.getTableInfo().getId(), handles);
            return CoprocessIterator.getRowIterator(dagReq, tasks, session);
          });
        }
        while (batchCount > 0) {
          rowIterator = completionService.take().get();
          batchCount--;

          if (rowIterator.hasNext()) {
            return true;
          }
        }
      }
      if (rowIterator == null) {
        return false;
      }
    } catch (Exception e) {
      throw new TiClientInternalException("Error reading rows from handle", e);
    }
    return rowIterator.hasNext();
  }

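It seems rowIterator cannot retrieve all results from completionService. Once rowIterator is non-null, the line rowIterator = completionService.take().get(); is never executed again, so when the first non-empty iterator is exhausted, hasNext() returns false and the remaining batches are never read.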

@birdstorm
Contributor Author

Comment by Novemser
Sun Dec 17 03:32:00 2017


A possible solution might look like this:

override def hasNext: Boolean = {
  def proceedNextBatchTask(): Boolean = {
    // For each batch fetch job, we get the first rowIterator with row data
    while (batchCount > 0) {
      rowIterator = completionService.take().get()
      batchCount -= 1

      // If current rowIterator has any data, return true
      if (rowIterator.hasNext) {
        return true
      }
    }
    // No rowIterator in any remaining batch fetch jobs contains data, return false
    false
  }
  // RowIter has not been initialized
  if (rowIterator == null) {
    proceedNextBatchTask()
  } else {
    if (rowIterator.hasNext) {
      return true
    }
    proceedNextBatchTask()
  }
}
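
The same idea can be sketched in Java as a self-contained example. This is only an illustration, not the code from the repository: BatchDrainingIterator and drain are hypothetical names, and plain in-memory lists stand in for the region tasks and coprocessor row iterators. The point it shows is that hasNext() keeps taking completed batches from the CompletionService whenever the current rowIterator is null or exhausted, instead of only on the first call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for IndexScanIterator; each "batch" is just a list of rows.
class BatchDrainingIterator<T> implements Iterator<T> {
  private final CompletionService<Iterator<T>> completionService;
  private int batchCount;           // batches submitted but not yet taken
  private Iterator<T> rowIterator;  // iterator of the batch currently being read

  BatchDrainingIterator(ExecutorService executor, List<List<T>> batches) {
    this.completionService = new ExecutorCompletionService<>(executor);
    for (List<T> batch : batches) {
      // Assumption: the real task would split handles by region and fetch rows.
      completionService.submit(batch::iterator);
      batchCount++;
    }
  }

  @Override
  public boolean hasNext() {
    try {
      // Move on to the next completed batch whenever the current one is
      // missing or exhausted, until a row is found or no batches remain.
      while (rowIterator == null || !rowIterator.hasNext()) {
        if (batchCount == 0) {
          return false;
        }
        rowIterator = completionService.take().get();
        batchCount--;
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while reading rows", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Error reading rows from batch", e);
    }
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return rowIterator.next();
  }

  // Helper for the example: reads every row from every batch into one list.
  static <T> List<T> drain(List<List<T>> batches) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    try {
      List<T> rows = new ArrayList<>();
      Iterator<T> it = new BatchDrainingIterator<>(executor, batches);
      while (it.hasNext()) {
        rows.add(it.next());
      }
      return rows;
    } finally {
      executor.shutdown();
    }
  }
}
```

Calling BatchDrainingIterator.drain on several batches, including empty ones, should return the rows of all batches. Batches are consumed in completion order, so the order across batches is not guaranteed, which is fine here because the query sorts afterwards.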

@Novemser
Contributor

Fixed by #140

@Novemser
Contributor

Note that IndexScanIterator.java is not used any more in #140, but using IndexScanIterator.java alone will still cause the problem.
