Continued consumption with next event after exception was thrown #346

ePaul · 2019-07-17T17:38:38Z

We are using nakadi-java-client 0.9.17.

We've observed a problematic case with two events in the same partition.
Because of a batch size of 1, they were handed to the StreamObserver's onNext() method in separate batches.
Processing the first event threw an exception (in our code), which nakadi-java logged as StreamBatchRecordSubscriber.detected_retryable_exception, without committing any cursor changes.

But then the StreamObserver's onNext() method was called again with the second event (in a new one-event batch). (Our max_uncommitted_events is set higher than 1; I think the default is 10.) This event was processed without problems, and our code committed its cursor. Because that cursor lies after the first event's position, committing it marked both events as consumed, so Nakadi won't resend either of them. The first, failed event is effectively lost.
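The loss follows from how subscription cursors work: a committed cursor is a position in the partition, and committing it acknowledges every event up to and including that position. The sketch below is a minimal, hypothetical model of a single partition (the names `Partition`, `handle` and `replayScenario` are made up for illustration and are not part of nakadi-java), replaying the scenario above with one event per batch.

```java
import java.util.List;

/** Hypothetical single-partition model; not the nakadi-java or Nakadi API. */
public class PartitionCursorDemo {

    static final class Partition {
        private final List<String> events;
        private int committedOffset = -1; // -1 means nothing committed yet

        Partition(List<String> events) {
            this.events = events;
        }

        String eventAt(int offset) {
            return events.get(offset);
        }

        int size() {
            return events.size();
        }

        void commit(int offset) {
            // A cursor only moves forward; committing offset N implicitly
            // acknowledges every earlier offset as well.
            committedOffset = Math.max(committedOffset, offset);
        }

        /** Events that would still be redelivered after a reconnect. */
        List<String> pendingEvents() {
            return events.subList(committedOffset + 1, events.size());
        }
    }

    /** Hypothetical handler: processing fails for the first event only. */
    static boolean handle(String event) {
        return !event.equals("event-0");
    }

    /** Replays the reported scenario and returns what would be redelivered. */
    static List<String> replayScenario() {
        Partition partition = new Partition(List.of("event-0", "event-1"));
        for (int offset = 0; offset < partition.size(); offset++) { // one event per batch
            if (handle(partition.eventAt(offset))) {
                partition.commit(offset);
            }
            // On failure nothing is committed, but the next batch is still delivered.
        }
        return partition.pendingEvents();
    }

    public static void main(String[] args) {
        // Prints "Pending after commit: []": event-0 is never redelivered.
        System.out.println("Pending after commit: " + replayScenario());
    }
}
```

The failed event-0 never appears in the pending list because the later commit of event-1 moved the cursor past it.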

This does not seem to happen when there is no later event in the partition; in that case the first event is retried a bit later.

(I didn't manage to dig far enough into nakadi-java's code to see what happens when a retryable exception is caught while more events are available on the same partition.)

Is this behavior expected? What should we have done differently?


dehora · 2019-07-18T11:33:29Z

@ePaul thanks for reporting; let me do some digging, as this one might be tricky to debug. In the meantime, can you add the stream connection parameters you use?

ePaul · 2019-07-18T14:02:49Z

```java
nakadiClient
    .resources()
    .streamBuilder()
    .streamConfiguration(
        new StreamConfiguration()
            .subscriptionId(subscriptionId)
            .batchLimit(eventProcessingConfiguration.batchSize())
    )
    .streamObserverFactory(new EventStreamObserverProvider(processingService, eventParser, eventProcessingConfiguration))
    .build();
```

The batch size here is 50; all other parameters are left at their default values.

ePaul · 2020-06-05T16:20:26Z

We just got another case of this (internal link).

ePaul · 2020-06-05T16:23:45Z

I guess a workaround would be to always set max_uncommitted_events to 1, but that would reduce the achievable throughput considerably (no parallel processing possible).
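Why max_uncommitted_events = 1 would avoid the loss can be sketched with a small, hypothetical simulation (the class `UncommittedWindowDemo` and its methods are illustrative only, not nakadi-java code). It assumes the server sends the next event only while fewer than max_uncommitted_events events are outstanding (sent but not committed), and it ignores retries and reconnects.

```java
import java.util.List;

/** Hypothetical flow-control model for one partition; not the nakadi-java API. */
public class UncommittedWindowDemo {

    /** Hypothetical handler: fails on the first event, succeeds on the rest. */
    static boolean handle(String event) {
        return !event.equals("event-0");
    }

    /**
     * Simulates one stream connection to a single partition and returns the
     * events that remain uncommitted (and would be redelivered) afterwards.
     */
    static List<String> pendingAfterStream(List<String> events, int maxUncommittedEvents) {
        int committed = -1; // highest committed offset, -1 means none
        int next = 0;       // next offset the server would send
        // Keep sending only while the number of sent-but-uncommitted
        // events is below the configured window.
        while (next < events.size() && next - (committed + 1) < maxUncommittedEvents) {
            if (handle(events.get(next))) {
                committed = next; // this commit also covers every earlier offset
            }
            next++;
        }
        return events.subList(committed + 1, events.size());
    }

    public static void main(String[] args) {
        List<String> events = List.of("event-0", "event-1");
        // Window of 1: the stream stalls after the failure, nothing is skipped.
        System.out.println("max_uncommitted_events=1:  " + pendingAfterStream(events, 1));
        // Window of 10: event-1 is delivered and committed, skipping event-0.
        System.out.println("max_uncommitted_events=10: " + pendingAfterStream(events, 10));
    }
}
```

With a window of 1 both events stay pending (`[event-0, event-1]`), whereas with a window of 10 the pending list is empty. The price is the one described above: with a window of 1, only one event per partition can be in flight at a time.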

dehora added the investigate label Jul 18, 2019
