Connection leaks on native Azure filesystem #24116

Open
cccs-nik opened this issue Nov 12, 2024 · 13 comments

@cccs-nik
Member

Since switching to the new native Azure filesystem in our environments, we've been seeing a lot of connection leaks and java.lang.IllegalStateException: Unbalanced enter/exit exceptions. The native FS seemed to cause stability issues in earlier versions of Trino (such as 452), so we had switched back to the legacy filesystem, but in version 460, where the native FS is required, it seems stable. I'm not sure whether our past stability issues in other versions were unrelated to the leaks, or whether they're now mitigated by other changes.

We get the following exceptions when running most queries:

ERROR	SplitRunner-20241025_163311_00004_872jn.1.0.0-4-283	reactor.core.publisher.Operators	Operator called default onErrorDropped
java.lang.IllegalStateException: Unbalanced enter/exit
	at okio.AsyncTimeout.enter(AsyncTimeout.kt:58)
	at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:384)
	at okio.RealBufferedSource.read(RealBufferedSource.kt:194)
	at okhttp3.internal.http1.Http1ExchangeCodec$AbstractSource.read(Http1ExchangeCodec.kt:339)
	at okhttp3.internal.http1.Http1ExchangeCodec$FixedLengthSource.read(Http1ExchangeCodec.kt:376)
	at okhttp3.internal.Util.skipAll(Util.kt:344)
	at okhttp3.internal.Util.discard(Util.kt:365)
	at okhttp3.internal.http1.Http1ExchangeCodec$FixedLengthSource.close(Http1ExchangeCodec.kt:395)
	at okio.ForwardingSource.close(ForwardingSource.kt:32)
	at okhttp3.internal.connection.Exchange$ResponseBodySource.close(Exchange.kt:314)
	at okio.RealBufferedSource.close(RealBufferedSource.kt:486)
	at okhttp3.internal.Util.closeQuietly(Util.kt:495)
	at okhttp3.ResponseBody.close(ResponseBody.kt:192)
	at com.azure.core.http.okhttp.implementation.OkHttpAsyncResponse.close(OkHttpAsyncResponse.java:119)
	at com.azure.core.http.okhttp.implementation.OkHttpAsyncResponse.lambda$getBody$1(OkHttpAsyncResponse.java:61)
	at reactor.core.publisher.FluxUsing$UsingFuseableSubscriber.cleanup(FluxUsing.java:334)
	at reactor.core.publisher.FluxUsing$UsingFuseableSubscriber.cancel(FluxUsing.java:328)
	at reactor.core.publisher.FluxUsing$UsingFuseableSubscriber.cancel(FluxUsing.java:326)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.cancel(FluxMapFuseable.java:176)
	at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2425)
	at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2393)
	at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2205)
	at reactor.core.publisher.Operators$BaseFluxToMonoOperator.cancel(Operators.java:2086)
	at reactor.core.publisher.FluxDefaultIfEmpty$DefaultIfEmptySubscriber.cancel(FluxDefaultIfEmpty.java:103)
	at reactor.core.publisher.Operators$BaseFluxToMonoOperator.cancel(Operators.java:2086)
	at reactor.core.publisher.MonoCollect$CollectSubscriber.cancel(MonoCollect.java:150)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.cancel(FluxMapFuseable.java:176)
	at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.cancel(FluxMapFuseable.java:176)
	at reactor.core.publisher.Operators.terminate(Operators.java:1328)
	at reactor.core.publisher.MonoZip$ZipInner.cancel(MonoZip.java:548)
	at reactor.core.publisher.MonoZip$ZipCoordinator.cancel(MonoZip.java:313)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.cancel(MonoFlatMap.java:207)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.cancel(MonoFlatMap.java:199)
	at reactor.core.publisher.BlockingSingleSubscriber.dispose(BlockingSingleSubscriber.java:74)
	at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:94)
	at reactor.core.publisher.Mono.block(Mono.java:1779)
	at com.azure.storage.blob.specialized.BlobClientBase.openInputStream(BlobClientBase.java:394)
	at com.azure.storage.blob.specialized.BlobClientBase.openInputStream(BlobClientBase.java:324)
	at io.trino.filesystem.azure.AzureInput.readFully(AzureInput.java:61)
	at io.trino.filesystem.tracing.TracingInput.lambda$readFully$0(TracingInput.java:53)
	at io.trino.filesystem.tracing.Tracing.lambda$withTracing$1(Tracing.java:38)
	at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
	at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:37)
	at io.trino.filesystem.tracing.TracingInput.readFully(TracingInput.java:53)
	at io.trino.plugin.hive.parquet.TrinoParquetDataSource.readInternal(TrinoParquetDataSource.java:64)
	at io.trino.parquet.AbstractParquetDataSource.readFully(AbstractParquetDataSource.java:122)
	at io.trino.parquet.AbstractParquetDataSource$ReferenceCountedReader.read(AbstractParquetDataSource.java:332)
	at io.trino.parquet.ChunkReader.readUnchecked(ChunkReader.java:31)
	at io.trino.parquet.reader.ChunkedInputStream.readNextChunk(ChunkedInputStream.java:149)
	at io.trino.parquet.reader.ChunkedInputStream.read(ChunkedInputStream.java:93)
	at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:170)
	at shaded.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:100)
	at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:632)
	at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:532)
	at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:188)
	at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1003)
	at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:995)
	at org.apache.parquet.format.PageHeader.read(PageHeader.java:870)
	at org.apache.parquet.format.Util.read(Util.java:390)
	at org.apache.parquet.format.Util.readPageHeader(Util.java:133)
	at org.apache.parquet.format.Util.readPageHeader(Util.java:128)
	at io.trino.parquet.reader.ParquetColumnChunkIterator.readPageHeader(ParquetColumnChunkIterator.java:116)
	at io.trino.parquet.reader.ParquetColumnChunkIterator.next(ParquetColumnChunkIterator.java:83)
	at io.trino.parquet.reader.ParquetColumnChunkIterator.next(ParquetColumnChunkIterator.java:42)
	at com.google.common.collect.Iterators$PeekingImpl.peek(Iterators.java:1219)
	at io.trino.parquet.reader.PageReader.readDictionaryPage(PageReader.java:160)
	at io.trino.parquet.reader.AbstractColumnReader.setPageReader(AbstractColumnReader.java:79)
	at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:460)
	at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:543)
	at io.trino.parquet.reader.ParquetReader.readStruct(ParquetReader.java:373)
	at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:534)
	at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:526)
	at io.trino.parquet.reader.ParquetReader.lambda$nextPage$3(ParquetReader.java:252)
	at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:72)
	at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:312)
	at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:291)
	at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:186)
	at io.trino.spi.Page.getLoadedPage(Page.java:244)
	at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:271)
	at io.trino.operator.Driver.processInternal(Driver.java:403)
	at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
	at io.trino.operator.Driver.tryWithLock(Driver.java:709)
	at io.trino.operator.Driver.process(Driver.java:298)
	at io.trino.operator.Driver.processForDuration(Driver.java:269)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
	at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:77)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:201)
	at io.trino.$gen.Trino_460_220_g1e499ed____20241025_161104_2.run(Unknown Source)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:202)
	at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:172)
	at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:159)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1575)

Followed by connection leak warnings:

WARN OkHttp ConnectionPool okhttp3.OkHttpClient A connection to https://<domain>.blob.core.windows.net/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
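The warning itself points at the diagnosis step: raising the OkHttpClient logger to FINE makes OkHttp capture, at call start, the stack trace of the code that allocated each connection, and attach it to the leak warning. A minimal sketch of enabling that, assuming java.util.logging is the active backend (the helper class name here is hypothetical):

import java.util.logging.Level;
import java.util.logging.Logger;

import okhttp3.OkHttpClient;

public final class OkHttpLeakTracing
{
    private OkHttpLeakTracing() {}

    // Call once at startup: with the OkHttpClient logger at FINE, the
    // "connection leaked" warning includes the stack trace of the call that
    // allocated the unclosed response body, as the message above recommends.
    public static void enable()
    {
        Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
    }
}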

And I believe we get the following stack traces as a result of turning logging up:

WARN OkHttp ConnectionPool okhttp3.OkHttpClient	A connection to https://<domain>.blob.core.windows.net/ was leaked. Did you forget to close a response body?
java.lang.Throwable: response.body().close()
	at okhttp3.internal.platform.Platform.getStackTraceForCloseable(Platform.kt:145)
	at okhttp3.internal.connection.RealCall.callStart(RealCall.kt:170)
	at okhttp3.internal.connection.RealCall.enqueue(RealCall.kt:163)
	at com.azure.core.http.okhttp.OkHttpAsyncHttpClient.lambda$send$2(OkHttpAsyncHttpClient.java:125)
	at reactor.core.publisher.LambdaMonoSubscriber.onNext(LambdaMonoSubscriber.java:171)
	at reactor.core.publisher.MonoCallable$MonoCallableSubscription.request(MonoCallable.java:156)
	at reactor.core.publisher.LambdaMonoSubscriber.onSubscribe(LambdaMonoSubscriber.java:121)
	at reactor.core.publisher.MonoCallable.subscribe(MonoCallable.java:48)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4576)
	at reactor.core.publisher.Mono.subscribeWith(Mono.java:4642)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4542)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4478)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4450)
	at com.azure.core.http.okhttp.OkHttpAsyncHttpClient.lambda$send$3(OkHttpAsyncHttpClient.java:122)
	at reactor.core.publisher.MonoCreate$DefaultMonoSink.onRequest(MonoCreate.java:225)
	at com.azure.core.http.okhttp.OkHttpAsyncHttpClient.lambda$send$4(OkHttpAsyncHttpClient.java:108)
	at reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:61)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
	at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:241)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:204)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:155)
	at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:82)
	at reactor.core.publisher.SerializedSubscriber.onNext(SerializedSubscriber.java:99)
	at reactor.core.publisher.FluxRepeatWhen$RepeatWhenMainSubscriber.onNext(FluxRepeatWhen.java:143)
	at reactor.core.publisher.MonoUsing$MonoUsingSubscriber.onNext(MonoUsing.java:231)
	at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onNext(MonoPeekTerminal.java:180)
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158)
	at reactor.core.publisher.MonoMaterialize$MaterializeSubscriber.drain(MonoMaterialize.java:133)
	at reactor.core.publisher.MonoMaterialize$MaterializeSubscriber.onComplete(MonoMaterialize.java:127)
	at reactor.core.publisher.Operators.complete(Operators.java:137)
	at reactor.core.publisher.MonoEmpty.subscribe(MonoEmpty.java:46)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
	at reactor.core.publisher.MonoUsing.subscribe(MonoUsing.java:102)
	at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
	at reactor.core.publisher.MonoFromFluxOperator.subscribe(MonoFromFluxOperator.java:83)
	at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4576)
	at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:265)
	at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
	at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4576)
	at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:205)
	at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
	at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:63)
	at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4576)
	at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:205)
	at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53)
	at reactor.core.publisher.Mono.subscribe(Mono.java:4560)
	at reactor.core.publisher.Mono.block(Mono.java:1778)
	at com.azure.storage.blob.specialized.BlobClientBase.openInputStream(BlobClientBase.java:394)
	at com.azure.storage.blob.specialized.BlobClientBase.openInputStream(BlobClientBase.java:324)
	at io.trino.filesystem.azure.AzureInput.readFully(AzureInput.java:61)
	at io.trino.filesystem.tracing.TracingInput.lambda$readFully$0(TracingInput.java:53)
	at io.trino.filesystem.tracing.Tracing.lambda$withTracing$1(Tracing.java:38)
	at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
	at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:37)
	at io.trino.filesystem.tracing.TracingInput.readFully(TracingInput.java:53)
	at io.trino.plugin.hive.parquet.TrinoParquetDataSource.readInternal(TrinoParquetDataSource.java:64)
	at io.trino.parquet.AbstractParquetDataSource.readFully(AbstractParquetDataSource.java:122)
	at io.trino.parquet.AbstractParquetDataSource$ReferenceCountedReader.read(AbstractParquetDataSource.java:332)
	at io.trino.parquet.ChunkReader.readUnchecked(ChunkReader.java:31)
	at io.trino.parquet.reader.ChunkedInputStream.readNextChunk(ChunkedInputStream.java:149)
	at io.trino.parquet.reader.ChunkedInputStream.read(ChunkedInputStream.java:93)
	at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:170)
	at shaded.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:100)
	at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:632)
	at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:532)
	at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:188)
	at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1003)
	at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:995)
	at org.apache.parquet.format.PageHeader.read(PageHeader.java:870)
	at org.apache.parquet.format.Util.read(Util.java:390)
	at org.apache.parquet.format.Util.readPageHeader(Util.java:133)
	at org.apache.parquet.format.Util.readPageHeader(Util.java:128)
	at io.trino.parquet.reader.ParquetColumnChunkIterator.readPageHeader(ParquetColumnChunkIterator.java:116)
	at io.trino.parquet.reader.ParquetColumnChunkIterator.next(ParquetColumnChunkIterator.java:83)
	at io.trino.parquet.reader.ParquetColumnChunkIterator.next(ParquetColumnChunkIterator.java:42)
	at com.google.common.collect.Iterators$PeekingImpl.peek(Iterators.java:1219)
	at io.trino.parquet.reader.PageReader.readDictionaryPage(PageReader.java:160)
	at io.trino.parquet.reader.AbstractColumnReader.setPageReader(AbstractColumnReader.java:79)
	at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:460)
	at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:543)
	at io.trino.parquet.reader.ParquetReader.readStruct(ParquetReader.java:373)
	at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:534)
	at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:526)
	at io.trino.parquet.reader.ParquetReader.lambda$nextPage$3(ParquetReader.java:252)
	at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:72)
	at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:312)
	at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:291)
	at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:186)
	at io.trino.spi.Page.getLoadedPage(Page.java:244)
	at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:271)
	at io.trino.operator.Driver.processInternal(Driver.java:403)
	at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
	at io.trino.operator.Driver.tryWithLock(Driver.java:709)
	at io.trino.operator.Driver.process(Driver.java:298)
	at io.trino.operator.Driver.processForDuration(Driver.java:269)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
	at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:77)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:201)
	at io.trino.$gen.Trino_460_220_g1e499ed____20241025_161104_2.run(Unknown Source)
	at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:202)
	at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:172)
	at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:159)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1575)

I'm not able to reproduce the issue 100% of the time, but it's nonetheless very easy to reproduce on our end. Querying any catalog that uses the native Azure FS for even a small amount of data (1,000+ rows) will almost certainly throw the Unbalanced enter/exit exceptions, followed later by connection leak warnings. For example, since switching one of our clusters to Trino 460 on October 16th, we've had roughly 430k queries, 157k Unbalanced enter/exit exceptions, and 107k connection leak warnings in our logs.

I tried investigating the issue on my own, but it's not really clear to me what's happening or whether the problem lies in Trino, the Azure SDK, or OkHttp. In my experience, different versions of Trino and the Azure SDK all exhibit the same problem.
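For anyone trying to narrow this down outside Trino: both traces funnel through BlobClientBase.openInputStream, which blocks on a reactive pipeline (Mono.block), and closing the OkHttp response body makes okio drain the remaining bytes (Util.discard), which re-enters the AsyncTimeout; the Unbalanced enter/exit above is thrown from that path. A minimal sketch of the same call pattern against the Azure SDK directly, assuming azure-storage-blob with the azure-core-http-okhttp client on the classpath; the environment variable, container, and blob names are placeholders:

import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.blob.specialized.BlobInputStream;

public class OpenInputStreamRepro
{
    public static void main(String[] args) throws Exception
    {
        BlobClient blob = new BlobServiceClientBuilder()
                .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
                .buildClient()
                .getBlobContainerClient("test-container")
                .getBlobClient("test.parquet");

        // openInputStream() is the frame seen in the traces above: it blocks
        // on the SDK's reactive download pipeline, and the error surfaces
        // when that pipeline is cancelled and the OkHttp response body is
        // closed while okio's timeout state is still "entered".
        try (BlobInputStream in = blob.openInputStream()) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // drain the blob, mimicking Trino's readFully
            }
        }
    }
}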

@wendigo
Contributor

wendigo commented Nov 13, 2024

@nineinchnick do you want to take a look?

@nineinchnick
Member

@cccs-nik you're reporting an issue that no longer manifests since 460? Is it worth investigating? It might have been an issue in the Azure SDK, which we're constantly upgrading to the latest version.

@cccs-nik
Member Author

@nineinchnick Sorry for the confusion. What I meant is that we are no longer crashing as a result of using the native Azure FS in 460, but the Unbalanced enter/exit exceptions and connection leak warnings are most definitely still happening.

@findinpath
Contributor

Cc @anusudarsan

@alaturqua
Contributor

A similar error is happening on version 467 as well:

trino-worker 2025-01-22T15:35:47.007Z    ERROR    SplitRunner-20250122_153542_00973_6i3fj.2.8.0-331-9646    reactor.core.publisher.Operators    Operator called default onErrorDropped
trino-worker java.lang.IllegalStateException: Unbalanced enter/exit
trino-worker     at okio.AsyncTimeout.enter(AsyncTimeout.kt:58)
trino-worker     at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:384)
trino-worker     at okio.RealBufferedSource.read(RealBufferedSource.kt:194)
trino-worker     at okhttp3.internal.http1.Http1ExchangeCodec$AbstractSource.read(Http1ExchangeCodec.kt:339)
trino-worker     at okhttp3.internal.http1.Http1ExchangeCodec$FixedLengthSource.read(Http1ExchangeCodec.kt:376)
trino-worker     at okhttp3.internal.Util.skipAll(Util.kt:344)
trino-worker     at okhttp3.internal.Util.discard(Util.kt:365)
trino-worker     at okhttp3.internal.http1.Http1ExchangeCodec$FixedLengthSource.close(Http1ExchangeCodec.kt:395)
trino-worker     at okio.ForwardingSource.close(ForwardingSource.kt:32)
trino-worker     at okhttp3.internal.connection.Exchange$ResponseBodySource.close(Exchange.kt:314)
trino-worker     at okio.RealBufferedSource.close(RealBufferedSource.kt:486)
trino-worker     at okhttp3.internal.Util.closeQuietly(Util.kt:495)
trino-worker     at okhttp3.ResponseBody.close(ResponseBody.kt:192)
trino-worker     at com.azure.core.http.okhttp.implementation.OkHttpAsyncResponse.close(OkHttpAsyncResponse.java:119)
trino-worker     at com.azure.core.http.okhttp.implementation.OkHttpAsyncResponse.lambda$getBody$1(OkHttpAsyncResponse.java:61)
trino-worker     at reactor.core.publisher.FluxUsing$UsingFuseableSubscriber.cleanup(FluxUsing.java:334)
trino-worker     at reactor.core.publisher.FluxUsing$UsingFuseableSubscriber.cancel(FluxUsing.java:328)
trino-worker     at reactor.core.publisher.FluxUsing$UsingFuseableSubscriber.cancel(FluxUsing.java:326)
trino-worker     at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.cancel(FluxMapFuseable.java:176)
trino-worker     at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drainLoop(Operators.java:2425)
trino-worker     at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.drain(Operators.java:2393)
trino-worker     at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.cancel(Operators.java:2205)
trino-worker     at reactor.core.publisher.Operators$BaseFluxToMonoOperator.cancel(Operators.java:2086)
trino-worker     at reactor.core.publisher.FluxDefaultIfEmpty$DefaultIfEmptySubscriber.cancel(FluxDefaultIfEmpty.java:103)
trino-worker     at reactor.core.publisher.Operators$BaseFluxToMonoOperator.cancel(Operators.java:2086)
trino-worker     at reactor.core.publisher.MonoCollect$CollectSubscriber.cancel(MonoCollect.java:150)
trino-worker     at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.cancel(FluxMapFuseable.java:176)
trino-worker     at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.cancel(FluxMapFuseable.java:176)
trino-worker     at reactor.core.publisher.Operators.terminate(Operators.java:1328)
trino-worker     at reactor.core.publisher.MonoZip$ZipInner.cancel(MonoZip.java:548)
trino-worker     at reactor.core.publisher.MonoZip$ZipCoordinator.cancel(MonoZip.java:313)
trino-worker     at reactor.core.publisher.MonoFlatMap$FlatMapMain.cancel(MonoFlatMap.java:207)
trino-worker     at reactor.core.publisher.MonoFlatMap$FlatMapMain.cancel(MonoFlatMap.java:199)
trino-worker     at reactor.core.publisher.BlockingSingleSubscriber.dispose(BlockingSingleSubscriber.java:74)
trino-worker     at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:94)
trino-worker     at reactor.core.publisher.Mono.block(Mono.java:1779)
trino-worker     at com.azure.storage.blob.specialized.BlobClientBase.openInputStream(BlobClientBase.java:564)
trino-worker     at com.azure.storage.blob.specialized.BlobClientBase.openInputStream(BlobClientBase.java:488)
trino-worker     at io.trino.filesystem.azure.AzureInput.readFully(AzureInput.java:61)
trino-worker     at io.trino.filesystem.tracing.TracingInput.lambda$readFully$0(TracingInput.java:53)
trino-worker     at io.trino.filesystem.tracing.Tracing.lambda$withTracing$1(Tracing.java:38)
trino-worker     at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:47)
trino-worker     at io.trino.filesystem.tracing.Tracing.withTracing(Tracing.java:37)
trino-worker     at io.trino.filesystem.tracing.TracingInput.readFully(TracingInput.java:53)
trino-worker     at io.trino.filesystem.alluxio.AlluxioInput.doExternalRead(AlluxioInput.java:85)
trino-worker     at io.trino.filesystem.alluxio.AlluxioInput.readFully(AlluxioInput.java:73)
trino-worker     at io.trino.plugin.hive.parquet.TrinoParquetDataSource.readInternal(TrinoParquetDataSource.java:64)
trino-worker     at io.trino.parquet.AbstractParquetDataSource.readFully(AbstractParquetDataSource.java:122)
trino-worker     at io.trino.parquet.AbstractParquetDataSource$ReferenceCountedReader.read(AbstractParquetDataSource.java:332)
trino-worker     at io.trino.parquet.AbstractParquetDataSource$1.read(AbstractParquetDataSource.java:226)
trino-worker     at io.trino.parquet.ChunkReader.readUnchecked(ChunkReader.java:31)
trino-worker     at io.trino.parquet.reader.ChunkedInputStream.readNextChunk(ChunkedInputStream.java:149)
trino-worker     at io.trino.parquet.reader.ChunkedInputStream.read(ChunkedInputStream.java:93)
trino-worker     at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:170)
trino-worker     at shaded.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:100)
trino-worker     at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:632)
trino-worker     at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:532)
trino-worker     at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:188)
trino-worker     at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:1003)
trino-worker     at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:995)
trino-worker     at org.apache.parquet.format.PageHeader.read(PageHeader.java:870)
trino-worker     at org.apache.parquet.format.Util.read(Util.java:390)
trino-worker     at org.apache.parquet.format.Util.readPageHeader(Util.java:133)
trino-worker     at org.apache.parquet.format.Util.readPageHeader(Util.java:128)
trino-worker     at io.trino.parquet.reader.ParquetColumnChunkIterator.readPageHeader(ParquetColumnChunkIterator.java:116)
trino-worker     at io.trino.parquet.reader.ParquetColumnChunkIterator.next(ParquetColumnChunkIterator.java:83)
trino-worker     at io.trino.parquet.reader.ParquetColumnChunkIterator.next(ParquetColumnChunkIterator.java:42)
trino-worker     at com.google.common.collect.Iterators$PeekingImpl.peek(Iterators.java:1219)
trino-worker     at io.trino.parquet.reader.PageReader.readDictionaryPage(PageReader.java:160)
trino-worker     at io.trino.parquet.reader.AbstractColumnReader.setPageReader(AbstractColumnReader.java:79)
trino-worker     at io.trino.parquet.reader.ParquetReader.readPrimitive(ParquetReader.java:465)
trino-worker     at io.trino.parquet.reader.ParquetReader.readColumnChunk(ParquetReader.java:548)
trino-worker     at io.trino.parquet.reader.ParquetReader.readBlock(ParquetReader.java:531)
trino-worker     at io.trino.parquet.reader.ParquetReader.lambda$nextPage$3(ParquetReader.java:252)
trino-worker     at io.trino.parquet.reader.ParquetBlockFactory$ParquetBlockLoader.load(ParquetBlockFactory.java:72)
trino-worker     at io.trino.spi.block.LazyBlock$LazyData.load(LazyBlock.java:312)
trino-worker     at io.trino.spi.block.LazyBlock$LazyData.getFullyLoadedBlock(LazyBlock.java:291)
trino-worker     at io.trino.spi.block.LazyBlock.getLoadedBlock(LazyBlock.java:186)
trino-worker     at io.trino.spi.Page.getLoadedPage(Page.java:259)
trino-worker     at io.trino.operator.project.InputChannels.getInputChannels(InputChannels.java:60)
trino-worker     at io.trino.sql.gen.columnar.PageFilterEvaluator.evaluate(PageFilterEvaluator.java:40)
trino-worker     at io.trino.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:124)
trino-worker     at io.trino.operator.ScanFilterAndProjectOperator$SplitToPages.lambda$processPageSource$1(ScanFilterAndProjectOperator.java:286)
trino-worker     at io.trino.operator.WorkProcessorUtils.lambda$flatMap$4(WorkProcessorUtils.java:285)
trino-worker     at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:359)
trino-worker     at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
trino-worker     at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
trino-worker     at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
trino-worker     at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
trino-worker     at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
trino-worker     at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
trino-worker     at io.trino.operator.WorkProcessorUtils$BlockingProcess.process(WorkProcessorUtils.java:207)
trino-worker     at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
trino-worker     at io.trino.operator.WorkProcessorUtils.lambda$flatten$6(WorkProcessorUtils.java:317)
trino-worker     at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:359)
trino-worker     at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
trino-worker     at io.trino.operator.WorkProcessorUtils$3.process(WorkProcessorUtils.java:346)
trino-worker     at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
trino-worker     at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
trino-worker     at io.trino.operator.WorkProcessorUtils.lambda$processStateMonitor$2(WorkProcessorUtils.java:240)
trino-worker     at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
trino-worker     at io.trino.operator.WorkProcessorUtils.getNextState(WorkProcessorUtils.java:261)
trino-worker     at io.trino.operator.WorkProcessorUtils.lambda$finishWhen$3(WorkProcessorUtils.java:255)
trino-worker     at io.trino.operator.WorkProcessorUtils$ProcessWorkProcessor.process(WorkProcessorUtils.java:423)
trino-worker     at io.trino.operator.WorkProcessorSourceOperatorAdapter.getOutput(WorkProcessorSourceOperatorAdapter.java:133)
trino-worker     at io.trino.operator.Driver.processInternal(Driver.java:403)
trino-worker     at io.trino.operator.Driver.lambda$process$8(Driver.java:306)
trino-worker     at io.trino.operator.Driver.tryWithLock(Driver.java:709)
trino-worker     at io.trino.operator.Driver.process(Driver.java:298)
trino-worker     at io.trino.operator.Driver.processForDuration(Driver.java:269)
trino-worker     at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:890)
trino-worker     at io.trino.execution.executor.dedicated.SplitProcessor.run(SplitProcessor.java:77)
trino-worker     at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.lambda$run$0(TaskEntry.java:201)
trino-worker     at io.trino.$gen.Trino_467____20250122_152022_2.run(Unknown Source)
trino-worker     at io.trino.execution.executor.dedicated.TaskEntry$VersionEmbedderBridge.run(TaskEntry.java:202)
trino-worker     at io.trino.execution.executor.scheduler.FairScheduler.runTask(FairScheduler.java:177)
trino-worker     at io.trino.execution.executor.scheduler.FairScheduler.lambda$submit$0(FairScheduler.java:164)
trino-worker     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
trino-worker     at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
trino-worker     at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
trino-worker     at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
trino-worker     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
trino-worker     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
trino-worker     at java.base/java.lang.Thread.run(Thread.java:1575)
trino-worker 
trino-worker 
trino-worker 2025-01-22T15:35:47.016Z    INFO    OkHttp https://<anonymized>.blob.core.windows.net/...    com.azure.core.implementation.RetriableDownloadFlux    Using retry attempt 1 of 5 while downloading.

@gustavoatt

We are also seeing this error. Did you find a workaround, or did you just revert to the legacy Azure filesystem?

For reference, we are on version 463 and are still seeing the same issue.

@wendigo
Contributor

wendigo commented Jan 22, 2025

Does it make the cluster unstable, or is it just annoying that these are logged?

@gustavoatt

Currently it makes the cluster unstable: around 40% of simple queries like the one below fail.

select col_a, col_b
from table
limit 5

I'm trying to switch back to the legacy filesystem to see if we have more luck. For context, this is using the Delta Lake connector, and our config for the Azure filesystem is:

fs.native-azure.enabled=true
azure.auth-type=DEFAULT
azure.max-http-requests=100

@wendigo
Contributor

wendigo commented Jan 23, 2025

@gustavoatt can you test a Trino build with the following change: #24773?

@gustavoatt

@wendigo that does seem to have fixed the issue in one query that I could consistently get to fail ~40-50% of the time. Appreciate you looking at this!

@wendigo
Contributor

wendigo commented Jan 23, 2025

@gustavoatt Yeah, I was worried that OkHttp was to blame.


@gustavoatt

Yeah, if OkHttp was not being maintained, then it makes sense to switch to Netty. I will deploy the change to one of our clusters and reach out if I see any issues, but I don't expect any, especially since the native filesystem seems to be slightly faster than the Hadoop one, at least for the queries I was testing.
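As the comments above describe, #24773 swaps the Azure SDK's HTTP client from the OkHttp-backed implementation to a Netty-backed one. A rough sketch of what that swap looks like at the SDK level, assuming azure-core-http-netty is on the classpath in place of azure-core-http-okhttp (an illustration of the client swap only, not the actual code from the PR; the connection-string environment variable is a placeholder):

import com.azure.core.http.HttpClient;
import com.azure.core.http.netty.NettyAsyncHttpClientBuilder;
import com.azure.storage.blob.BlobServiceClient;
import com.azure.storage.blob.BlobServiceClientBuilder;

public class NettyHttpClientExample
{
    public static void main(String[] args)
    {
        // Build a Netty-backed HttpClient explicitly instead of relying on
        // whichever azure-core HTTP provider is discovered on the classpath.
        HttpClient nettyClient = new NettyAsyncHttpClientBuilder().build();

        BlobServiceClient client = new BlobServiceClientBuilder()
                .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
                .httpClient(nettyClient)
                .buildClient();

        System.out.println(client.getAccountUrl());
    }
}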
