Connection leaks on native Azure filesystem #24116
Comments
@nineinchnick do you want to take a look?
@cccs-nik you're reporting some issue that's not manifesting itself anymore since 460? Is it worth investigating? It might have been an issue in the Azure SDK, and we're constantly upgrading it to the latest version.
@nineinchnick Sorry for the confusion, what I meant was that we are not crashing as a result of using the native Azure FS anymore in 460, but the Unbalanced enter/exit exceptions and connection leak warnings are most definitely still happening.
Cc @anusudarsan
A similar error is happening on version 467 as well:
We are also seeing this error. Is there a workaround, or did you just revert to the legacy Azure filesystem? For reference, we are on version 463 and are still seeing the same issue.
Does it make the cluster unstable, or is it just annoying that these are logged?
Currently it makes the cluster unstable: around 40% of simple queries like the one below fail.
select col_a, col_b
from table
limit 5
I'm trying to switch back to the legacy filesystem to see if we have more luck. For context, this is using the Delta Lake connector, and our config for the Azure filesystem is:
@gustavoatt can you test a Trino build with the following change: #24773?
@wendigo that does seem to have fixed the issue in one query that I could consistently get to fail ~40-50% of the time. Appreciate you looking at this!
@gustavoatt Yeah, I was worried that okhttp is to blame.
Yeah, if okhttp is not maintained then it makes sense to switch to netty. I will deploy the change to one of our clusters and reach out if I see any issues, but I don't expect any, especially since the native filesystem seems to be slightly faster than the Hadoop one, at least for the queries that I was testing.
Since switching to the new native Azure FS in our environments, we've been seeing a lot of connection leaks and java.lang.IllegalStateException: Unbalanced enter/exit exceptions. The native FS seemed to cause stability issues in earlier versions of Trino (such as Trino 452), so we had switched back to the legacy one, but in version 460, where the native FS is required, it seems OK. I'm not sure whether our past stability issues were unrelated to the leaks or whether they are now mitigated by other changes.
We get the following exceptions when running most queries:
Followed by connection leak warnings:
And I believe we get the following stack traces as a result of turning logging up:
I'm not able to reproduce the issue 100% of the time, but it's nonetheless very easy to reproduce on our end. Querying any catalog using the native Azure FS for a small amount of data (like 1000+ rows) will almost certainly throw the Unbalanced enter/exit exceptions, followed later by connection leak warnings. For example, since switching to Trino 460 on October 16th in one of our clusters, we've had roughly 430k queries, 157k Unbalanced enter/exit exceptions and 107k connection leak warnings in our logs.
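For anyone who wants to put similar numbers on their own cluster, here is a minimal sketch for tallying these messages in a server log. The marker strings are assumptions: `Unbalanced enter/exit` comes from the exception message quoted in this issue, while `was leaked` is only a guess at the wording of the connection leak warning, so adjust both to whatever your logs actually contain. The function names are hypothetical helpers, not part of Trino.

```python
from collections import Counter
from typing import Iterable

# Substrings to search for. Both are assumptions; change them to match
# the exact text that appears in your own Trino server logs.
MARKERS = {
    "unbalanced_enter_exit": "Unbalanced enter/exit",
    "connection_leak": "was leaked",
}


def count_markers(lines: Iterable[str]) -> Counter:
    """Count how many log lines contain each marker substring."""
    counts = Counter({name: 0 for name in MARKERS})
    for line in lines:
        for name, needle in MARKERS.items():
            if needle in line:
                counts[name] += 1
    return counts


def rate(count: int, total_queries: int) -> float:
    """Events per query; returns 0.0 when no queries were recorded."""
    return count / total_queries if total_queries else 0.0
```

Passing an open log file to `count_markers` is enough, since a file object iterates line by line. With the figures quoted above, `rate(157_000, 430_000)` is roughly 0.37 and `rate(107_000, 430_000)` roughly 0.25. A single query can log more than one exception, so these are events per query rather than the share of affected queries.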
I tried investigating the issue on my own but it's not really clear to me what's happening or where the issue is between Trino/Azure SDK/OkHttp. Different versions of Trino and Azure SDK all exhibit the same problem in my experience.