Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1892865: Starting an query in AsyncMode and then evicting the connection pool cancels the async query #1077

Open
john-skyward-solutions opened this issue Dec 31, 2024 · 12 comments
Assignees
Labels
question Issue is a usage/other question rather than a bug status-information_needed Additional information is required from the reporter status-triage_done Initial triage done, will be further handled by the driver team

Comments

@john-skyward-solutions
Copy link

We've found that starting an async query and then evicting the connection leads to session, and the queries still running on that session, to be cancelled.

I understand why that makes sense in a synchronous world, but in an async world where someone has deliberately started a query in async mode to leave it running it makes little sense.

Is there a way we can either force a new session to begin on the existing open connection after the async mode is started OR evict the connection from the pool in such a way that it does not stop async queries and it's transactions it has open?

Please answer these questions before submitting your issue.
In order to accurately debug the issue this information is required. Thanks!

  1. What version of .NET driver are you using?
    4.2.0

  2. What operating system and processor architecture are you using?
    All of Windows x64 / macOS silicon / Linux x64

  3. What version of .NET framework are you using?
    dotnet 8.0

  4. What did you do?

`using var conn = await new SnowflakeDbConnection("");
conn.Open();
conn.PreventPooling();

var cmd = conn.CreateSnowflakeDbCommand();
cmd.CommandText = "<async query that takes over 5 minutes>";
var queryId = await cmd.ExecuteAsyncInAsyncMode(default);

Console.WriteLine("QueryID started in async mode: {0}", queryId);`

5 minutes after running the above the query will be cancelled.

  1. What did you expect to see?
    The query should continue to execute until completion.
@john-skyward-solutions john-skyward-solutions changed the title Starting an query in AsyncMode and then evicting the connection pool cancels the async query Starting an query in AsyncMode and then preventing pooling on the connection cancels the async query Dec 31, 2024
@sfc-gh-dszmolka sfc-gh-dszmolka self-assigned this Jan 2, 2025
@sfc-gh-dszmolka sfc-gh-dszmolka added question Issue is a usage/other question rather than a bug and removed bug labels Jan 2, 2025
@sfc-gh-dszmolka
Copy link
Contributor

hi - Happy New Year :)

The behaviour you described -at the first superficial glance- does not seem to be related to this library, but the Snowflake backend itself. Especially the 5 minutes you mentioned is suspicious.

Please see the documentation for ABORT_DETACHED_QUERY - isn't this what we're seeing here? Can you please verify if the parameter is set to FALSE in case in these queries? If it's TRUE then it's expected to see the behaviour you're reporting, regardless of programming language or client driver type.

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage Issue is under initial triage status-information_needed Additional information is required from the reporter labels Jan 2, 2025
@john-skyward-solutions
Copy link
Author

john-skyward-solutions commented Jan 13, 2025

Yes we set ABORT_DETACHED_QUERY to FALSE before all Async Queries because we run into the cancelled issue for longer running queries. This is why I was so susprised to see this behaviour when preventing the connection to be pooled, because we already set it to false.

The below screenshot is every query in a single session, where we set abort detached query to false and then run our async query after that. You'll see that it still fails.

image

When a connection is removed from the pool, is the session aborted or similar?

@sfc-gh-dszmolka
Copy link
Contributor

yes connection Close() (and Dispose() too) actively closed the underlying Snowflake Session when it's called.

it would be interesting to see the queryId of one of the failed queries but i'm almost sure the Snowflake engine (so not this library in which the issue was filed) aborts it due to being disconnected. Can you share at least one of them ?

speaking about the libraries; it would be interestin to run the same query with another Snowflake driver which supports async mode (JDBC driver, Python driver, now Node.js driver too i believe) and compare the behaviour. I would assume it's the same.

@john-skyward-solutions
Copy link
Author

@sfc-gh-dszmolka I'm afraid I can't, but it's very easy to replicate on yourside by following the same approach.

  // create connection
  conn.Open();
  conn.PreventPooling();

  var cmd = conn.CreateCommand();

  cmd.CommandText = "alter session set abort_detached_query = false;";
  await cmd.ExecuteNonQueryAsync();

  cmd.CommandText = "<long running sql query text>";
  var queryId = await cmd.ExecuteAsyncInAsyncMode(default);

  Console.WriteLine("QueryID started in async mode: {0}", queryId);

Then watch that query to see it stop after 5 minutes.

I'd wondered the same on the other libraries. I wasn't sure if they didn't have connection pooling, so the problem did not exist, but it looks like they do.

I suppose another wider concern is that if an application shuts down fully, and any connections in the pool are closed, that would stop any async queries from completing?

@sfc-gh-dszmolka
Copy link
Contributor

thank you for this example. also after all i was able to find the session and all queries run within it for the above attempt. Snowflake Cloud Services logs clearly indicate that the query was aborted on explicit abort request from the user, and not some timeout or such.

sp based on the code you shared and our documentation on how to run async queries i attempted the following:

using Snowflake.Data.Client;

const string connectionString = "account=myaccount.west-europe.azure;user=myuser;password=mypassword;insecuremode=true";
await using var conn = new SnowflakeDbConnection(connectionString);

var sleepAmount = 420;
var sleepTime = TimeSpan.FromSeconds(sleepAmount);

conn.Open();
conn.PreventPooling();
SnowflakeDbCommand cmd = (SnowflakeDbCommand)conn.CreateCommand();

cmd.CommandText = "alter session set abort_detached_query = false;";
await cmd.ExecuteNonQueryAsync();

string thisWontFinishInFive = @"SELECT * FROM
 SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.CUSTOMER
,SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.LINEITEM
,SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.ORDERS
";

cmd.CommandText = thisWontFinishInFive;
var queryId = await cmd.ExecuteAsyncInAsyncMode(CancellationToken.None).ConfigureAwait(false);

Console.WriteLine("QueryID started in async mode: {0}", queryId);
Console.WriteLine("Now sleeping for {0} seconds.", sleepAmount);
Thread.Sleep(sleepTime);
Console.WriteLine("Waking up and checking on status of query {0}", queryId);
var queryStatus = await cmd.GetQueryStatusAsync(queryId, CancellationToken.None).ConfigureAwait(false);
var isQueryRunning = conn.IsStillRunning(queryStatus);
Console.WriteLine("Is query still running? {0}", isQueryRunning);

result:

QueryID started in async mode: 01b9c0ad-0305-5817-0003-351200031022
Now sleeping for 420 seconds.
Waking up and checking on status of query 01b9c0ad-0305-5817-0003-351200031022
Is query still running? True

when checking Query History on the GUI, the query was indeed still in Running even after 7 minutes. I had to manually SYSTEM$CANCEL_QUERY it.

I believe the biggest difference might be how you are using the cancellation token in ExecuteAsyncInAsyncMode; can you please try something shared above and see if it helps keeping async queries alive past the 5m mark when ABORT_DETACHED_QUERY=false?

@sfc-gh-dszmolka sfc-gh-dszmolka added status-triage_done Initial triage done, will be further handled by the driver team and removed status-triage Issue is under initial triage status-information_needed Additional information is required from the reporter labels Jan 16, 2025
@john-skyward-solutions
Copy link
Author

Thanks for your quick reply @sfc-gh-dszmolka, I have adjusted my call to use the syntax you mention but it has has continued to fail. Although that does not surprise me. The cancellation token I use, default, is the same as CancellationToken.None. ConfigureAwait is used by dotnet to decide whether it should remember the context before and after the thread being started and completed, which in this case would make no difference as to what we are doing.

I don't think our examples are comparative because you have left your connection open that entire time and thus do not allow it to be closed. You would need to manually dispose the connection or move the starting of the query into a separate method so that it can be disposed automatically by the using statement. At that point the connection will be evicted from the pool and I believe trigger the clean up of the session, which after 5 minutes should cancel the query. With the discarding of the connection having been started, you can then sleep the thread and open a new connection to check if it's still running or not.

@sfc-gh-dszmolka
Copy link
Contributor

If that's not too big of a problem, could you please provide a full, runnable reproduction ? Although I still think if someone deliberately closes the session, I'm not sure how the Snowflake engine could keep the query running without having a Session.

But I'd like to confirm the behaviour and a reproduction is highly appreciated.

@sfc-gh-dszmolka sfc-gh-dszmolka added the status-information_needed Additional information is required from the reporter label Jan 20, 2025
@john-skyward-solutions
Copy link
Author

john-skyward-solutions commented Jan 20, 2025

@sfc-gh-dszmolka Yea I'll get one for you in the next few days.

Separately to that, I'm interested in the wider consequences. In my mind, if I am starting an async query then it is because I don't want the application to wait for it to finish executing. Yet, if the session must be kept alive for that async query to complete then there is no gauranteed that the async query will complete?

The connection pool itself is a good example of what could happen. Let's say a user starts an async query on one of the ten active connections in the connection pool, but then shortly after the connection pool manager decides there are not enough active connections so starts to close some of them. The connection that started the async query might be closed and the async query ended before it is complete. I, as the developer, have not done anything explicit to make this happen but it has all the same. Not only that but I am confused why it would happen and would have to open support tickets and dig into the issue.

The only way I can see to ensure that the async query does complete would be to "forget" the connection, but do not close it. Which I do not think there is any way of doing that.

@sfc-gh-dszmolka
Copy link
Contributor

What I'm 90% sure is the intended behaviour is, if you deliberately close the Session from user end, then Snowflake should interpret it as everything should cease to exist in that session. This should be driver independent.

What I'm entirely not sure about is, however, this behaviour which you describe. Using the pool as intended, then it seemingly behaving after a while as if you were deliberately closing down the session. That's what your reproduction could help with greatly. I'd like to look into it and likely need to involve the dev team as well, and for that the reproduction could be immensely helpful (also for getting to a bugfix, if it turns out the behaviour is not intended).
Thank you in advance!

@github-actions github-actions bot changed the title Starting an query in AsyncMode and then preventing pooling on the connection cancels the async query SNOW-1892865: Starting an query in AsyncMode and then evicting the connection pool cancels the async query Jan 23, 2025
@john-skyward-solutions
Copy link
Author

I've got delayed in getting back to this, but it is very much on my radar. Hopefully this week I can create a reproducable piece of code.

@john-skyward-solutions
Copy link
Author

@sfc-gh-dszmolka I've worked it out. It's the changed session behaviour, we are creating and swapping some schemas which then makes the connection dirty and it gets destroyed.

https://github.com/snowflakedb/snowflake-connector-net/blob/master/doc/ConnectionPooling.md#changed-session-behavior

What's frustrating, from our point of view, is that there is no way that we can start these queries and ensure that a) other queries do not use the same session (because it is dirty) AND b) keep the async queries running.

Could there be a third option for the changed session behaviour which leaves the session open, stored elsewhere and not closed, until all queries are completed for it?

@john-skyward-solutions
Copy link
Author

Is there a way we can access logs of what connections are destroyed and the reason? That could help debug such things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Issue is a usage/other question rather than a bug status-information_needed Additional information is required from the reporter status-triage_done Initial triage done, will be further handled by the driver team
Projects
None yet
Development

No branches or pull requests

2 participants