
♻️ Make Process.run async #272

Merged 4 commits from async-run into master on Dec 10, 2024
Conversation

chrisjsewell (Member) commented Jul 6, 2023

As discussed with @giovannipizzi, this is to allow for introducing async methods to the AiiDA Transport classes (particularly for uploading and downloading files).

The downstream implementation and explanation in aiida-core is here: aiidateam/aiida-core#6079

(This also makes me wonder whether optional async Workflow steps could be allowed in the same way.)
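To make the intent concrete, here is a minimal sketch of what this change enables, assuming the post-merge plumpy API (in the style of the plumpy README examples); `TransferProcess` and `upload_files` are hypothetical names standing in for a real calculation process and an async Transport operation:

```python
import asyncio

import plumpy

async def upload_files(label) -> None:
    # Hypothetical stand-in for an async Transport operation (e.g. an SSH upload).
    # While this is awaited, the event loop is free to advance other processes.
    await asyncio.sleep(0.1)
    print(f'uploaded files for process {label}')

class TransferProcess(plumpy.Process):
    async def run(self):
        # With this PR, `run` may itself be a coroutine rather than a plain
        # function, so it can await I/O directly instead of blocking the loop.
        await upload_files(self.pid)

if __name__ == '__main__':
    TransferProcess().execute()
```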

codecov bot commented Jul 6, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (49a7117) 90.82% compared to head (0b30f35) 90.82%.

❗ Current head 0b30f35 differs from the pull request's most recent head ff1b699. Consider uploading reports for the commit ff1b699 to get more accurate results.

Additional details and impacted files
@@                Coverage Diff                 @@
##           support/0.21.x     #272      +/-   ##
==================================================
- Coverage           90.82%   90.82%   -0.00%     
==================================================
  Files                  21       21              
  Lines                2973     2972       -1     
==================================================
- Hits                 2700     2699       -1     
  Misses                273      273              
Impacted Files                 Coverage Δ
src/plumpy/futures.py          74.47% <100.00%> (ø)
src/plumpy/process_states.py   88.26% <100.00%> (ø)
src/plumpy/processes.py        92.46% <100.00%> (ø)
src/plumpy/utils.py            81.38% <100.00%> (-0.12%) ⬇️
src/plumpy/workchains.py       94.35% <100.00%> (ø)

☔ View full report in Codecov by Sentry.

chrisjsewell changed the base branch from master to support/0.21.x on July 7, 2023 17:07
chrisjsewell force-pushed the async-run branch 3 times, most recently from 0b30f35 to ff1b699, on July 7, 2023 17:12
@@ -225,7 +227,7 @@ async def execute(self) -> State:  # type: ignore # pylint: disable=invalid-over
         try:
             try:
                 self._running = True
-                result = self.run_fn(*self.args, **self.kwargs)
+                result = await self.run_fn(*self.args, **self.kwargs)
agoscinski:

The `await_me_maybe` pattern from the blog post that @chrisjsewell linked in https://github.com/aiidateam/plumpy/pull/272/files#r1257558025 would be the better design IMO, but we already do it the same way in the `Process` class, so this is at least consistent. Maybe we can instead improve the docs a bit in the places where we use `ensure_coroutine`. Something like:

"We wrap `run_fn` into a coroutine so that we can apply `await` to it, even if it was not a coroutine in the first place. This allows async and non-async functions to be used in the same way with the `await` syntax, without changing the program logic."

At least this is my understanding of why we do this. (I would also add something like this to `_run_task` in the `Process` class.)
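For reference, here is a minimal sketch of the two designs being discussed. Both functions are illustrative only: `await_me_maybe` is the pattern from the linked blog post, not a plumpy function, and plumpy's actual `ensure_coroutine` may handle more cases than shown here.

```python
import inspect

async def await_me_maybe(fn, *args, **kwargs):
    # "Await me maybe": call the function first and only await the result if it
    # turns out to be awaitable, so sync and async callables share one call site.
    result = fn(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result

def ensure_coroutine(fn):
    # Wrapping approach used in this PR: return coroutine functions unchanged,
    # otherwise wrap the callable so call sites can always write `await fn(...)`.
    if inspect.iscoroutinefunction(fn):
        return fn

    async def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper
```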

Contributor:

Thanks @agoscinski, I've added your comment.

sphuber (Collaborator) commented Jul 13, 2023

Thanks @chrisjsewell. If we can test this in an integrated way with aiida-core to show that it will speed up the daemon throughput under heavy I/O load, that would be great. One open question is that this does break the interface, so technically we should release this with 0.22, but that has already been out for a while and is not yet being used in aiida-core. There has been an open PR for it for a long time (aiidateam/aiida-core#5732), but there is a small problem with the tests that I haven't been able to fix. Running normally seems ok, but I have not dared to merge it yet. We probably need to set aside some time with the team to test it thoroughly so we can merge it.

Then again, given that 0.22 has already been released, if we are going to bother with proper semver, I guess we should really not merge it in there either and release it with 0.23 instead 🤔

chrisjsewell (Member, Author):

Yeah, so I rebased it onto v0.21, just so I could get aiidateam/aiida-core#6079 to work (without PR #5732).

Here I should mention that I haven't actually implemented any (new) asyncio behaviour for the local/ssh transports, as this may take some more thinking to be "production ready" and to ensure it doesn't create any new problems (e.g. limiting how many file transfers can be running at any one time).

I'm sure that for ssh (or Firecrest) it should not be too difficult to show a toy example of a speed-up. Obviously it is very dependent on the type of calculations you are running though; I guess it is most suited to cases where you are uploading/downloading many thousands of files and/or very large files (and there are also different async strategies for each).

chrisjsewell (Member, Author):

For sure, getting PR #5732 through would be nice 👍

khsrali changed the base branch from support/0.21.x to master on December 5, 2024 13:35
khsrali requested review from agoscinski and unkcpz on December 5, 2024 14:04
agoscinski:

I think it is okay, just a bit more documentation would be nice.

         super().__init__(process)
         assert run_fn is not None
-        self.run_fn = run_fn
+        self.run_fn = ensure_coroutine(run_fn)
unkcpz (Member):

I am quite worried about this. I think this run_fn will become the continue_fn when it recovers from the Waiting state, which means all such xx_fn would have to be coroutines along the way. I need to take a closer look to see how this change will make things different. Will do it next week.

agoscinski commented Dec 9, 2024

@chrisjsewell suggested using await_me_maybe (https://github.com/aiidateam/plumpy/pull/272/files#r1257558025), which would avoid this, but I did not read about any technical reason to do it; the blog post he links only argues for it on the grounds of code cleanliness. I assumed that wrapping a blocking function is like adding async to the function: it is then just executed like a blocking function when used with await. So:

import time

from plumpy.utils import ensure_coroutine  # assumed import location

def two():
    # blocking function
    time.sleep(1)
    print("Two")

async def blocking():
    print("One")
    two()
    print("Three")

async def also_blocking():
    # does the same as blocking: awaiting the wrapped function still runs it synchronously
    coro = ensure_coroutine(two)
    print("One")
    await coro()
    print("Three")

async def not_blocking():
    coro = ensure_coroutine(two)
    print("One")
    coro()  # RuntimeWarning: coroutine was never awaited, so `two` never runs
    print("Three")

unkcpz (Member) replied:

"I assumed that wrapping a blocking function is like adding async to the function: it is then just executed like a blocking function when used with await."

That's true. The blog post mentions the "maybe await" pattern mostly so that an async framework can support downstream apps writing blocking functions: if the operation is a blocking function, it is simply run in a blocking manner.

"I think this run_fn will become the continue_fn when it recovers from the Waiting state, which means all such xx_fn would have to be coroutines along the way."

From the aiida-core point of view, this never happens, since the continue_fn is never set in the aiida-core Waiting class. The def run is used to create the initial Created state and to transfer the aiida Process into its own Waiting state(s).

unkcpz (Member) left a comment

If the goal of this PR is for aiidateam/aiida-core#6626 to support async SSH operations, then I think it should not be the 'running' state but the 'waiting' state whose run method should be async, no? I am not sure; I will take a closer look together with the thing I mentioned in the comment above.

unkcpz (Member) commented Dec 9, 2024

I took a closer look at the implementation of the related parts. My opinion is that aiidateam/aiida-core#6626 does not need this PR to make async SSH work.
The reason to make Process.run async is that somewhere downstream an implementation of run is required to be async; in aiidateam/aiida-core#6626 that is the run method of the CalcJob class. The change in aiidateam/aiida-core#6626 makes _perform_dry_run and _perform_import both become async functions.
But these two operations apply to the local transport, which uses system file IO and is not async. I'd suggest that aiidateam/aiida-core#6626 only focus on remote file operations using async SSH and revert the run method of CalcJob to a sync function, so there would be no need to depend on this PR.

I was wrong: the LocalTransport does not need to be async, but it calls upload_calculation, which requires the async syntax.

unkcpz (Member) commented Dec 9, 2024

I think it is good to go; as mentioned by @agoscinski, it might be better to add some docs here and there. This is meanwhile a backward-incompatible change for aiida-core (the only downstream app we know of, I assume?), so it should go with a minor release.

sphuber (Collaborator) commented Dec 10, 2024

Please don't merge in this state though. There is something severely wrong with the branch, as it contains many commits that are already on the master branch. Please fix the branch first.

khsrali (Contributor) commented Dec 10, 2024

"If we can test this in an integrated way with aiida-core to show that it will speed up the daemon throughput under heavy I/O load, that would be great."

@sphuber, I made some tests; overall it looks good.
See the top comment on aiidateam/aiida-core#6626.

sphuber dismissed agoscinski's stale review on December 10, 2024 21:06 (comment addressed)

sphuber merged commit 4611154 into master on Dec 10, 2024; 13 checks passed
sphuber deleted the async-run branch on December 10, 2024 21:07
sphuber (Collaborator) commented Dec 10, 2024

Thanks everyone

unkcpz pushed a commit to unkcpz/plumpy that referenced this pull request Dec 14, 2024
khsrali added and pushed several commits that referenced this pull request on Dec 19, 2024, cherry-picked from commit 4611154 (Co-authored-by: Ali).
5 participants