
Replace tornado dependency with built-in module asyncio #4317

Merged: 8 commits, Nov 27, 2020

Conversation

@unkcpz (Member) commented Aug 22, 2020

Fixes #3184

PR #4218 is rebased here for easier review.

@unkcpz unkcpz added topic/engine pr/ready-for-review PR is ready to be reviewed labels Aug 22, 2020
@unkcpz unkcpz requested a review from sphuber August 22, 2020 06:20
@sphuber (Contributor) left a comment

Thanks a lot @unkcpz. It is looking very good and I only have a few requests for some minor changes, but mostly some questions first.

Finally, the docs are failing because of the following 5 warnings that we should address before we can merge:

/home/circleci/.local/lib/python3.7/site-packages/aiida/engine/utils.py:docstring of aiida.engine.InterruptableFuture:1: WARNING: py:class reference target not found: _asyncio.Future
/home/circleci/.local/lib/python3.7/site-packages/aiida/engine/runners.py:docstring of aiida.engine.runners.Runner.__init__:: WARNING: py:class reference target not found: the asyncio event loop
/home/circleci/.local/lib/python3.7/site-packages/aiida/engine/transports.py:docstring of aiida.engine.transports.TransportQueue.__init__:: WARNING: py:class reference target not found: The asyncio event loop
/home/circleci/.local/lib/python3.7/site-packages/aiida/engine/utils.py:docstring of aiida.engine.utils.InterruptableFuture:1: WARNING: py:class reference target not found: _asyncio.Future
/home/circleci/.local/lib/python3.7/site-packages/aiida/engine/processes/futures.py:docstring of aiida.engine.processes.futures.ProcessFuture:1: WARNING: py:class reference target not found: _asyncio.Future

Also, the build on Jenkins is failing. It seems the daemon is failing to start. Do you think this is a one-time fluke or is there really a problem? The builds for both backends failed, which seems to suggest there really is a problem there.

aiida/common/log.py (outdated, resolved)
Comment on lines +39 to +42
tasks = [t for t in asyncio.Task.all_tasks() if t is not asyncio.Task.current_task()]

for task in tasks:
task.cancel()
Contributor:

Was there a particular bug that you ran into before you added this code? How does it manifest itself if you shutdown the daemon while it is running some processes?

Member Author:

Was there a particular bug that you ran into before you added this code?

After migrating to asyncio, with the original code, running verdi devel run_daemon and then pressing Ctrl+C would not stop the daemon process; it would hang in the command line.

How does it manifest itself if you shutdown the daemon while it is running some processes?

I didn't quite understand your question. The code follows this article: https://www.roguelynn.com/words/asyncio-graceful-shutdowns/
After that change, the original issue was gone, so I didn't think much more about it. My understanding is that shutting down a daemon just means shutting down its corresponding runner's event loop.
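For reference, the shutdown pattern from that article can be sketched roughly as follows. This is a minimal illustration, not the actual aiida-core code; it uses asyncio.all_tasks/current_task, the modern (3.7+) spellings of the asyncio.Task classmethods shown in the diff above:

```python
import asyncio

async def shutdown():
    """Cancel every outstanding task except this one, then stop the loop.

    Minimal sketch of the graceful-shutdown pattern referenced above;
    not the actual aiida-core implementation.
    """
    tasks = [t for t in asyncio.all_tasks() if t is not asyncio.current_task()]
    for task in tasks:
        task.cancel()
    # Wait until the cancellations have propagated; return_exceptions=True
    # swallows the CancelledError that each cancelled task raises.
    await asyncio.gather(*tasks, return_exceptions=True)
    asyncio.get_running_loop().stop()
```

Calling loop.stop() without first cancelling the pending tasks leaves them alive, which is consistent with the hang described above.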

Contributor:

I didn't quite understand your question.

What I meant to ask was: if you remove the code that you added (that is to say, in the old version of the code) what would happen if you stop a daemon runner. But I think you already answered this just before this. Apparently, without your changes, the daemon would hang when asked to shutdown. In any case, I guess it makes sense to attempt to clean up before shutting down. Just was asking out of interest

Contributor:

This looks like a neat solution, I like the look of it.

Member Author:

@muhrin Thanks! That was the one thing I was not sure about; now I don't have to worry.

aiida/engine/launch.py (outdated, resolved)
:param args: input arguments to construct the FunctionProcess
:param kwargs: input keyword arguments to construct the FunctionProcess
:return: tuple of the outputs of the process and the process node pk
:rtype: (dict, int)
"""
manager = get_manager()
runner = manager.create_runner(with_persistence=False)
runner = manager.get_runner(with_persistence=False)
Contributor:

Is this because now we have just a single event-loop that is made reentrant with nested-asyncio?

Member Author (@unkcpz), Aug 24, 2020:

Sure, but I haven't tested this in a real production environment. I am not sure that the original blocking issue actually resulted from reentrancy.
@muhrin Could you have a look at this?

Contributor:

I'm not sure which 'blocking issues' you're referring to @unkcpz but indeed we now only have the one event loop so never need to create more than one. Does get_runner work if there is no event loop active at all or would this code be unreachable if that were the case?

Member Author:

The following docstring was there before I made the changes. This is what I meant by 'blocking issues'; I guess this is why a new event loop was created in the ProcessFunction.

The function will have to create a new runner for the FunctionProcess instead of using the global runner,
because otherwise if this process function were to call another one from within its scope, that would use
the same runner and it would be blocking the event loop from continuing.

Contributor:

I'm not sure which 'blocking issues' you're referring to @unkcpz but indeed we now only have the one event loop so never need to create more than one. Does get_runner work if there is no event loop active at all or would this code be unreachable if that were the case?

Manager.get_runner will create a new runner if it doesn't have one yet. It won't pass in a loop to the Runner constructor, so that one will get whatever loop the constructor logic defines.
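The create-on-demand behaviour described here can be sketched as follows (hypothetical class and argument names; aiida-core's actual Manager and Runner have much richer interfaces):

```python
import asyncio

class Runner:
    """Stand-in for aiida-core's Runner: picks its own loop if none is given."""

    def __init__(self, loop=None, with_persistence=True):
        if loop is None:
            try:
                loop = asyncio.get_running_loop()
            except RuntimeError:  # no loop running yet, make one
                loop = asyncio.new_event_loop()
        self.loop = loop
        self.with_persistence = with_persistence

class Manager:
    """Stand-in for aiida-core's Manager: creates the global runner lazily."""

    def __init__(self):
        self._runner = None

    def create_runner(self, **kwargs):
        # Note: no loop is passed in; the Runner constructor decides.
        return Runner(**kwargs)

    def get_runner(self, **kwargs):
        # Create the runner on first use, then always return the same one.
        if self._runner is None:
            self._runner = self.create_runner(**kwargs)
        return self._runner
```

With a single reentrant event loop, every call site can share the one cached runner instead of constructing its own.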

aiida/engine/utils.py (outdated, resolved)
tests/engine/test_rmq.py (resolved)
tests/engine/test_work_chain.py (outdated, resolved)
tests/workflows/arithmetic/test_add_multiply.py (outdated, resolved)
@@ -347,6 +348,8 @@ def process_actions(futures_map, infinitive, present, past, wait=False, timeout=
process = futures_map[future]

try:
# unwrapping is needed here since LoopCommunicator will also wrap the future
future = unwrap_kiwi_future(future)
Contributor:

I have the feeling that this is in the wrong place. The future that is returned comes from the RemoteProcessThreadController returned by the Manager. What it returns is the return value from the communicator. You are here adding another action on it because apparently the underlying communicator is a LoopCommunicator which adds a layer of wrapping. However, from the process controller's interface, this is not clear at all. Especially if you look at the Manager code, the communicator we pass to the controller is an instance of RmqThreadCommunicator and not a LoopCommunicator. Long story short, I think the unwrapping should be done elsewhere such that the user of the controller should never have to know what kind of communicator is being used. Or am I missing something?

Member Author (@unkcpz), Aug 24, 2020:

You are right that this part is not very clear. And I think the same issue existed in the tornado version of aiida-core (the future should have been unwrapped after the LoopCommunicator was introduced to Runner).

Let me try to describe my understanding of this part of the code:
The future returned from controller.kill_process does not depend on the communicator we use here, but on the communicator that actually performs the kill action. That is the LoopCommunicator, to which we add an rpc_subscriber in the daemon Runner.
The reason we do not need to use the LoopCommunicator in verdi process is that every time we run a cmd command, it runs in an independent thread and waits for the returned value.
And because we don't know exactly which communicator was used, we need a generic unwrap_kiwi_future here to get the final result of the procedure scheduled in the communicator.

Contributor:

Haha, yes, I'm also getting confused looking through this part of the code. I guess the principle that Sebastiaan is pointing out (and I agree with) is that if there is a Communicator interface that we're using then things that come out of the interface should always be at the same level of 'unwrapping', ideally such that no manual unwrapping needs to be done outside, and certainly in a way that doesn't vary on the underlying concrete Communicator instance type.

I'll have a look through the code and see if I can make sure I understand what the flow is.
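The unwrapping being discussed can be illustrated with plain concurrent.futures (a simplified sketch, not kiwipy's actual unwrap_kiwi_future implementation): each layer of wrapping produces a future whose result is itself a future, and unwrapping chains callbacks until a non-future result appears.

```python
import concurrent.futures

def unwrap_future(outer):
    """Return a future resolving to the innermost, non-future result.

    Simplified sketch of what a generic unwrap does: if a future's
    result is itself a future, keep chaining until it isn't.
    """
    unwrapped = concurrent.futures.Future()

    def on_done(fut):
        try:
            result = fut.result()
        except Exception as exc:
            unwrapped.set_exception(exc)
            return
        if isinstance(result, concurrent.futures.Future):
            result.add_done_callback(on_done)  # one more wrapping layer
        else:
            unwrapped.set_result(result)

    outer.add_done_callback(on_done)
    return unwrapped
```

Because the helper recurses through any number of layers, the caller does not need to know how many times the underlying communicator wrapped the result, which is exactly why it papers over the interface concern raised above.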

aiida/engine/processes/calcjobs/tasks.py (resolved)
@unkcpz (Member Author) commented Aug 24, 2020

Also, the build on jenkins is failing. It seems the daemon is failing to start. Do you think this is a one-time fluke or there really is a problem?

I think it results from circus: it fails because Jenkins installs aiida_core with pip install --user .[all], which installs the old circus. I guess it will be fixed once a new circus version is released. This issue also happened in test-install and ci-test; I worked around it by installing circus from my repository. https://github.com/aiidateam/aiida-core/blob/0e1ecfbaa08f3fe0341d4083c693bf13b8884bf9/.github/workflows/ci.yml#L125

@unkcpz unkcpz force-pushed the asyncio-rebase branch 2 times, most recently from dd8b1f9 to 3e4039a Compare August 24, 2020 04:32
@unkcpz (Member Author) left a comment

Thanks @sphuber! Some of the comments concern code I am also not very sure about, especially the unwrapping of the future in cmd_process. I need to think it over and get back to you.

aiida/common/log.py (outdated, resolved)
aiida/engine/launch.py (outdated, resolved)
aiida/engine/utils.py (outdated, resolved)
tests/cmdline/commands/test_data.py (resolved)
tests/engine/test_rmq.py (resolved)
tests/engine/test_rmq.py (resolved)
tests/engine/test_work_chain.py (outdated, resolved)
tests/workflows/arithmetic/test_add_multiply.py (outdated, resolved)
@muhrin (Contributor) left a comment

Looks very nice @unkcpz . I just want to check again this future unwrapping business.

Needless to say, it would be great to check this new code against some real running workflows if possible. I think it's also OK for some of the more extensive testing to take place once this is in develop, given that it's not possible to test all possibilities thoroughly beforehand.


tests/cmdline/commands/test_data.py (resolved)
tests/engine/test_rmq.py (resolved)

setup.json (outdated, resolved)
@sphuber (Contributor) commented Aug 28, 2020

OK, I think we are almost there @unkcpz and @muhrin . I am just writing down an overview of open questions and things that remain to be checked. Once these are addressed, I will try and run a large production run with the new stack and see how things run.

  1. Check correctness of manually cancelling all event loop tasks when closing daemon runner in aiida/engine/daemon/runner.py
  2. Verify that running a process function no longer needs to create a new runner but just gets the global one: aiida/engine/processes/functions.py
  3. Find a way for the communicator to always return a similar future, regardless of its type, such that manual unwrapping, as done currently in aiida/cmdline/commands/cmd_process.py, is no longer necessary.

@muhrin Have you had a chance to take a look at these three remaining questions?

@sphuber (Contributor) commented Sep 3, 2020

I just created a clean env with this branch and the asyncio-compatible plumpy and kiwipy versions and ran a bunch of workchains to test the new code. One of the workchains (the first one that was launched, not sure if by accident) excepted:

state        Excepted <Traceback (most recent call last):
               File "/home/sph/.virtualenvs/aiida_pies/lib/python3.7/site-packages/plumpy/processes.py", line 474, in _process_scope
                 yield
               File "/home/sph/.virtualenvs/aiida_pies/lib/python3.7/site-packages/plumpy/processes.py", line 496, in _run_task
                 result = await coro(*args, **kwargs)
             GeneratorExit

             During handling of the above exception, another exception occurred:

             AssertionError: Somehow, the process at the top of the stack is not me, but another process! (<CifCleanWorkChain> (ProcessState.WAITING) != <CifCleanWorkChain> (ProcessState.RUNNING))
             >

This clearly comes from the process stack that checks for consistency. If you have ideas what could have caused this, let me know and I can try to dig up more debug information or try to create a MWE.

Edit: I am not sure what caused this, but I reran the whole suite after having deleted all previous workchains, and now everything ran fine. One thing that may have had to do with it: I ran the CifCleanWorkChain on 100 CifData nodes, and in all 100 cases the final primitive_structure_from_cif calcfunction failed because an exception was raised.

I saw in the logs what seemed to be an unrelated message that there was a conflict in the dependency requirements of plumpy, with 0.16.0 being installed (as it should be for this branch), but aiida-core requiring plumpy~=0.15.0. I think this was just a weird problem of the virtual env, since this branch is still marked as aiida-core==1.3.0 and on PyPI that version has the requirement plumpy==0.15.0. This may have caused the problem: after resolving it by locally upping the version of the aiida-core branch to v1.4.0, the incompatibility error disappeared and all workchains ran just fine.

Still, I don't really understand why this would have failed just the calcfunctions and not the calculation jobs or workchains. Weirder still, I don't understand whether the assertion raised by the process stack had anything to do with it or was just a fluke. Even if it is a fluke, it is a grave internal inconsistency and would be pretty bad if it occurred, even sporadically. I will try to install this branch on another machine and perform some more testing to see if I can get it to appear again.

Edit 2: now that we got the tests running (it looks like a problem with the new pip resolver), the problem with the process stack was actually reproduced on GHA as well: https://github.com/aiidateam/aiida-core/pull/4317/checks?check_run_id=1074946912
In .ci/test_daemon.py, where we launch a bunch of calcjobs and workchains, the assertion also failed. It is crucial that we find out why this happens and fix it.

@sphuber sphuber force-pushed the asyncio-rebase branch 3 times, most recently from 0bcf704 to c8a1d89 Compare September 4, 2020 19:23
@sphuber (Contributor) commented Sep 4, 2020

Tests are failing because the dependencies fail to install; specifically, it claims there is a dependency conflict between kiwipy and, hang on to your hats, kiwipy!

ERROR: Cannot install kiwipy==0.6.1 and kiwipy[rmq] 0.6.0 because these package versions have conflicting dependencies.

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
The conflict is caused by:
    The user requested kiwipy==0.6.1
    kiwipy[rmq] 0.6.0 depends on kiwipy 0.6.0 (from https://files.pythonhosted.org/packages/fb/5b/9ab8910fcb7b861ae63f0c571bf2be790734a12487c3e5358d6610eef117/kiwipy-0.6.0-py2.py3-none-any.whl#sha256=2e68ba8a74bffb9319c48aa4a33ab78ba5f4c460a1ce7b39092ea2183eb857e5 (from https://pypi.org/simple/kiwipy/) (requires-python:>=3.5))

It's you!

Do any of the packaging experts out there know what the hell is going on? @chrisjsewell @greschd @csadorf

Edit: to summarize, with this PR we have the following requirements:

aiida-core:

  • kiwipy[rmq]~=0.6.1
  • plumpy~=0.16.1

Where plumpy==0.16.1 has the requirement

  • kiwipy[rmq]~=0.6.0

The only thing I can think of is the difference in kiwipy[rmq]~=0.6.1 and kiwipy[rmq]~=0.6.0, but those requirements should be perfectly compatible...
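To spell out why they should be compatible: a PEP 440 compatible-release pin ~=X.Y.Z means >=X.Y.Z together with <X.(Y+1), so kiwipy 0.6.1 satisfies both kiwipy[rmq]~=0.6.1 and kiwipy[rmq]~=0.6.0 and the two specifiers overlap. A quick stdlib-only check (illustrative helper, not how pip actually resolves; real tools use the packaging library):

```python
def satisfies_compatible_release(version, pin):
    """Check `version` against a compatible-release pin, i.e. ~=`pin`.

    PEP 440 defines ~=X.Y.Z as >=X.Y.Z combined with ==X.Y.*.
    Illustrative helper only.
    """
    v = tuple(int(p) for p in version.split("."))
    pin_parts = [int(p) for p in pin.split(".")]
    lower = tuple(pin_parts)                             # >= X.Y.Z
    upper = tuple(pin_parts[:-2] + [pin_parts[-2] + 1])  # <  X.(Y+1)
    return lower <= v and v[:len(upper)] < upper

# kiwipy 0.6.1 satisfies both requirements in play here:
print(satisfies_compatible_release("0.6.1", "0.6.1"))  # True  (~=0.6.1)
print(satisfies_compatible_release("0.6.1", "0.6.0"))  # True  (~=0.6.0)
print(satisfies_compatible_release("0.7.0", "0.6.0"))  # False (outside ~=0.6.0)
```

Since kiwipy 0.6.1 sits inside both ranges, a resolver that refuses this combination is misbehaving.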

@chrisjsewell (Member) commented Sep 5, 2020

What happens if you remove the use of the new resolver (pip install --use-feature=2020-resolver) at https://github.com/unkcpz/aiida_core/blob/c8a1d89bfd5396e92da0e07e09d2487dbc0794ae/.github/workflows/ci.yml#L123? (i.e., is it a bug with the new resolver that should be reported?)

@chrisjsewell (Member):
Well it installs now 😬 (I guess the pytest failure is a whole other issue).
Interesting that the conda resolver fails in the same way as the new pip resolver; I wonder if the implementation "borrows" from conda.

Hey @pradyunsg, sorry for the spam, but I wonder if I can quickly pick your brain on this install issue: #4317 (comment).
Does this look like a misunderstanding on our part, or should we raise an issue on pip?

@pradyunsg:

Does this look like a misunderstanding on our part, or should we raise an issue on pip?

That looks like a bug -- one that has been fixed in pip's master branch. Could you try running off of pip's master branch and check if this bug is reproducible?

@sphuber (Contributor) commented Sep 5, 2020

I wonder if the implementation "borrows" from conda.

When I look at the logs, I see that conda reports a whole shit ton of dependency conflicts, though, and not just with kiwipy/plumpy. I never use conda so I have no clue, but is that normal or to be expected?

@sphuber (Contributor) commented Sep 5, 2020

That looks like a bug -- one that has been fixed in pip's master branch. Could you try running off of pip's master branch and check if this bug is reproducible?

Thanks for the comment. We currently install pip as pip install --upgrade pip. Should I just replace that with something like pip install --upgrade git+https://github.com/pypa/pip@master ? Will that properly override the currently installed version?

@pradyunsg commented Sep 5, 2020

pip install --upgrade git+https://github.com/pypa/pip@master

pip install https://github.com/pypa/pip/archive/master.zip would almost certainly perform better in more ways than one. The 2020-resolver is still under development, so it's still evolving and getting better at being a reasonable resolver. :)

@sphuber (Contributor) commented Sep 5, 2020

pip install https://github.com/pypa/pip/archive/master.zip would almost certainly perform better in more ways than one. The 2020-resolver is still under development, so it's still evolving and getting better at being a reasonable resolver. :)

Thanks a lot @pradyunsg, and not to worry, I fully understand that such a big project takes time to iron out some bugs. Happy to provide bug reports if possible. I tried installing off of master but that does not seem to have fixed the problem: https://github.com/aiidateam/aiida-core/pull/4317/checks?check_run_id=1075016847 The install on Python 3.8 fails with the same dependency conflict error. The builds on Python 3.5 are still running and I suspect they will actually time out.

@chrisjsewell (Member):

I never use conda so I have no clue, but is that normal or to be expected?

Yeh, that does not look pretty 😬
It feels like maybe conda, and perhaps the new pip resolver, is not loving the ~= style of version pinning.
You could try replacing them in environment.yml with >=x.x.x,<y.y.y style versions?

@csadorf (Contributor) commented Nov 23, 2020

Does anyone know why it says python~=3.7 when we also still support 3.6? Is that the problem?

The environment.yml file is a utility file that users should not need to worry about and that has no influence on which environments aiida-core can be installed into. We use it to test whether AiiDA is installable into a valid conda environment, but a user would just say conda install aiida-core or create an environment as described in our docs. When you install a package with conda, the Python version is usually implicitly determined by your base environment, but you can explicitly select a specific Python version by adding it as an explicit dependency, i.e., conda create -n aiida-py38 python=3.8 aiida-core.

So I don't think that we need to have the Python version specified in environment.yml, but it is concerning that removing it apparently resolves the problem, as it hints at an issue that is specific to Python 3.7.

@unkcpz Which Python version is chosen when you remove the explicit dependency? Also, could you please share your environment with us after you successfully created it as described here?

@ltalirz (Member) commented Nov 23, 2020

Thanks a lot @unkcpz for figuring this out!

For reference, here is the commit where I introduced the python version in the environment.yml 12f9641
As the commit message says, we figured out that when we were specifying the python version of the environment via the command line, it was not respected.
I just checked and this is still the case with conda 4.9.1

$ conda  env create -f environment.yml -n test-conda python=3.5
Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages
tzdata-2020d         | 110 KB | 100%
certifi-2020.11.8    | 148 KB | 100%
wheel-0.35.1         | 38 KB  | 100%
setuptools-50.3.1    | 726 KB | 100%
pip-20.2.4           | 1.7 MB | 100%
python-3.9.0         | 9.9 MB | 100%

I.e. specifying the python version inside the environment.yml was a way for us to make the build of the environment on CI reproducible.

python 3.7 was probably chosen because it was the latest python release that we tested on; the version number can be changed (it could even be changed to python 3.6 if that helps).

The commit message also mentions that at that time there had been an issue with plumpy, which had not yet been marked as compatible with python 3.7 - but that issue had been fixed before this commit (perhaps coincidence).

@unkcpz (Member Author) commented Nov 24, 2020

@unkcpz Which Python version is chosen when you remove the explicit dependency? Also, could you please share your environment with us after you successfully created it as described here?

As @ltalirz mentioned, I ran conda env create -f environment.yml -n test-conda python=3.5 to create the envs locally; all of the specified python versions worked.
Changing python~=3.7 to python~=3.6 in the file really does solve the problem. Should we adopt this as a workaround?

@unkcpz (Member Author) commented Nov 24, 2020

The problem may be caused by the package aiocontextvars, which is used to support PEP 567; contextvars is provided in full by the standard library on python>=3.7. You can check its dependencies here: https://github.com/fantix/aiocontextvars/blob/master/setup.py

This dependency can be removed once we no longer support python 3.6.

@csadorf (Contributor) commented Nov 24, 2020

@unkcpz Thx for identifying the issue! It would still be great if you could share your environment.yml for a successful installation. Thx!

@unkcpz (Member Author) commented Nov 24, 2020

@csadorf Sure, but I don't think it helps; on my local machine I only use conda to create the env and install the packages with pip.

environment.yml

name: /data/CONDA_ENV/aiida-env-testing
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - backcall=0.2.0=py_0
  - binutils_impl_linux-64=2.33.1=he6710b0_7
  - binutils_linux-64=2.33.1=h9595d00_15
  - ca-certificates=2020.10.14=0
  - certifi=2020.11.8=py38h06a4308_0
  - decorator=4.4.2=py_0
  - gcc_impl_linux-64=7.3.0=habb00fd_1
  - gcc_linux-64=7.3.0=h553295d_15
  - gxx_impl_linux-64=7.3.0=hdf63c60_1
  - gxx_linux-64=7.3.0=h553295d_15
  - ipython=7.19.0=py38hb070fc8_0
  - ipython_genutils=0.2.0=py38_0
  - jedi=0.17.2=py38_0
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libedit=3.1.20191231=h14c3975_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.1.0=hdf63c60_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - ncurses=6.2=he6710b0_1
  - openssl=1.1.1h=h7b6447c_0
  - parso=0.7.0=py_0
  - pexpect=4.8.0=pyhd3eb1b0_3
  - pickleshare=0.7.5=py38_1000
  - pip=20.2.4=py38h06a4308_0
  - prompt-toolkit=3.0.8=py_0
  - ptyprocess=0.6.0=pyhd3eb1b0_2
  - pygments=2.7.2=pyhd3eb1b0_0
  - python=3.8.5=h7579374_1
  - readline=8.0=h7b6447c_0
  - setuptools=50.3.1=py38h06a4308_1
  - sqlite=3.33.0=h62c20be_0
  - tk=8.6.10=hbc83047_0
  - traitlets=5.0.5=py_0
  - wcwidth=0.2.5=py_0
  - wheel=0.35.1=pyhd3eb1b0_0
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
  - pip:
    - aiida-core==1.5.0
    - aio-pika==6.7.1
    - aiocontextvars==0.2.2
    - aiormq==3.3.1
    - aldjemy==0.9.1
    - alembic==1.4.3
    - archive-path==0.2.1
    - async-generator==1.10
    - bcrypt==3.2.0
    - cffi==1.14.3
    - circus==0.17.1
    - click==7.1.2
    - click-completion==0.5.2
    - click-config-file==0.6.0
    - click-spinner==0.1.10
    - configobj==5.0.6
    - cryptography==3.2.1
    - deprecation==2.1.0
    - django==2.2.17
    - ete3==3.1.2
    - frozendict==1.2
    - idna==2.10
    - jinja2==2.11.2
    - kiwipy==0.6.1
    - mako==1.1.3
    - markupsafe==1.1.1
    - multidict==5.0.2
    - nest-asyncio==1.4.3
    - numpy==1.19.4
    - packaging==20.4
    - pamqp==2.3.0
    - paramiko==2.7.2
    - pgsu==0.1.0
    - plumpy==0.17.0
    - psutil==5.7.3
    - psycopg2-binary==2.8.6
    - pycparser==2.20
    - pynacl==1.4.0
    - pyparsing==2.4.7
    - python-dateutil==2.8.1
    - python-editor==1.0.4
    - python-graphviz==0.15
    - pytray==0.2.2
    - pytz==2019.3
    - pyyaml==5.1.2
    - pyzmq==20.0.0
    - reentry==1.3.1
    - shellingham==1.3.2
    - shortuuid==1.0.1
    - simplejson==3.17.2
    - six==1.15.0
    - sqlalchemy==1.3.20
    - sqlalchemy-utils==0.34.2
    - sqlparse==0.4.1
    - tabulate==0.8.7
    - tornado==6.1
    - tqdm==4.53.0
    - tzlocal==2.1
    - upf-to-json==0.9.2
    - wrapt==1.11.2
    - yarl==1.6.3
prefix: /data/CONDA_ENV/aiida-env-testing

@csadorf
Contributor

csadorf commented Nov 24, 2020

The problem may be caused by the package aiocontextvars, which is used to backport PEP 567 (contextvars, provided natively by the standard library for Python >= 3.7); you can check its dependencies here: https://github.com/fantix/aiocontextvars/blob/master/setup.py.

This dependency can be removed once we no longer support python3.6.

@unkcpz That is 100% the problem. This conda package explicitly requires python<3.7. Which package depends on it?

@ltalirz
Member

ltalirz commented Nov 24, 2020

@csadorf Turns out it's plumpy https://github.com/conda-forge/plumpy-feedstock/blob/fe935aa06b56595ef3d5b2eaccafbe85e52da892/recipe/meta.yaml#L27

So, plumpy should depend on this package only for python < 3.7 (which will force us to remove the noarch from plumpy)
Can someone please take care of this?
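Restricting a dependency to older interpreters is done with a PEP 508 environment marker. A hypothetical fragment of plumpy's setup.py (illustrative, not the actual file) could look like:

```python
# Hypothetical setup.py fragment (not plumpy's actual file): a PEP 508
# environment marker limits the aiocontextvars backport to interpreters
# that lack native contextvars.
install_requires = [
    "aiocontextvars; python_version < '3.7'",
]
```

Note that conda noarch packages cannot express Python-version-dependent requirements, which is why applying this on the feedstock side forces dropping noarch from the plumpy recipe.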

@unkcpz
Member Author

unkcpz commented Nov 25, 2020

That is 100% the problem. This conda package explicitly requires python<3.7. Which package depends on it?

Thanks @csadorf. aiocontextvars is required by plumpy to support contextvars, which is new in Python 3.7. To use this feature on Python 3.6, aiocontextvars is needed. On environments with Python > 3.6, the package simply delegates to the native contextvars, so in principle it supports Python > 3.6 as well.

@sphuber
Contributor

sphuber commented Nov 25, 2020

So, plumpy should depend on this package only for python < 3.7 (which will force us to remove the noarch from plumpy)
Can someone please take care of this?

@ltalirz does this only need to happen for the feedstock, or will I have to release a 0.17.1 on PyPI as well where the requirements are updated?

@chrisjsewell
Member

chrisjsewell commented Nov 25, 2020

does this only need to happen for the feedstock, or will I have to release a 0.17.1 on PyPI as well where the requirements are updated?

I think we will need to release as well. While we are at it: gentle prod to review aiidateam/plumpy#180, then I'll be happy to handle these releases 😉

@csadorf
Contributor

csadorf commented Nov 25, 2020

does this only need to happen for the feedstock, or will I have to release a 0.17.1 on PyPI as well where the requirements are updated?

I think we will need to release as well.

I would think just for the feedstock?

@sphuber
Contributor

sphuber commented Nov 25, 2020

I think we will need to release as well. While we are at it: gentle prod to review aiidateam/plumpy#180, then I'll be happy to handle these releases wink

I will hold off with that until this is merged and stable. The mypy PR contains a lot of changes, some of which may have an effect on this PR here and may not be tested.

@unkcpz
Member Author

unkcpz commented Nov 25, 2020

Thanks @chrisjsewell, I'll find time tomorrow to take care of aiidateam/plumpy#180. But I agree with Sebastiaan that we should wait until after this PR is merged.

@chrisjsewell
Member

The mypy PR contains a lot of changes some and may have an effect on this PR here which may not be tested.

Fair 👍
(although in principle it should only break this PR if the PR is already broken, i.e. has faulty types)

unkcpz and others added 8 commits November 27, 2020 13:42
The `plumpy` and `kiwipy` dependencies have already been migrated from
using `tornado` to the Python built-in module `asyncio` in the versions
`0.16.0` and `0.6.0`, respectively. This allows us to also rid AiiDA of
the `tornado` dependency, which has been giving requirement clashes with
other tools, specifically from the Jupyter and iPython world. The final
limitation was the `circus` library that is used to daemonize the daemon
workers, which as of `v0.17.1` also supports `tornado~=5`.

A summary of the changes:

 * Replace `tornado.ioloop` with `asyncio` event loop.
 * Coroutines are marked with `async` instead of decorated with the
   `tornado.gen.coroutine` decorator.
 * Replace `yield` with `await` when calling a coroutine.
 * Replace `raise tornado.gen.Return` with `return` when returning from
   a coroutine.
 * Replace `add_callback` call on event loop with `call_soon` when
   scheduling a callback.
 * Replace `add_callback` call on event loop with `create_task` when
   scheduling `process.step_until_terminated()`.
 * Replace `run_sync` call on event loop with `run_until_complete`.
 * Replace `pika` uses with `aio-pika` which is now used by the `plumpy`
   and `kiwipy` libraries.
 * Replace `concurrent.Future` with `asyncio.Future`.
 * Replace `yield tornado.gen.sleep` with `await asyncio.sleep`.

Additional changes:

 * Remove the `tornado` logger from the logging configuration.
 * Remove the `logging.tornado_loglevel` configuration option.
 * Turn the `TransportQueue.loop` attribute from method into property.
 * Call `Communicator.close()` instead of `Communicator.stop()` in the
   `Manager.close()` method. The `stop` method has been deprecated in
   `kiwipy==0.6.0`.
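The mechanical part of the migration summarized above can be sketched with a minimal before/after pair (the names are illustrative, not aiida-core code):

```python
# Minimal sketch of the tornado -> asyncio migration described above.
#
# Before (tornado):
#
#     from tornado import gen
#
#     @gen.coroutine
#     def do_task():
#         yield gen.sleep(0.01)
#         raise gen.Return(42)
#
#     result = loop.run_sync(do_task)
#
# After (asyncio): `async def` replaces the decorator, `await` replaces
# `yield`, a plain `return` replaces `raise gen.Return`, and
# `run_until_complete` replaces `run_sync`.
import asyncio

async def do_task():
    await asyncio.sleep(0.01)
    return 42

loop = asyncio.new_event_loop()
result = loop.run_until_complete(do_task())
loop.close()
print(result)  # 42
```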
The result returned by `ProcessController.kill_process` that is called
in `Process.kill` for each of its children, if it has any, can itself be
a future, since the killing cannot always be performed directly, but
instead will be scheduled in the event loop. To resolve the future of
the main process, it will have to wait for the futures of all its
children to be resolved as well. Therefore an intermediate future needs
to be added that will be done once all child futures are resolved.
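A sketch of that pattern (not the actual aiida-core code): `asyncio.gather` provides exactly such an intermediate future, which resolves only once all child futures have resolved:

```python
# Sketch: resolve a parent kill future only after all child kill
# futures are done, via an intermediate future (illustrative names).
import asyncio

async def kill_child(delay):
    # Killing cannot always be performed directly; here it is
    # scheduled on the event loop, modelled as a short sleep.
    await asyncio.sleep(delay)
    return True

async def kill_process():
    # gather() returns an intermediate future that is done once all
    # child futures have resolved.
    child_results = await asyncio.gather(kill_child(0.01), kill_child(0.02))
    return all(child_results)

killed = asyncio.run(kill_process())
print(killed)  # True
```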
The commands of `verdi process` that perform an RPC on a live process
will do so through the `ProcessController`, which returns a future.
Currently, the process controller uses the `LoopCommunicator` as its
communicator which adds an additional layer of wrapping. Ideally, the
return type of the communicator should not change depending on the
specific implementation that is used, however, for now that is the case
and so the future needs to be unwrapped explicitly one additional time.
Once the `LoopCommunicator` is fixed to return the same future type as
the base `Communicator` class, this workaround can and should be
removed.
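The double-unwrap workaround can be sketched as follows (the names are illustrative, not aiida-core's API): the first `await` yields the reply, and if that reply is itself a future it must be awaited once more:

```python
# Sketch of unwrapping a future that is wrapped one extra time, as
# happens with the LoopCommunicator (illustrative names, not real API).
import asyncio

async def rpc_kill_process():
    loop = asyncio.get_running_loop()
    inner = loop.create_future()
    inner.set_result('killed')
    outer = loop.create_future()
    outer.set_result(inner)  # the reply is wrapped one extra time
    return outer

async def main():
    future = await rpc_kill_process()
    result = await future           # first unwrap
    if asyncio.isfuture(result):    # workaround: unwrap a second time
        result = await result
    return result

outcome = asyncio.run(main())
print(outcome)  # killed
```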
With the migration to `asyncio`, there is now only a single event loop
that is made reentrant through the `nest-asyncio` library, that monkey
patches `asyncio`'s built-in mechanism to prevent this. This means that
in the `Runner` constructor, we should simply get the global event loop
instead of creating a new one, if no explicit loop is passed into the
constructor. This also implies that the runner should never take charge
in closing the loop, because it no longer owns the global loop.

In addition, process functions now simply use the global runner instead
of creating a new runner. This used to be necessary because running in
the same runner, would mean running in the same loop and so the child
process would block the parent. However, with the new design on
`asyncio`, everything runs in a single reentrant loop and so child
processes no longer need to spawn their own independent nested runner.
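The "single global loop" behaviour can be sketched with a toy runner (not the actual `Runner` class): constructed without an explicit loop, every instance shares the global loop rather than owning one:

```python
# Toy sketch (not aiida-core's Runner): runners created without an
# explicit loop share the single global event loop.
import asyncio

class Runner:
    def __init__(self, loop=None):
        # Get the global event loop instead of creating a new one.
        self.loop = loop if loop is not None else asyncio.get_event_loop()

    def close(self):
        # The runner does not own the global loop, so it must not close it.
        pass

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
parent, child = Runner(), Runner()
print(parent.loop is child.loop)  # True
```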
When a daemon runner is started, the `SIGINT` and `SIGTERM` signals are
captured to shutdown the runner before exiting the interpreter. However,
the async tasks associated with the interpreter should be properly
canceled first.
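A simplified sketch of that shutdown pattern (not the actual daemon runner code): register signal handlers on the loop that cancel the pending task, so its coroutine can clean up before the interpreter exits:

```python
# Simplified sketch of a daemon-runner-style shutdown: on SIGINT or
# SIGTERM, cancel the outstanding task before exiting (Unix only).
import asyncio
import signal

async def worker():
    try:
        await asyncio.sleep(3600)  # stand-in for the runner's work
    except asyncio.CancelledError:
        return 'cancelled cleanly'

async def main():
    loop = asyncio.get_running_loop()
    task = loop.create_task(worker())

    def shutdown():
        # Cancel outstanding tasks so their coroutines can clean up.
        task.cancel()

    for signum in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(signum, shutdown)

    # Simulate an incoming SIGTERM shortly after startup.
    loop.call_later(0.01, signal.raise_signal, signal.SIGTERM)
    return await task

outcome = asyncio.run(main())
print(outcome)  # cancelled cleanly
```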
The event loop implementation of `asyncio` does not allow to make the
event loop to be reentrant, which essentially means that event loops
cannot be nested. One event loop cannot be run within another event
loop. However, this concept is crucial for `plumpy`'s design to work and
was perfectly allowed by the previous event loop provider `tornado`.

To work around this, `plumpy` uses the library `nest_asyncio` to patch
the `asyncio` event loop and make it reentrant. The trick is that this
should be applied at the correct time. Here we update the `Runner` to
enable `plumpy`'s event loop policy, which will patch the default event
loop policy. This location is chosen since any process in `aiida-core`
*has* to be run by a `Runner` and only one runner instance will ever be
created in a Python interpreter. When the runner shuts down, the event
policy is reset to undo the patch.
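The non-reentrancy that `nest_asyncio` patches away can be demonstrated with stock asyncio, which raises `RuntimeError` on any attempt to re-enter a running loop:

```python
# Demonstrates why nest_asyncio is needed: the stock asyncio event
# loop refuses to be re-entered and raises RuntimeError.
import asyncio

async def outer():
    loop = asyncio.get_running_loop()
    try:
        loop.run_forever()  # attempt to re-enter the running loop
    except RuntimeError as exc:
        return str(exc)
    return 'no error'

message = asyncio.run(outer())
print(message)
```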
RabbitMQ 3.6 changed the way integer values are interpreted for
connection parameters. This would cause certain integer values that used
to be perfectly acceptable, to suddenly cause the declaration of
resources, such as channels and queues, to fail.

The library `pamqp`, that is used by `aiormq`, which in turn is used
ultimately by `kiwipy` to communicate with the RabbitMQ server, adapted
to these changes, but this would break code with RabbitMQ 3.5 that used
to work just fine. For example, the message TTL when declaring a queue
would now fail when `32767 < TTL < 65536` due to incorrect
interpretation of the integer type.

The library `pamqp` provides a way to enable compatibility with these
older versions. One should merely call the method:

    pamqp.encode.support_deprecated_rabbitmq()

This will enable the legacy integer conversion table and will restore
functionality for RabbitMQ 3.5.
@sphuber sphuber merged commit 716a1d8 into aiidateam:develop Nov 27, 2020
@unkcpz unkcpz deleted the asyncio-rebase branch January 23, 2025 12:22
Successfully merging this pull request may close these issues.

Remove upper limit for tornado dependency by replacing it with asyncio