Ensure that children processes are properly killed when parent killed and clean up nodes #3776
One more scenario where killing the top work chain doesn't kill its children. I am running the following set of processes
When trying to kill the top work chain (EOS) using
If I submit the work chain to the daemon instead, everything can be killed just fine.
aiida version: 1.5.0
@yakutovicha Could you please check the daemon logs for
If that is found, then #4669 may fix this issue.
Edit: Sorry @yakutovicha, I misread your comment. Since this is limited to
Firstly, to note that now #4669 has been merged, instead of the
Secondly, I have now created https://github.com/aiidateam/aiida-integration-tests (see also #4603 (comment)).
I can reproduce the issue whereby killing a running workchain (not submitted to the daemon) leaves the children unreachable.
Here is the outcome of
root@2940cef2c10d:~# aiida-sleep workchain -nw 1 -nc 10 -t 120
^C01/26/2021 04:14:18 PM <294> aiida.engine.runners: [CRITICAL] runner received interrupt, killing process 13
Traceback (most recent call last):
  File "/opt/venv/bin/aiida-sleep", line 8, in <module>
    sys.exit(main())
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/root/aiida-core/aiida/cmdline/utils/decorators.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/aiida_sleep/cli.py", line 109, in run_workchains_cli
    node = run_workchain(number_calc, code, time, payload, output, fail, submit)
  File "/opt/venv/lib/python3.8/site-packages/aiida_sleep/cli.py", line 139, in run_workchain
    node = run_get_node(builder).node
  File "/root/aiida-core/aiida/engine/launch.py", line 58, in run_get_node
    return runner.run_get_node(process, *args, **inputs)
  File "/root/aiida-core/aiida/engine/runners.py", line 268, in run_get_node
    result, node = self._run(process, *args, **inputs)
  File "/root/aiida-core/aiida/engine/runners.py", line 244, in _run
    process_inited.execute()
  File "/opt/venv/lib/python3.8/site-packages/plumpy/processes.py", line 79, in func_wrapper
    return func(self, *args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/plumpy/processes.py", line 1150, in execute
    return self.future().result()
plumpy.exceptions.KilledError: Process was killed because the runner received an interrupt
root@2940cef2c10d:~# verdi process list -a
  PK  Created    Process label     Process State    Process status
----  ---------  ----------------  ---------------  -----------------------------------------------------------
   5  35m ago    SleepCalculation  ⏹ Finished [0]
  13  26s ago    SleepWorkChain    ☠ Killed         Process was killed because the runner received an interrupt
  14  26s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  15  26s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  16  26s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  17  25s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  18  25s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  19  25s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  20  25s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  21  25s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  22  25s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  23  24s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
Total results: 12
Info: last time an entry changed state: 13s ago (at 16:14:18 on 2021-01-26)
If you try to kill a child calculation it is unreachable:
root@2940cef2c10d:~# verdi process kill 14
Error: Process<14> is unreachable
Here is the outcome of starting then killing a submitted workchain with children (all good):
root@2940cef2c10d:~# aiida-sleep workchain -nw 1 -nc 10 -t 120 -s
uuid: f48405c3-53c4-4143-8ce6-5f751003eb4c (pk: 37) (aiida.workflows:sleep)
root@2940cef2c10d:~# verdi process list -a
  PK  Created    Process label     Process State    Process status
----  ---------  ----------------  ---------------  -------------------------------------------------------------------
   5  41m ago    SleepCalculation  ⏹ Finished [0]
  13  6m ago     SleepWorkChain    ☠ Killed         Process was killed because the runner received an interrupt
  14  6m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  15  6m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  16  6m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  17  6m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  18  6m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  19  6m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  20  6m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  21  6m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  22  6m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  23  6m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  37  17s ago    SleepWorkChain    ⏵ Waiting        Waiting for child processes: 38, 39, 40, 41, 42, 43, 44, 45, 46, 47
  38  17s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  39  16s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  40  16s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  41  16s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  42  16s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  43  16s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  44  16s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  45  15s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  46  15s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  47  15s ago    SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
Total results: 23
Info: last time an entry changed state: 5s ago (at 16:20:44 on 2021-01-26)
root@2940cef2c10d:~# verdi process kill 37
Success: killed Process<37>
root@2940cef2c10d:~# verdi process list -a
  PK  Created    Process label     Process State    Process status
----  ---------  ----------------  ---------------  -----------------------------------------------------------
   5  42m ago    SleepCalculation  ⏹ Finished [0]
  13  7m ago     SleepWorkChain    ☠ Killed         Process was killed because the runner received an interrupt
  14  7m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  15  7m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  16  7m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  17  7m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  18  7m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  19  7m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  20  7m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  21  7m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  22  7m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  23  7m ago     SleepCalculation  ⏵ Waiting        Waiting for transport task: submit
  37  42s ago    SleepWorkChain    ☠ Killed         Killed through `verdi process kill`
  38  41s ago    SleepCalculation  ☠ Killed         Killed by parent<37>
  39  41s ago    SleepCalculation  ☠ Killed         Killed by parent<37>
  40  41s ago    SleepCalculation  ☠ Killed         Killed by parent<37>
  41  41s ago    SleepCalculation  ☠ Killed         Killed by parent<37>
  42  40s ago    SleepCalculation  ☠ Killed         Killed by parent<37>
  43  40s ago    SleepCalculation  ☠ Killed         Killed by parent<37>
  44  40s ago    SleepCalculation  ☠ Killed         Killed by parent<37>
  45  40s ago    SleepCalculation  ☠ Killed         Killed by parent<37>
  46  39s ago    SleepCalculation  ☠ Killed         Killed by parent<37>
  47  39s ago    SleepCalculation  ☠ Killed         Killed by parent<37>
Total results: 23
Info: last time an entry changed state: 6s ago (at 16:21:08 on 2021-01-26)
root@2940cef2c10d:~#
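The failure mode in the listings above can be stated as a simple check: a child is orphaned when its parent has reached a terminal state while the child is still active. Below is a minimal sketch of that check in plain Python, with no AiiDA calls. The `ProcessRecord` type, the `find_orphans` helper, and the parent links (14 and 15 as children of 13, 38 as a child of 37) are assumptions made for illustration; they are inferred from the listings, not read from the database.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# States after which a process can no longer change (assumed naming).
TERMINAL_STATES = {"finished", "excepted", "killed"}


@dataclass(frozen=True)
class ProcessRecord:
    """Hypothetical, simplified view of one row of a process listing."""
    pk: int
    state: str
    parent: Optional[int] = None


def find_orphans(records: List[ProcessRecord]) -> List[int]:
    """Return PKs of active processes whose parent is already terminated."""
    by_pk: Dict[int, ProcessRecord] = {record.pk: record for record in records}
    orphans = []
    for record in records:
        if record.parent is None:
            continue  # top-level process, nothing to compare against
        parent = by_pk.get(record.parent)
        if parent is None:
            continue  # parent not in the listing, cannot decide
        if parent.state in TERMINAL_STATES and record.state not in TERMINAL_STATES:
            orphans.append(record.pk)
    return sorted(orphans)


# A shortened version of the listings: 13 was interrupted while run locally,
# 37 was submitted to the daemon and killed with `verdi process kill`.
records = [
    ProcessRecord(13, "killed"),
    ProcessRecord(14, "waiting", parent=13),
    ProcessRecord(15, "waiting", parent=13),
    ProcessRecord(37, "killed"),
    ProcessRecord(38, "killed", parent=37),
]
orphaned = find_orphans(records)
```

With this data, only the children of the interrupted work chain are flagged; the child of the daemon-run work chain was killed along with its parent and is not.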
Also, to copy over a comment from @unkcpz in the closed duplicate #4298:
More information about this issue.
There are various scenarios in which killing a parent process does not properly kill all of its child processes, or in which the children are killed but their nodes are not properly updated. Make sure that the process tasks are properly acknowledged so that they don't remain in the queue, and make sure to wrap up the nodes.
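To make the desired behaviour concrete, here is a toy model of a kill that cascades to children, updates each node's state, and acknowledges each task so nothing is left in the queue. It is only a sketch under stated assumptions: `TaskQueue` and `ToyProcess` are hypothetical stand-ins written for this example and are not the aiida-core or plumpy implementation. The status messages are the ones shown by `verdi process list` in the daemon case.

```python
from typing import List, Optional, Set


class TaskQueue:
    """Hypothetical stand-in for the broker queue of unacknowledged tasks."""

    def __init__(self) -> None:
        self.pending: Set[int] = set()

    def add(self, pk: int) -> None:
        self.pending.add(pk)

    def acknowledge(self, pk: int) -> None:
        # discard() keeps a repeated acknowledgement harmless
        self.pending.discard(pk)


class ToyProcess:
    """Hypothetical process that tracks its own children."""

    def __init__(self, pk: int, queue: TaskQueue,
                 parent: Optional["ToyProcess"] = None) -> None:
        self.pk = pk
        self.queue = queue
        self.state = "waiting"
        self.status: Optional[str] = None
        self.children: List["ToyProcess"] = []
        queue.add(pk)  # every launched process has a task in the queue
        if parent is not None:
            parent.children.append(self)

    def kill(self, message: str) -> None:
        if self.state == "killed":
            return  # already wrapped up
        # Kill the children first so none is left running without a parent.
        for child in self.children:
            child.kill(f"Killed by parent<{self.pk}>")
        # Wrap up the node, then acknowledge the task.
        self.state = "killed"
        self.status = message
        self.queue.acknowledge(self.pk)


queue = TaskQueue()
parent = ToyProcess(37, queue)
children = [ToyProcess(pk, queue, parent=parent) for pk in range(38, 48)]
parent.kill("Killed through `verdi process kill`")
```

After the call, the parent and all ten children are in the `killed` state and `queue.pending` is empty, which is the outcome this issue asks for in every scenario, including a locally run work chain interrupted with Ctrl-C.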