Replies: 2 comments
-
@gonzaq94 thanks for the interest. Please provide the following information:
-
Closing due to inactivity; feel free to reopen.
-
Python version (python3 -V): 3.8
NVFlare version (python3 -m pip list | grep "nvflare"): 2.3.8
NVFlare branch (if running examples, please use the branch that corresponds to the NVFlare version; git branch): 2.3.8
Operating system: Ubuntu 18.04.3 LTS
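The Python and NVFlare versions can also be read from inside the client's virtual environment with a short script. This is a minimal sketch using only the standard library (`importlib.metadata` needs Python 3.8+); `environment_report` is a hypothetical helper name, the OS entry is a generic platform string rather than the distribution name, and the Git branch and GPU driver are not covered.

```python
import platform
from importlib import metadata


def environment_report() -> dict:
    """Collect the version details the issue template asks for (hypothetical helper)."""
    try:
        nvflare_version = metadata.version("nvflare")
    except metadata.PackageNotFoundError:
        # NVFlare is not installed in the interpreter running this script.
        nvflare_version = "not installed"
    return {
        "python": platform.python_version(),
        "nvflare": nvflare_version,
        # Generic kernel/platform string, e.g. not "Ubuntu 18.04.3 LTS".
        "os": platform.platform(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that starts the client, so the reported NVFlare version matches the environment actually in use.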
Have you successfully run any of the following examples?
Please describe your question
When I try to connect a client to the server, the client fails with the error shown in the log below. However, if I launch the admin client on the same machine, it connects to the server successfully. In addition, I can ping the server by its FQDN from the client machine, and it responds.
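Ping only exercises ICMP, so a successful ping does not show that the client can open a TCP connection to the port the server listens on (8006 in the log). A minimal standard-library sketch of such a check follows; `can_connect` is a hypothetical helper, and a `True` result only means the TCP port is reachable, not that TLS or the gRPC handshake will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened.

    DNS failures, refused connections, and timeouts all surface as OSError
    (socket.gaierror and socket.timeout are subclasses of OSError).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port as they appear in the client log; adjust for your deployment.
    print(can_connect("lxbuc-ama15.em.health.ge.com", 8006))
```

Running this from the client machine and from the machine where the admin client works can help tell a blocked port apart from a problem later in the connection.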
Driver Version: 530.41.03
CUDA Version: 12.1
PYTHONPATH is /local/custom::/net/frbucx05nvsr01n/vol/static02/tomo-database/bcare/accounts/quintana_g/gitlab/mammo-classifier/mmar/custom
start fl because process of 27926 does not exist
new pid 22573
Waiting for SP....
2024-03-15 14:42:22,345 - Cell - INFO - site-1: created backbone external connector to grpc://lxbuc-ama15.em.health.ge.com:8006
2024-03-15 14:42:22,345 - ConnectorManager - INFO - 22573: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2024-03-15 14:42:22,350 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:28838] is starting
2024-03-15 14:42:22,852 - Cell - INFO - site-1: created backbone internal listener for tcp://localhost:28838
2024-03-15 14:42:22,853 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 ACTIVE grpc://lxbuc-ama15.em.health.ge.com:8006] is starting
2024-03-15 14:42:22,855 - FederatedClient - INFO - Wait for engine to be created.
E0315 14:42:22.870027894 22624 http_proxy.cc:83] cannot parse value of 'http_proxy' env var. Error: OK
2024-03-15 14:42:22,870 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioGrpcDriver - INFO - created secure channel at lxbuc-ama15.em.health.ge.com:8006
2024-03-15 14:42:24,271 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - Connection [CN00002 N/A => lxbuc-ama15.em.health.ge.com:8006 SSL N/A] is closed by peer
2024-03-15 14:42:24,271 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - [CN00002 N/A => lxbuc-ama15.em.health.ge.com:8006 SSL N/A]: in aio_ctx: done read_loop
2024-03-15 14:42:24,272 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - Closed GRPC Channel
2024-03-15 14:42:24,272 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioGrpcDriver - INFO - CLIENT: connection [CN00002 Not Connected] closed
E0315 14:42:25.278318666 22624 http_proxy.cc:83] cannot parse value of 'http_proxy' env var. Error: OK
2024-03-15 14:42:25,278 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioGrpcDriver - INFO - created secure channel at lxbuc-ama15.em.health.ge.com:8006
2024-03-15 14:42:27,694 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - Connection [CN00003 N/A => lxbuc-ama15.em.health.ge.com:8006 SSL N/A] is closed by peer
2024-03-15 14:42:27,695 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - [CN00003 N/A => lxbuc-ama15.em.health.ge.com:8006 SSL N/A]: in aio_ctx: done read_loop
2024-03-15 14:42:27,695 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - Closed GRPC Channel
2024-03-15 14:42:27,695 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioGrpcDriver - INFO - CLIENT: connection [CN00003 Not Connected] closed
2024-03-15 14:42:28,188 - Cell - ERROR - [ME=site-1 O=? D=server F=? T=? CH=task TP=register] timeout on Request 6fac1fcd-7a1d-4aac-a8b2-810106f47f09 for ['register'] after 5.0 secs
E0315 14:42:29.703019541 22624 http_proxy.cc:83] cannot parse value of 'http_proxy' env var. Error: OK
2024-03-15 14:42:29,703 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioGrpcDriver - INFO - created secure channel at lxbuc-ama15.em.health.ge.com:8006
2024-03-15 14:42:34,128 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - Connection [CN00004 N/A => lxbuc-ama15.em.health.ge.com:8006 SSL N/A] is closed by peer
2024-03-15 14:42:34,129 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - [CN00004 N/A => lxbuc-ama15.em.health.ge.com:8006 SSL N/A]: in aio_ctx: done read_loop
2024-03-15 14:42:34,129 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - Closed GRPC Channel
2024-03-15 14:42:34,130 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioGrpcDriver - INFO - CLIENT: connection [CN00004 Not Connected] closed
2024-03-15 14:42:35,192 - Cell - ERROR - [ME=site-1 O=? D=server F=? T=? CH=task TP=register] timeout on Request bb4661ab-89ec-4c60-b0ef-17634b9dd7a3 for ['register'] after 5.0 secs
2024-03-15 14:42:37,195 - Cell - WARNING - [ME=site-1 O=? D=server F=? T=? CH=task TP=register] no connection to server
2024-03-15 14:42:37,195 - Cell - ERROR - [ME=site-1 O=? D=server F=? T=? CH=task TP=register] cannot send to 'server': target_unreachable
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/net/10.215.25.16/export/home2/gonzalo_venv/nvflare-2.3.8-env/lib/python3.8/site-packages/nvflare/ha/dummy_overseer_agent.py", line 112, in _rnq_worker
    self._do_callback()
  File "/net/10.215.25.16/export/home2/gonzalo_venv/nvflare-2.3.8-env/lib/python3.8/site-packages/nvflare/ha/dummy_overseer_agent.py", line 106, in _do_callback
    self._update_callback(self)
  File "/net/10.215.25.16/export/home2/gonzalo_venv/nvflare-2.3.8-env/lib/python3.8/site-packages/nvflare/private/fed/client/fed_client_base.py", line 158, in overseer_callback
    self.set_primary_sp(sp)
  File "/net/10.215.25.16/export/home2/gonzalo_venv/nvflare-2.3.8-env/lib/python3.8/site-packages/nvflare/private/fed/client/fed_client_base.py", line 386, in set_primary_sp
    return pool.map(partial(self.set_sp, sp=sp), tuple(self.servers))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 768, in get
    raise self._value
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/net/10.215.25.16/export/home2/gonzalo_venv/nvflare-2.3.8-env/lib/python3.8/site-packages/nvflare/private/fed/client/fed_client_base.py", line 173, in set_sp
    self._create_cell(location, scheme)
  File "/net/10.215.25.16/export/home2/gonzalo_venv/nvflare-2.3.8-env/lib/python3.8/site-packages/nvflare/private/fed/client/fed_client_base.py", line 231, in _create_cell
    raise RuntimeError(f"Failed to get engine after {time.time()-start} seconds")
RuntimeError: Failed to get engine after 15.002345561981201 seconds
E0315 14:42:38.141165958 22624 http_proxy.cc:83] cannot parse value of 'http_proxy' env var. Error: OK
2024-03-15 14:42:38,141 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioGrpcDriver - INFO - created secure channel at lxbuc-ama15.em.health.ge.com:8006
2024-03-15 14:42:44,199 - Cell - ERROR - [ME=site-1 O=? D=server F=? T=? CH=task TP=register] timeout on Request aa5f67c8-7db0-4b81-bfaf-5211b989293e for ['register'] after 5.0 secs
2024-03-15 14:42:46,585 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - Connection [CN00005 N/A => lxbuc-ama15.em.health.ge.com:8006 SSL N/A] is closed by peer
2024-03-15 14:42:46,585 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - [CN00005 N/A => lxbuc-ama15.em.health.ge.com:8006 SSL N/A]: in aio_ctx: done read_loop
2024-03-15 14:42:46,585 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioStreamSession - INFO - Closed GRPC Channel
2024-03-15 14:42:46,586 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioGrpcDriver - INFO - CLIENT: connection [CN00005 Not Connected] closed
2024-03-15 14:42:46,587 - nvflare.fuel.f3.sfm.conn_manager - INFO - Retrying [CH00001 ACTIVE grpc://lxbuc-ama15.em.health.ge.com:8006] in 8 seconds
2024-03-15 14:42:51,203 - Cell - ERROR - [ME=site-1 O=? D=server F=? T=? CH=task TP=register] timeout on Request 63355cd6-a481-4ecd-bac0-f746c3754925 for ['register'] after 5.0 secs
2024-03-15 14:42:53,206 - Cell - WARNING - [ME=site-1 O=? D=server F=? T=? CH=task TP=register] no connection to server
2024-03-15 14:42:53,206 - Cell - ERROR - [ME=site-1 O=? D=server F=? T=? CH=task TP=register] cannot send to 'server': target_unreachable
E0315 14:42:54.606647123 22624 http_proxy.cc:83] cannot parse value of 'http_proxy' env var. Error: OK
2024-03-15 14:42:54,606 - nvflare.fuel.f3.drivers.aio_grpc_driver.AioGrpcDriver - INFO - created secure channel at lxbuc-ama15.em.health.ge.com:8006
2024-03-15 14:43:00,210 - Cell - ERROR - [ME=site-1 O=? D=server F=? T=? CH=task TP=register] timeout on Request 1dbab6f3-a65d-40bd-a740-c66a8ee39ce5 for ['register'] after 5.0 secs
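The repeated `cannot parse value of 'http_proxy' env var` lines come from gRPC's C core, which reads proxy settings from the environment of the client process. A quick way to see what the client inherits is to list the proxy variables and flag any that do not look like an `http://host[:port]` URL. This is a rough sketch: `check_proxy_env` and the variable list are assumptions, and it uses Python's `urllib.parse`, not gRPC's own URI parser, so the two may disagree on edge cases.

```python
import os
from typing import Dict, Mapping, Optional, Tuple
from urllib.parse import urlsplit

# Proxy-related variables to inspect. Which of these gRPC actually reads, and
# in what order, is an assumption here and may vary between gRPC versions.
PROXY_VARS = (
    "grpc_proxy", "https_proxy", "http_proxy", "no_grpc_proxy", "no_proxy",
    "GRPC_PROXY", "HTTPS_PROXY", "HTTP_PROXY", "NO_PROXY",
)


def check_proxy_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Tuple[str, bool]]:
    """Return {name: (value, looks_ok)} for every proxy variable that is set.

    looks_ok is a heuristic: proxy URLs are expected to look like
    http://host[:port] (assumption); no_*proxy variables are host lists
    and are reported without being judged.
    """
    env = os.environ if env is None else env
    report: Dict[str, Tuple[str, bool]] = {}
    for name in PROXY_VARS:
        value = env.get(name)
        if value is None:
            continue  # variable not set at all
        if name.lower().startswith("no_"):
            # Comma-separated host list, not a URL.
            report[name] = (value, True)
            continue
        try:
            parts = urlsplit(value)
            _ = parts.port  # raises ValueError if the port is not a valid number
        except ValueError:
            report[name] = (value, False)
            continue
        report[name] = (value, parts.scheme == "http" and bool(parts.hostname))
    return report


if __name__ == "__main__":
    for name, (value, looks_ok) in check_proxy_env().items():
        status = "ok" if looks_ok else "SUSPICIOUS"
        print(f"{name}={value!r}: {status}")
```

If a variable is flagged, one thing to try is correcting or unsetting it in the shell that launches the client (or, if your gRPC version honors `no_proxy`, adding the server's host name there) and restarting the client. Whether that explains the connections being closed by the peer depends on the network between the sites.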