Python version (python3 -V)
3.12
NVFlare version (python3 -m pip list | grep "nvflare")
2.5.2
NVFlare branch (if running examples, please use the branch that corresponds to the NVFlare version, git branch)
No response
Operating system
macOS + Amazon Linux 2023
Have you successfully run any of the following examples?
hello-numpy-sag with simulator
hello-pt with simulator
hello-numpy-sag with POC
hello-pt with POC
Please describe your question
I tried to test a real-world scenario with NVFlare where the client runs on my macOS machine and the server runs on an AWS EC2 instance. These are the steps I followed:
1. Associated an Elastic IP address with the instance (assume a.b.c.d).
2. Created an A record for my subdomain (assume nvflare.mydomain.com) pointing to a.b.c.d.
3. Tested traffic routing with nc -l 0.0.0.0 8002 on the server and nc nvflare.mydomain.com 8002 on the client; everything seemed OK.
4. Used nvflare.mydomain.com in project.yml, provisioned, transferred the files, and started the server and client.
Received the errors below (TL;DR: target_unreachable on the client):
2025-02-03 21:09:36,572 - nvflare.fuel.utils.config_service - DEBUG - got var max_message_size from env
2025-02-03 21:09:36,572 - CoreCell - DEBUG - connection secure: True
2025-02-03 21:09:36,572 - CoreCell - DEBUG - site-1: max_msg_size=2144337904
2025-02-03 21:09:36,572 - CoreCell - DEBUG - Creating Cell: site-1
2025-02-03 21:09:36,572 - nvflare.fuel.utils.config_service - DEBUG - got var heartbeat_interval from env
2025-02-03 21:09:36,572 - nvflare.fuel.utils.config_service - DEBUG - got var backbone_conn_gen from env
2025-02-03 21:09:36,572 - nvflare.fuel.utils.config_service - DEBUG - got var internal_conn_scheme from env
2025-02-03 21:09:36,572 - nvflare.fuel.utils.config_service - DEBUG - got var allow_adhoc_conns from env
2025-02-03 21:09:36,572 - nvflare.fuel.utils.config_service - DEBUG - got var adhoc_conn_scheme from env
2025-02-03 21:09:36,572 - ConnectorManager - DEBUG - internal scheme=tcp, resources={'host': 'localhost'}
2025-02-03 21:09:36,572 - ConnectorManager - DEBUG - adhoc scheme=tcp, resources={}
2025-02-03 21:09:36,680 - nvflare.fuel.utils.config_service - DEBUG - got var streaming_chunk_size from env
2025-02-03 21:09:36,680 - Cell - DEBUG - __getattr__: args=(), kwargs={}
2025-02-03 21:09:36,680 - Cell - DEBUG - calling core_cell start
2025-02-03 21:09:36,680 - CoreCell - DEBUG - site-1: creating connector to grpc://nvflare.mydomain.com:8002
2025-02-03 21:09:36,753 - nvflare.fuel.utils.config_service - DEBUG - got var use_aio_grpc from env
2025-02-03 21:09:36,753 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver AioGrpcDriver is registered for agrpc
2025-02-03 21:09:36,753 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver AioGrpcDriver is registered for agrpcs
2025-02-03 21:09:36,774 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver AioHttpDriver is registered for http
2025-02-03 21:09:36,774 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver AioHttpDriver is registered for https
2025-02-03 21:09:36,774 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver AioHttpDriver is registered for ws
2025-02-03 21:09:36,774 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver AioHttpDriver is registered for wss
2025-02-03 21:09:36,779 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver AioTcpDriver is registered for atcp
2025-02-03 21:09:36,779 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver AioTcpDriver is registered for satcp
2025-02-03 21:09:36,779 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver TcpDriver is registered for tcp
2025-02-03 21:09:36,779 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver TcpDriver is registered for stcp
2025-02-03 21:09:36,782 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver GrpcDriver is registered for grpc
2025-02-03 21:09:36,782 - nvflare.fuel.f3.drivers.driver_manager - DEBUG - Driver GrpcDriver is registered for grpcs
2025-02-03 21:09:36,783 - nvflare.fuel.utils.config_service - DEBUG - got var comm_driver_path from env
2025-02-03 21:09:36,783 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - DEBUG - GRPC Config: max_workers=100, options=[('grpc.max_send_message_length', 2145386496), ('grpc.max_receive_message_length', 2145386496)]
2025-02-03 21:09:36,783 - nvflare.fuel.f3.sfm.conn_manager - DEBUG - Connector [CH00001 ACTIVE grpc://nvflare.mydomain.com:8002] is created
2025-02-03 21:09:36,783 - CoreCell - INFO - site-1: created backbone external connector to grpc://nvflare.mydomain.com:8002
2025-02-03 21:09:36,783 - ConnectorManager - INFO - 87641: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2025-02-03 21:09:36,784 - nvflare.fuel.utils.config_service - DEBUG - got var comm_driver_path from env
2025-02-03 21:09:36,784 - nvflare.fuel.f3.sfm.conn_manager - DEBUG - Connector [CH00002 PASSIVE tcp://0:27948] is created
2025-02-03 21:09:36,784 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:27948] is starting
2025-02-03 21:09:36,784 - ConnectorManager - DEBUG - 87641: ############ dynamic listener at tcp://localhost:27948
2025-02-03 21:09:37,289 - CoreCell - INFO - site-1: created backbone internal listener for tcp://localhost:27948
2025-02-03 21:09:37,290 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 ACTIVE grpc://nvflare.mydomain.com:8002] is starting
2025-02-03 21:09:37,291 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - DEBUG - CLIENT: trying connect ...
2025-02-03 21:09:37,291 - nvflare.fuel.f3.communicator - DEBUG - Communicator for local endpoint: site-1 is started
2025-02-03 21:09:37,291 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - DEBUG - CLIENT: creating secure channel
2025-02-03 21:09:37,292 - FederatedClient - INFO - Wait for engine to be created.
2025-02-03 21:09:37,318 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - created secure channel at nvflare.mydomain.com:8002
2025-02-03 21:09:37,318 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - DEBUG - CLIENT: got stub
2025-02-03 21:09:37,318 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00002 N/A => nvflare.mydomain.com:8002] is created: PID: 87641
2025-02-03 21:09:37,318 - nvflare.fuel.f3.sfm.sfm_conn - DEBUG - Sending frame: Prefix(length=69, header_len=0, type=4, reserved=0, flags=0, app_id=0, stream_id=1, sequence=1) on [CN00002 N/A => nvflare.mydomain.com:8002]
2025-02-03 21:09:37,318 - nvflare.fuel.f3.drivers.grpc_driver.StreamConnection - DEBUG - CLIENT: queued frame #1
2025-02-03 21:09:37,318 - nvflare.fuel.f3.drivers.base_driver - DEBUG - Connection created: GrpcDriver:[CN00002 N/A => nvflare.mydomain.com:8002]
2025-02-03 21:09:37,318 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - DEBUG - CLIENT: added connection
2025-02-03 21:09:37,325 - nvflare.fuel.f3.drivers.grpc_driver.StreamConnection - DEBUG - CLIENT: generate_output in thread Thread-5 (_run)
2025-02-03 21:09:37,325 - nvflare.fuel.f3.drivers.grpc_driver.StreamConnection - DEBUG - CLIENT: outgoing frame #1
2025-02-03 21:09:37,325 - nvflare.fuel.f3.drivers.grpc_driver.StreamConnection - DEBUG - CLIENT: started read_loop in thread conn_mgr_1
2025-02-03 21:09:37,576 - nvflare.fuel.utils.config_service - DEBUG - got var job_query_timeout from env
2025-02-03 21:09:37,582 - Cell - DEBUG - __getattr__: args=(), kwargs={'target': 'server', 'channel': 'task', 'topic': 'challenge', 'request': <nvflare.fuel.f3.message.Message object at 0x11cbf9fa0>, 'timeout': 30.0}
2025-02-03 21:09:37,582 - Cell - DEBUG - calling core_cell send_request
2025-02-03 21:09:37,582 - CoreCell - DEBUG - site-1: sending request task:challenge to server
2025-02-03 21:09:37,582 - CoreCell - DEBUG - site-1: broadcasting to ['server'] ...
2025-02-03 21:09:37,582 - CoreCell - DEBUG - site-1: finding path to server
2025-02-03 21:09:37,582 - CoreCell - DEBUG - site-1: trying path ['server'] ...
2025-02-03 21:09:37,582 - CoreCell - DEBUG - site-1: no CellAgent for server
2025-02-03 21:09:37,582 - CoreCell - WARNING - [ME=site-1 O=? D=server F=? T=? CH=task TP=challenge SCH=? STP=? SEQ=?] no connection to server
2025-02-03 21:09:37,583 - CoreCell - ERROR - [ME=site-1 O=site-1 D=server F=site-1 T=server CH=task TP=challenge SCH=? STP=? SEQ=?] cannot send to 'server': target_unreachable
2025-02-03 21:09:37,583 - CoreCell - DEBUG - released waiter on REQ c1ded751-b428-4e35-86c3-016b804d1708
2025-02-03 21:09:37,583 - Communicator - INFO - challenge result: target_unreachable
2025-02-03 21:09:37,583 - Communicator - INFO - re-challenge after 2.0 seconds
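While debugging, I found it useful to check from the client machine that the hostname actually resolves and that the port accepts TCP connections, independent of NVFlare. A minimal sketch using only the standard library (the hostname and port below are the placeholders from this setup, not anything NVFlare-specific):

```python
import socket

def resolve(host: str) -> list[str]:
    """Return the IP addresses the client's resolver hands back for host,
    i.e. the same lookup the gRPC client performs before connecting."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholders for the setup described above:
# print(resolve("nvflare.mydomain.com"))
# print(tcp_reachable("nvflare.mydomain.com", 8002))
```

If resolution or the raw TCP connect fails here, the problem is DNS or routing rather than the NVFlare configuration itself.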
After trying different things, I redid step 2 and this time created a CNAME record from nvflare.mydomain.com to ec2-a-b-c-d.us-east-2.compute.amazonaws.com.
This solved the problem and I can now have nvflare.mydomain.com in my project.yml.
Can someone explain why having the A record did not work in this case?
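Without knowing the resolver state at the time, one way to compare what the client saw under the two setups is to ask the resolver for the canonical name along with the addresses: with a plain A record the canonical name is usually the subdomain itself, while with a CNAME it is the record's target (here, the EC2 public DNS name). A hedged sketch; the domain in the comment is the placeholder from this post:

```python
import socket

def canonical_name(host: str) -> str:
    """Ask the resolver for the canonical name behind host.

    With an A record this is typically host itself; with a CNAME it is
    the record's target (e.g. ec2-a-b-c-d.us-east-2.compute.amazonaws.com).
    """
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP,
                               flags=socket.AI_CANONNAME)
    # AI_CANONNAME populates the canonname field of the first result.
    return infos[0][3]

# print(canonical_name("nvflare.mydomain.com"))  # placeholder domain
```

Running this under each DNS setup (and comparing against the addresses the server actually listens on) would show whether the A record was resolving to the expected IP at the time of the failure.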