RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.28.0.12]:14004 #3

Open
zhuyingce opened this issue Aug 13, 2023 · 0 comments

First of all, thanks for your paper. pFedSD is a great case for FKD. When I run your code for pFedSD, it always shows errors about process communication, such as:

Process Process-4:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/content/pFedSD/run_gloo.py", line 82, in main
    process.run()
  File "/content/pFedSD/pcode/workers/worker_pFedSD.py", line 47, in run
    self._send_model_to_master()
  File "/content/pFedSD/pcode/workers/worker_base.py", line 304, in _send_model_to_master
    dist.send(tensor=flatten_model.buffer, dst=0)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 1295, in send
    default_pg.send([tensor], dst, tag).wait()
RuntimeError: [../third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [172.28.0.12]:43185

I would appreciate any tips about this error. Thanks.
