Get training via VPN to work #20

Merged
merged 45 commits into from
Jan 31, 2025
Commits
87bf81f
timeout and fixes to 3dcnn, not final version yet
Ultimate-Storm Oct 24, 2024
9b72e97
set MAX_EPOCHS for other site names
oleschwen Nov 12, 2024
8582657
removed trailing whitespace
oleschwen Nov 21, 2024
250d86d
Client startup script
oleschwen Nov 22, 2024
269b722
Server startup script
oleschwen Nov 22, 2024
ff97f80
copy master_template.yml to location in image where it will be used
oleschwen Nov 22, 2024
b049986
cleaned up commented code and comment
oleschwen Nov 22, 2024
873f0aa
more automated way of starting dashboard with fewer (unnecessary?) mo…
oleschwen Nov 22, 2024
0c7b96c
extended instructions for training setup and added todos
oleschwen Nov 22, 2024
9e7a0f4
use same timeout values as 3dcnn in branch fix_app_3dcnn_configs
oleschwen Nov 27, 2024
227ef8c
Merge branch 'fix_app_3dcnn_configs' into dev-19-fix-training-via-vpn
oleschwen Nov 27, 2024
0d49f91
trying to make tasks more similar in terms of num_rounds and model size
oleschwen Nov 28, 2024
232fe9f
experimentally set several timeouts to 10 hours
oleschwen Nov 29, 2024
12e27ec
run 30 rounds, timeouts experimentally set to 10 hours
oleschwen Dec 2, 2024
4ff2706
added default values and values from main branch
oleschwen Dec 2, 2024
77a3b15
timeouts set as in main branch to systematically try server timeouts
oleschwen Dec 4, 2024
354a308
trying to set single parameter start_task_timeout to 10h
oleschwen Dec 4, 2024
1c47e25
trying to set single parameter progress_timeout to 10h
oleschwen Dec 4, 2024
2768b82
trying to set single parameter end_workflow_timeout to 10h
oleschwen Dec 4, 2024
7c9946a
trying to set single parameter configure_task_timeout to 10h
oleschwen Dec 4, 2024
4735ca2
trying to set all server timeouts to 10h
oleschwen Dec 4, 2024
58c1f87
try client parameter learn_task_abort_timeout
oleschwen Dec 4, 2024
4cb717e
try client parameter learn_task_ack_timeout
oleschwen Dec 4, 2024
6068e6d
used timeout parameters from branch fix_app_3dcnn_configs and learn_t…
oleschwen Dec 5, 2024
77f37ac
split FixedSizeCNNForTesting from MiniCNNForTesting
oleschwen Dec 5, 2024
7298f1a
renamed file and cleaned up init
oleschwen Dec 5, 2024
2a82c7f
timeouts back to 10 hours, also max_status_report_interval
oleschwen Dec 16, 2024
2de6338
use same parameters for minimal_training_pytorch_cnn as for 3d_cnn
oleschwen Dec 19, 2024
d7340f4
increased timeouts even more to test DUKE task via slow VPN
oleschwen Dec 20, 2024
f4e822c
use model of size comparable to DUKE example
oleschwen Jan 9, 2025
41bc749
Revert "use model of size comparable to DUKE example"
oleschwen Jan 9, 2025
10862b4
controller not necessary in nfcore image
oleschwen Jan 9, 2025
4b49d08
remove NVFlare installation source directory from Docker images to av…
oleschwen Jan 9, 2025
4e027b0
modify timeouts in nvflare code (part of which are not obviously conf…
oleschwen Jan 9, 2025
9f33378
removed already commented unnecessary copy
oleschwen Jan 10, 2025
cea6588
need write permissions for temporary experiments
oleschwen Jan 10, 2025
5bc6a66
made NVFlare source code modifications directly in the fork
oleschwen Jan 13, 2025
dc5758c
install fixed versions before non-fixed versions to improve Docker ca…
oleschwen Jan 14, 2025
12689da
ensure directory is readable
oleschwen Jan 14, 2025
e1b2921
modified controller imports so that the installed version can be used
oleschwen Jan 15, 2025
b2c9c94
remove source of controller after installation, use the installed ver…
oleschwen Jan 15, 2025
2e2b2b6
create copy of controller code needed for coverage analysis
oleschwen Jan 15, 2025
3a8e0d7
Merge branch 'main' into dev-19-fix-training-via-vpn
oleschwen Jan 31, 2025
7115122
for next experiment, hard-coded numbers of epochs per round for inten…
oleschwen Jan 31, 2025
5ea4f91
updated submodule to merged corresponding branch
oleschwen Jan 31, 2025
controller not necessary in nfcore image
oleschwen committed Jan 9, 2025
commit 10862b4b9992b3965e14d0a3e60ba3b00a8a274f
4 changes: 0 additions & 4 deletions docker_config/Dockerfile_nfcore
@@ -38,10 +38,6 @@ COPY ./docker_config/master_template.yml /workspace/nvflare/nvflare/lighter/impl
 # Install NVFlare from the local source
 RUN python -m pip install /workspace/nvflare
 
-COPY ./controller /workspace/controller
-# Set python path
-ENV PYTHONPATH=/workspace/controller/controller
-
 # Set the Docker image name
 LABEL name="nvflare-pt-dev:nfcore"
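
A quick way to sanity-check a change like this is to ask Python where (or whether) a module resolves on a given search path, without importing it. The following is only a sketch under assumptions: `module_origin` is a hypothetical helper that is not part of this repository, and the module name you pass to it depends on what the `controller` directory actually contains, which this diff does not show.

```python
import sys
from importlib.machinery import PathFinder


def module_origin(name, search_path=None):
    """Return the file a top-level module would be loaded from, or None.

    Hypothetical helper for illustration only. It inspects path-based
    lookups (sys.path by default) and does not import the module.
    """
    path = sys.path if search_path is None else search_path
    spec = PathFinder.find_spec(name, path)
    return None if spec is None else spec.origin
```

Run inside the image, a result of `None` for a controller module would suggest it is no longer reachable via `PYTHONPATH`, while a path under `site-packages` would suggest an installed copy is being picked up instead.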