dev-22 use same application code for local and swarm training #24

Open
wants to merge 23 commits into
base: main

Collaborator

@oleschwen oleschwen commented Feb 11, 2025

See #22

…version, the latter to be used for local training and as a preflight test
…endencies earlier for better docker cache usage
@oleschwen oleschwen self-assigned this Feb 11, 2025
@oleschwen oleschwen linked an issue Feb 11, 2025 that may be closed by this pull request
2 tasks
@oleschwen
Collaborator Author

To test this without pushing images to the docker registry,

  • build the images but don't create and push the jefftud/… tags
  • use the image without jefftud/ prefix when setting up the project via the dashboard
  • use --no-pull for docker.sh from the startup kit

@oleschwen
Collaborator Author

oleschwen commented Feb 13, 2025

Summarizing the changes:

  • copy MediSwarm code into Docker image (to have one way of distribution for local training; also used for testing at that location)
  • main.py in the 3dcnn_ptl application code can be run in both swarm and local training mode (refactored into methods in a new file)
  • docker.sh in the client startup scripts can run both modes, plus the minimum example as an additional pre-flight check
  • documentation in main README.md (maybe to be moved later)
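
As a rough illustration of the main.py point (not the actual code in this PR), a single entry point can choose between the two modes and share everything else. The environment variable name `TRAINING_MODE` and the function names `run_local_training` and `run_swarm_training` are hypothetical and exist only for this sketch:

```python
import os
from typing import Callable, Dict, Optional


def run_local_training() -> str:
    # Placeholder: the real code would train on the local dataset only.
    return "local"


def run_swarm_training() -> str:
    # Placeholder: the real code would run the same training inside the swarm client.
    return "swarm"


def main(mode: Optional[str] = None) -> str:
    # Hypothetical switch: TRAINING_MODE is an assumed name, not taken from the PR.
    selected = mode or os.environ.get("TRAINING_MODE", "swarm")
    handlers: Dict[str, Callable[[], str]] = {
        "local": run_local_training,
        "swarm": run_swarm_training,
    }
    if selected not in handlers:
        raise ValueError(f"unknown training mode: {selected!r}")
    return handlers[selected]()


if __name__ == "__main__":
    print(main())
```

Keeping the mode selection in one small function means the local run exercises the same training code that later runs in the swarm, which is what makes it usable as a pre-flight test.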

In addition:

  • fixed apt package versions and optimized Dockerfiles for nfcore and 3dcnn containers

@oleschwen oleschwen marked this pull request as ready for review February 13, 2025 16:10
Development

Successfully merging this pull request may close these issues.

Same 3dcnn_ptl code base for local and swarm training
1 participant