Add leader fallback for worker file imports #5189

Merged: adamnovak merged 5 commits into master from issues/5135-leader-fallback-import on Jan 16, 2025

Conversation

stxue1
Contributor

@stxue1 stxue1 commented Dec 18, 2024

Resolves #5135

Changelog Entry

To be copied to the draft changelog by merger:

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passed tests, including the Gitlab tests, for the most recent commit in its branch.
  • Make sure the PR has been reviewed. If not, review it. If it has been reviewed and any requested changes seem to have been addressed, proceed.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

Member

@adamnovak adamnovak left a comment

This looks pretty good, but I think one of the new functions is missing a docstring, and I'm not sure about the clarity of some of the variable/function names.

src/toil/cwl/cwltoil.py (outdated)
)

# files with a associated filesize that are valid to be imported on workers
valid_files_to_data = dict()
Member

This isn't really mapping to file data (bits in the file), right? This is metadata consisting of file size and... something.

Maybe this should be renamed and have a comment explaining what's actually in the values? Or an inline type hint and not just auto-typing?
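
As a rough illustration of the inline type hint idea, here is a minimal sketch. The `FileMetadata` class below is a hypothetical stand-in with assumed fields (`source`, `size`), not the type used in Toil, and `worker_metadata` is just an example name.

```python
from typing import NamedTuple, Optional


class FileMetadata(NamedTuple):
    """Hypothetical stand-in; the real type in Toil may have different fields."""

    source: str  # assumed: where the file comes from
    size: Optional[int]  # assumed: size in bytes, if known


# A bare `dict()` tells the reader nothing about what the values are.
# An inline annotation documents the mapping: filename -> metadata, not file contents.
worker_metadata: dict[str, FileMetadata] = {}
worker_metadata["file:///data/reads.fq"] = FileMetadata("file:///data/reads.fq", 1024)
```

With the annotation in place, a type checker can also flag code that stores something other than `FileMetadata` in the mapping.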

src/toil/cwl/cwltoil.py
Comment on lines 2968 to 2977
files_to_data = get_file_sizes(
    filenames, toil._jobStore, include_remote_files=options.reference_inputs
)

# Mapping of files to metadata for files that will be imported on the worker
# This will consist of files that we were able to get a file size for
worker_files_to_data: dict[str, FileMetadata] = dict()
# Mapping of files to metadata for files that will be imported on the leader
# This will consist of files that we were not able to get a file size for
leader_files_to_data = dict()
Member

I still don't think we want to name these variables _to_data. They don't contain file data.

We could maybe change files_to_data, leader_files_to_data, worker_files_to_data to instead be metadata, leader_metadata, worker_metadata?
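
A minimal sketch of how the renamed variables might read, assuming a missing size is represented as `None`. The `FileMetadata` fields and the `split_metadata` helper are hypothetical and only illustrate the naming; they are not the code in this PR.

```python
from typing import NamedTuple, Optional


class FileMetadata(NamedTuple):
    """Hypothetical stand-in for the PR's FileMetadata; the fields are assumed."""

    source: str
    size: Optional[int]  # assumption: None means no size could be determined


def split_metadata(
    metadata: dict[str, FileMetadata],
) -> tuple[dict[str, FileMetadata], dict[str, FileMetadata]]:
    """Split metadata into (worker_metadata, leader_metadata) by known size."""
    # Files with a known size can be imported on workers.
    worker_metadata: dict[str, FileMetadata] = {}
    # Files without a known size fall back to being imported on the leader.
    leader_metadata: dict[str, FileMetadata] = {}
    for filename, file_metadata in metadata.items():
        if file_metadata.size is not None:
            worker_metadata[filename] = file_metadata
        else:
            leader_metadata[filename] = file_metadata
    return worker_metadata, leader_metadata


metadata = {
    "a.txt": FileMetadata("file:///a.txt", 10),
    "b.txt": FileMetadata("https://example.com/b.txt", None),
}
worker_metadata, leader_metadata = split_metadata(metadata)
```

Here `worker_metadata` holds only `a.txt` and `leader_metadata` holds only `b.txt`, and none of the names suggest that the values contain file contents.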

Contributor Author

I think metadata makes sense

@adamnovak adamnovak merged commit 2307445 into master Jan 16, 2025
3 checks passed
@adamnovak adamnovak deleted the issues/5135-leader-fallback-import branch January 16, 2025 23:43
Development

Successfully merging this pull request may close these issues.

Add leader import fallback to running imports on workers
2 participants