Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: new, safe, documented PrecomputedInfoSpec #881

Merged
merged 4 commits into from
Feb 4, 2025

Conversation

supersergiy
Copy link
Member

@supersergiy supersergiy commented Jan 31, 2025

The previous version of precomputed-based layer creation relied on logic that would replicate the reference info file entirely, giving users various mechanisms to adjust it. This caused problems when the reference info file contained unexpected or new properties, such as new compression methods or sharding. Moreover, this method didn't provide a streamlined way to create an info file without a reference. The logic to apply fixes and modifications to the reference info file grew increasingly complex, hacky, and undocumented, making it difficult for non-experts to create robust specs.

We relied heavily on info inheritance logic because there are many cases where we want to inherit the reference info file almost entirely - specifically, for copy and interpolate operations. However, even in those cases, inheriting the sharding property caused issues.

This refactor aims to provide a way to specify info files that won't encounter problems due to unexpected info key/values while allowing a concise and robust method of performing full-inheritance operations such as copy. The set of info file controls exposed to the user has been reduced to:

        scales: Sequence[Sequence[float]],
        # Reference
        reference_path: str | None = None,
        inherit_all_params: bool = False,
        # Top-level info keys
        type: Literal["image", "segmentation"] | None = None,  
        data_type: PrecomputedVolumeDType | None = None,
        num_channels: int | None = None,
        # Per-scale info keys
        chunk_size: Sequence[int] | None = None,
        encoding: str | None = None,
        bbox: BBox3D | None = None,
        extra_scale_data: dict | None = None,

In the new system, only the scales parameter is required to construct a new info file. In zutils, we only modify the info file of layers being written to, and the scales parameter represents the resolutions that will be written to the layer.

There are two info inheritance modes, controlled by inherit_all_params. When False, reference_path is used only to inherit the dataset bounds. This is used for jobs such as network inference, where the output layer's info has no relation to the input layer's info. All parameters must be specified manually by the user.

When inherit_all_params is True, all info parameters will be inherited from the reference_path, and users may only override the bbox parameter. This is used when the output layer's info should be a copy of the input layer. Note that the sharding key is not inherited from the scales.

With this PR, all parameters have up-to-date documentation in precomputed.py and in build for cloudvolume and tensorstore layers. Clear exceptions are thrown when users supply invalid parameters. The old versions of build_layer functions remain available by specifying "@version": "0.3" in the spec.

Tested with Copy v2.0 and Interpolate v1.3 jobs in Portal.

@supersergiy supersergiy force-pushed the sergiy/precomputed_no_inherit branch 2 times, most recently from 29f6069 to b575873 Compare February 2, 2025 03:06
Copy link

codecov bot commented Feb 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (303511d) to head (7f3b538).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #881   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files          143       142    -1     
  Lines         6130      6100   -30     
=========================================
- Hits          6130      6100   -30     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@supersergiy supersergiy force-pushed the sergiy/precomputed_no_inherit branch 6 times, most recently from 9d6f4af to f682b96 Compare February 3, 2025 04:18
@supersergiy supersergiy force-pushed the sergiy/precomputed_no_inherit branch from f682b96 to 6646edf Compare February 3, 2025 04:20
@supersergiy supersergiy marked this pull request as draft February 3, 2025 05:43
@supersergiy supersergiy marked this pull request as draft February 3, 2025 05:43
@supersergiy supersergiy force-pushed the sergiy/precomputed_no_inherit branch from dad9455 to 7f3b538 Compare February 4, 2025 16:05
@supersergiy supersergiy marked this pull request as ready for review February 4, 2025 16:28
@supersergiy supersergiy merged commit 4bdcb62 into main Feb 4, 2025
10 checks passed
@trivoldus28
Copy link
Contributor

The idea sounds good. But is it not possible to retain or query for existing scales? I wonder if it can be unsafe when you're overwriting info in-place. For example, upsampling [128, 128, 40] to [64, 64, 40]. In this case you can potentially lose existing scales (256, 512, ...)?

@supersergiy
Copy link
Member Author

supersergiy commented Feb 5, 2025

@trivoldus28 forgot to mention this -- there are two more parameters to build_*: [info_overwrite and info_keep_existing_scales]

Existing scales in the created info path will be kept by default, but not inherited from the reference.

@trivoldus28
Copy link
Contributor

Are the scales sorted when you add to existing scales?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants