Develop a consistent Context mechanism for the Prefect workflow #28

Open
5 of 6 tasks
krlberry opened this issue Feb 5, 2025 · 4 comments
@krlberry
Contributor

krlberry commented Feb 5, 2025

Objective

"Develop a consistent Context mechanism for the Prefect workflow to demonstrate the context capability" (RADPS Product Roadmap objective, Item 01: Workflow Concept Validation, Number 3)

Requirements

  • Clean up the existing workflow so that it has a cohesive Context mechanism.
  • Include examples of where ‘state’ or ‘context’ may need to be saved/restored for checkpointing or scheduling.
  • Show (via dependencies) how intermediate data products (such as calibration tables) must be managed in-between Stages.
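
One way the dependency requirement could be demonstrated is to have each stage declare which intermediate products it needs from earlier stages. The sketch below is plain Python with no Prefect calls. The names `Stage` and `run_stages`, the `"stage/product"` path convention, and the string stand-ins for calibration tables are all hypothetical and are not taken from the existing pipeline code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Stage:
    """Hypothetical stage description: a name, a body, and required products."""
    name: str
    run: Callable[[dict], dict]                    # returns products keyed by name
    requires: list = field(default_factory=list)   # "stage/product" paths


def run_stages(stages, context=None):
    """Run stages in order, failing early if a required product is missing."""
    context = {} if context is None else context
    for stage in stages:
        inputs = {}
        for path in stage.requires:
            producer, product = path.split("/")
            try:
                inputs[product] = context[producer]["data"][product]
            except KeyError:
                raise RuntimeError(
                    f"{stage.name} needs {path}, which no earlier stage produced"
                )
        # Store the stage's products under its own "data" section.
        context[stage.name] = {"data": stage.run(inputs)}
    return context


# Fake data only: strings stand in for calibration tables.
bandpass = Stage("bandpass", lambda inputs: {"bandpass_table": "fake_bcal"})
gaincal = Stage(
    "gaincal",
    lambda inputs: {"gain_table": f"gains_using_{inputs['bandpass_table']}"},
    requires=["bandpass/bandpass_table"],
)

context = run_stages([bandpass, gaincal])
```

Running `gaincal` without `bandpass` first raises a `RuntimeError`, which makes the dependency on the bandpass table explicit rather than implicit in the stage order.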

Definition of Done

  • Structure of Context is documented.
  • A mechanism to access (load, save, and update) the Context is implemented.
  • The functions needed to interact with the Context are defined.
  • The existing Prefect workflow code is modified to adopt a consistent Context mechanism across the 'stages'.
  • Management of intermediate data products (with fake data) is demonstrated.
  • The dict structure of the Context is updated.
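
As a rough illustration of the load, save, and update items, the sketch below persists the context with the standard `pickle` module. The function names `save_context`, `load_context`, and `update_context` and the stage/section layout are assumptions made for this example, not the functions implemented in the repository.

```python
import pickle
import tempfile
from pathlib import Path


def save_context(context: dict, path) -> None:
    """Write the whole context dict to disk."""
    with open(path, "wb") as f:
        pickle.dump(context, f)


def load_context(path) -> dict:
    """Read a saved context, or return an empty one if no file exists yet."""
    path = Path(path)
    if not path.exists():
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def update_context(context: dict, stage: str, section: str, values: dict) -> dict:
    """Merge values into context[stage][section], creating levels as needed."""
    context.setdefault(stage, {}).setdefault(section, {}).update(values)
    return context


# Usage with fake data, inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "context.pkl"
    context = load_context(path)  # empty on the first run
    update_context(context, "gaincal", "qa", {"gaincal_qa_score": 0.68})
    save_context(context, path)
    restored = load_context(path)
```

Saving after each stage and loading at start-up is also the simplest form of the checkpointing mentioned in the requirements, since a restarted run can inspect which stages already have entries.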

Key Decision Points

  • Agree on the structure of Context and functions to access the Context that are cohesive through the workflow
krlberry changed the title from "Develop and Demonstrate a context capacity in the Prefect workflow" to "Develop a consistent Context mechanism for the Prefect workflow" on Feb 5, 2025
krlberry self-assigned this on Feb 18, 2025
@krlberry
Contributor Author

krlberry commented Feb 18, 2025

A consistent context mechanism was added to all stages of the pipeline. It's currently implemented as a class that can accept dicts to add to the context and will return a dict representation of the current context.
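
A class of the kind described might look roughly like the sketch below. The name `Context`, the methods `add` and `to_dict`, and the recursive merge behaviour are assumptions for illustration; the class in the repository may differ.

```python
import copy


def _merge(target: dict, new: dict) -> None:
    """Recursively merge `new` into `target`, overwriting non-dict values."""
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        else:
            # Copy so later merges never mutate the caller's dict.
            target[key] = copy.deepcopy(value)


class Context:
    """Hypothetical context holder: accepts dicts, returns a dict snapshot."""

    def __init__(self, initial=None):
        self._data = {}
        if initial:
            self.add(initial)

    def add(self, new: dict) -> None:
        """Add a dict to the context, merging nested stage entries."""
        _merge(self._data, new)

    def to_dict(self) -> dict:
        """Return a copy of the current context as a plain dict."""
        return copy.deepcopy(self._data)


# Usage with fake values.
ctx = Context()
ctx.add({"bandpass": {"qa": {"bandpass_qa_score": 0.7}}})
ctx.add({"bandpass": {"data": {"bandpass_table": "fake_bcal"}}})
snapshot = ctx.to_dict()
```

Returning a deep copy keeps callers from modifying the context by accident. With real dask arrays a shallow copy may be preferable, so that choice is a simplification made here.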

A section was added to the wiki briefly describing the design, structure, and potential future improvements.

These were not substantially addressed in Sprint 2, but could be addressed in the future:

  • Include examples of where ‘state’ or ‘context’ may need to be saved/restored for checkpointing or scheduling.
  • Show (via dependencies) how intermediate data products (such as calibration tables) must be managed in-between Stages.

@amcnicho
Member

amcnicho commented Feb 19, 2025

The structure of the context after a full run of pipeline.py as of 8593efc

Current context

context = {
    "data_import_and_prep": {
        "data": {
            "J1851+0035": "dask.array",
            "source_1": "dask.array",
            "data": {"source_1": "dask.array"},
            "data_source": "archive",
            "url": "https://almascience.nrao.edu/aq/",
            "J1752-2956": "dask.array",
            "source_0": "dask.array",
        },
        "caltables": {"gains": "dask.array"},
        "caltable": {"gains": "dask.array"},
        "qa_scores": {"data_import_and_prep": 0.8502631326462059},
    },
    "calibrator_data_import_and_prep": {
        "qa": {
            "calibrator_data_import_and_prep_J1851+0035": 0.9894343336819162,
            "calibrator_data_import_and_prep_J1752-2956": 0.8657931974020328,
        }
    },
    "bandpass": {"qa": {"bandpass_qa_score_J1752-2956": 0.7007337845960031}},
    "gaincal": {"qa": {"gaincal_qa_score": 0.6783920189417391}},
    "calibrator_imaging": {"qa": {"imaging_qa_score": 0.4569692096863739}},
    "findcont": {
        "data": {
            "cube": {
                "data": {"source_1": "dask.array"},
                "data_source": "archive",
                "url": "https://almascience.nrao.edu/aq/",
            },
            "data": {"source_1": "dask.array"},
            "data_source": "archive",
            "url": "https://almascience.nrao.edu/aq/",
        },
        "datashape": {
            "bcal": {"n_field": 1, "n_spw": 3, "n_scan": 1},
            "gcal": {"n_field": 1, "n_spw": 3, "n_scan": 4},
            "target": {"n_field": 1, "n_spw": 3, "n_scan": 5},
        },
    },
    "image_cube": {
        "uvcontsub": {
            "uvcont_result": [
                ["Pass", "Pass", "Pass"],
                ["Pass", "Pass", "Pass"],
                ["Pass", "Pass", "Pass"],
                ["Pass", "Pass", "Pass"],
                ["Pass", "Pass", "Pass"],
            ],
            "datashape": {
                "target": {"n_field": 1, "n_spw": 3, "n_scan": 5, "n_nchan": 2}
            },
            "qa_result": {"uvcontsub_qa_score": 0.4812112011684333},
        },
        "image": ["data_0", "data_1", "data_2"],
        "data": {
            "bcal": {"n_field": 1, "n_spw": 3, "n_scan": 1},
            "gcal": {"n_field": 1, "n_spw": 3, "n_scan": 4},
            "target": {"n_field": 1, "n_spw": 3, "n_scan": 5, "n_nchan": 2},
        },
        "cube_image_qa": {
            "cubeimage_qa_score": 0.22645717260050102,
            "cube_image_data": {
                "image": {"x": 512, "y": 512, "nchan": 1, "npol": 1},
                "target": {"n_field": 1, "n_spw": 3, "n_scan": 5, "n_nchan": 2},
            },
        },
    },
    "context": "context.pkl",
}
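
The differences between stages are easier to see when only the second-level keys are listed. The helper below is a small sketch for that purpose; `stage_keys` is a hypothetical name, and `abbreviated` is a hand-trimmed copy of a few stages from the dict above with the nested values emptied out.

```python
def stage_keys(context: dict) -> dict:
    """Map each stage name to the sorted list of its second-level keys."""
    return {
        name: sorted(value)
        for name, value in context.items()
        if isinstance(value, dict)  # skips entries such as "context": "context.pkl"
    }


# Abbreviated copy of the current context, nested values omitted.
abbreviated = {
    "data_import_and_prep": {"data": {}, "caltables": {}, "caltable": {}, "qa_scores": {}},
    "bandpass": {"qa": {}},
    "findcont": {"data": {}, "datashape": {}},
    "image_cube": {"uvcontsub": {}, "image": [], "data": {}, "cube_image_qa": {}},
    "context": "context.pkl",
}

for name, keys in stage_keys(abbreviated).items():
    print(name, keys)
```

Listed this way, the QA results appear under `qa_scores`, `qa`, `qa_result`, and `cube_image_qa` depending on the stage, and both `caltables` and `caltable` are present in the first stage.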

In the interest of consistency, let's settle on a structure to use across stages (at least until a more formal spec is developed). Proposal:

context = {
    context : "pickles.zarr",
    stage_name : {
        data : {
            key0 : object0,
            key1 : object1,
        },
        metadata : {
            key0 : object0,
            key1 : object1,
        },
        qa : qa_container,
    },
    another_stage_name : {
        data : data_container,
        metadata : metadata_container,
        qa : qa_container,
    },
    ...
}

Only question I have about how the existing format would map onto this is whether to store datashape alongside the stage:data:key elements (in my mind, the place to store fake vis/image objects) or in a separate place. Maybe we don't need the metadata key at all?

@krlberry
Contributor Author

krlberry commented Feb 21, 2025

With the structure we have now, the primary purpose I can see for the metadata section is the datashape. Each datashape could be stored under the appropriate stage:metadata:key. So I'd say that we either keep metadata and place datashape underneath it, or remove metadata for now and add it back if we need it later.

@krlberry
Contributor Author

After some discussion, we settled on:

context = {
    context : "pickles.zarr",
    stage_name : {
        data : {
            key0 : object0,
            key1 : object1,
        },
        datashape : {
            key0 : object0,
            key1 : object1,
        },
        qa : qa_container,
    },
    another_stage_name : {
        data : data_container,
        datashape : datashape_container,
        qa : qa_container,
    },
    ...
}
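
A check along the following lines could help keep stages aligned with the agreed layout. This is only a sketch: `validate_context` and `EXPECTED_SECTIONS` are hypothetical names, and treating all three sections as mandatory for every stage is an assumption that the discussion above does not settle.

```python
EXPECTED_SECTIONS = {"data", "datashape", "qa"}


def validate_context(context: dict) -> list:
    """Return a list of human-readable problems; an empty list means it conforms."""
    problems = []
    if "context" not in context:
        problems.append("missing top-level 'context' entry")
    for name, stage in context.items():
        if name == "context":
            continue
        if not isinstance(stage, dict):
            problems.append(f"{name}: stage entry is not a dict")
            continue
        missing = EXPECTED_SECTIONS - set(stage)
        extra = set(stage) - EXPECTED_SECTIONS
        if missing:
            problems.append(f"{name}: missing {sorted(missing)}")
        if extra:
            problems.append(f"{name}: unexpected {sorted(extra)}")
    return problems


# Fake contexts: one that follows the layout and one that does not.
good = {
    "context": "pickles.zarr",
    "gaincal": {"data": {}, "datashape": {}, "qa": {"gaincal_qa_score": 0.68}},
}
bad = {
    "context": "pickles.zarr",
    "bandpass": {"qa_scores": {}, "data": {}},
}

good_problems = validate_context(good)
bad_problems = validate_context(bad)
```

Returning a list of problems instead of raising on the first one makes it possible to report every non-conforming stage in a single pass.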
