Develop a consistent Context mechanism for the Prefect workflow #28

krlberry · 2025-02-05T18:43:44Z

Objective

"Develop a consistent Context mechanism for the Prefect workflow to demonstrate the context capability" (The RADPS Product Roadmap objective, Item 01: Workflow Concept Validation, Number 3)

Requirements

Clean up the existing workflow so that it has a cohesive Context mechanism.
Include examples of where ‘state’ or ‘context’ may need to be saved/restored for checkpointing or scheduling.
Show (via dependencies) how intermediate data products (such as calibration tables) must be managed in-between Stages.
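To illustrate the dependency requirement, here is a minimal sketch of how a fake calibration table could be handed from one stage to the next through a shared context dict. The stage functions (`bandpass`, `gaincal`, `caltables_to_apply`) and the table contents are hypothetical stand-ins; in the real workflow these would presumably be Prefect tasks, but that wiring is left out so the sketch runs on its own.

```python
# Sketch: a fake calibration table passed between stages via a shared context dict.
# Stage names and table contents are hypothetical.


def bandpass(context: dict) -> dict:
    # Upstream stage: produce a fake bandpass table and store it under this stage's entry.
    caltable = {"name": "bandpass.tbl", "solutions": [1.0, 0.98, 1.02]}
    context.setdefault("bandpass", {}).setdefault("data", {})["caltable"] = caltable
    return context


def gaincal(context: dict) -> dict:
    # Downstream stage: depends on the bandpass table, so fail early if it is missing.
    try:
        bp_table = context["bandpass"]["data"]["caltable"]
    except KeyError as err:
        raise RuntimeError("gaincal needs the bandpass caltable in the context") from err
    gains = {
        "name": "gaincal.tbl",
        "solutions": [0.5 * s for s in bp_table["solutions"]],  # fake derived solutions
        "prerequisite": bp_table["name"],  # record the dependency explicitly
    }
    context.setdefault("gaincal", {}).setdefault("data", {})["caltable"] = gains
    return context


def caltables_to_apply(context: dict) -> list:
    # A later stage collects every upstream caltable in dependency order.
    return [context[stage]["data"]["caltable"]["name"] for stage in ("bandpass", "gaincal")]


ctx = gaincal(bandpass({}))
print(caltables_to_apply(ctx))  # ['bandpass.tbl', 'gaincal.tbl']
```

Raising an explicit error when an upstream product is absent makes the stage ordering visible in the code rather than implicit in the order the flow happens to call things.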

Definition of Done

The structure of the Context is documented.
A mechanism to access (load, save, and update) the Context is implemented.
The functions needed to interact with the Context are defined.
The existing Prefect workflow code is modified to adopt a consistent Context mechanism across the stages.
Management of intermediate data products (with fake data) is demonstrated.
The dict structure of the Context is updated.
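The load, save, and update items above could be as simple as the following sketch. It assumes a plain dict context checkpointed with Python's `pickle` (consistent with the `"context": "context.pkl"` entry in the context dump later in this thread); the function names `save_context`, `load_context`, and `update_context` are hypothetical, not the project's actual API.

```python
# Sketch: load, save, and update for a dict-based context, checkpointed with pickle.
# Function names are hypothetical.
import copy
import pickle
import tempfile
from pathlib import Path


def save_context(context: dict, path: Path) -> None:
    # Record the checkpoint location in the saved copy, then write it to disk.
    # Everything stored in the context must be picklable.
    snapshot = {**context, "context": str(path)}
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)


def load_context(path: Path) -> dict:
    # Restore a checkpoint, e.g. to resume or reschedule a run.
    # Only unpickle trusted files: pickle can execute arbitrary code on load.
    with open(path, "rb") as f:
        return pickle.load(f)


def update_context(context: dict, stage_name: str, **sections: dict) -> dict:
    # Return a new context with the named sections (data, qa, ...) of one stage
    # updated; the input context is left unchanged.
    new = copy.deepcopy(context)
    stage = new.setdefault(stage_name, {})
    for section, values in sections.items():
        stage.setdefault(section, {}).update(values)
    return new


# Usage: update one stage, checkpoint, and restore.
ctx = update_context({}, "gaincal", qa={"gaincal_qa_score": 0.5})
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "context.pkl"
    save_context(ctx, path)
    restored = load_context(path)
print(restored["gaincal"])  # {'qa': {'gaincal_qa_score': 0.5}}
```

Returning a new dict from `update_context` instead of mutating in place makes it easier to checkpoint a known-good state before a stage runs.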

Key Decision Points

Agree on a structure for the Context, and on functions to access it, that are cohesive throughout the workflow.

krlberry · 2025-02-18T18:48:02Z

A consistent context mechanism was added to all stages of the pipeline. It's currently implemented as a class that can accept dicts to add to the context and will return a dict representation of the current context.

A section was added to the wiki briefly describing the design, structure, and potential future improvements.

These were not substantially addressed in Sprint 2, but could be addressed in the future:

Include examples of where ‘state’ or ‘context’ may need to be saved/restored for checkpointing or scheduling.
Show (via dependencies) how intermediate data products (such as calibration tables) must be managed in-between Stages.

amcnicho · 2025-02-19T20:41:29Z

The structure of the context after a full run of pipeline.py as of 8593efc

Current context


context = {
    "data_import_and_prep": {
        "data": {
            "J1851+0035": "dask.array",
            "source_1": "dask.array",
            "data": {
                "source_1": "dask.array"
            },
            "data_source": "archive",
            "url": "https://almascience.nrao.edu/aq/",
            "J1752-2956": "dask.array",
            "source_0": "dask.array",
        },
        "caltables": {
            "gains": "dask.array"
        },
        "caltable": {
            "gains": "dask.array"
        },
        "qa_scores": {"data_import_and_prep": 0.8502631326462059},
    },
    "calibrator_data_import_and_prep": {
        "qa": {
            "calibrator_data_import_and_prep_J1851+0035": 0.9894343336819162,
            "calibrator_data_import_and_prep_J1752-2956": 0.8657931974020328,
        }
    },
    "bandpass": {"qa": {"bandpass_qa_score_J1752-2956": 0.7007337845960031}},
    "gaincal": {"qa": {"gaincal_qa_score": 0.6783920189417391}},
    "calibrator_imaging": {"qa": {"imaging_qa_score": 0.4569692096863739}},
    "findcont": {
        "data": {
            "cube": {
                "data": {
                    "source_1": "dask.array"
                },
                "data_source": "archive",
                "url": "https://almascience.nrao.edu/aq/",
            },
            "data": {
                "source_1": "dask.array"
            },
            "data_source": "archive",
            "url": "https://almascience.nrao.edu/aq/",
        },
        "datashape": {
            "bcal": {"n_field": 1, "n_spw": 3, "n_scan": 1},
            "gcal": {"n_field": 1, "n_spw": 3, "n_scan": 4},
            "target": {"n_field": 1, "n_spw": 3, "n_scan": 5},
        },
    },
    "image_cube": {
        "uvcontsub": {
            "uvcont_result": [
                ["Pass", "Pass", "Pass"],
                ["Pass", "Pass", "Pass"],
                ["Pass", "Pass", "Pass"],
                ["Pass", "Pass", "Pass"],
                ["Pass", "Pass", "Pass"],
            ],
            "datashape": {
                "target": {"n_field": 1, "n_spw": 3, "n_scan": 5, "n_nchan": 2}
            },
            "qa_result": {"uvcontsub_qa_score": 0.4812112011684333},
        },
        "image": ["data_0", "data_1", "data_2"],
        "data": {
            "bcal": {"n_field": 1, "n_spw": 3, "n_scan": 1},
            "gcal": {"n_field": 1, "n_spw": 3, "n_scan": 4},
            "target": {"n_field": 1, "n_spw": 3, "n_scan": 5, "n_nchan": 2},
        },
        "cube_image_qa": {
            "cubeimage_qa_score": 0.22645717260050102,
            "cube_image_data": {
                "image": {"x": 512, "y": 512, "nchan": 1, "npol": 1},
                "target": {"n_field": 1, "n_spw": 3, "n_scan": 5, "n_nchan": 2},
            },
        },
    },
    "context": "context.pkl",
}

In the interest of consistency, let's settle on a structure to use across stages (at least until a more formal spec is developed). Proposal:

context = {
    context : "pickles.zarr",
    stage_name : {
        data : {
            key0 : object0,
            key1 : object1,
        },
        metadata : {
            key0 : object0,
            key1 : object1,
        },
        qa : qa_container,
    },
    another_stage_name : {
        data : data_container,
        metadata : metadata_container,
        qa : qa_container,
    },
    ...
}

The only question I have about mapping the existing format onto this is whether to store datashape alongside the stage:data:key elements (which, in my mind, are the place to store fake vis/image objects) or somewhere separate. Maybe we don't need the metadata key at all?

krlberry · 2025-02-21T20:27:52Z

With the structure we have now, the primary purpose I can see for the metadata section is the datashape. Each datashape could be stored under the appropriate stage:metadata:key. So I'd say that either we keep metadata and place datashape underneath it, or we remove metadata for now and add it back in if we need it later.

krlberry · 2025-02-24T14:39:53Z

After some discussion, we settled on:

context = {
    context : "pickles.zarr",
    stage_name : {
        data : {
            key0 : object0,
            key1 : object1,
        },
        datashape : {
            key0 : object0,
            key1 : object1,
        },
        qa : qa_container,
    },
    another_stage_name : {
        data : data_container,
        datashape : datashape_container,
        qa : qa_container,
    },
    ...
}
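One way to keep stages honest about the agreed layout is a small check run after each stage or at the end of a run. This is a minimal sketch; `check_context` and the key sets are hypothetical, it treats the top-level `context` key as reserved for the checkpoint location, and it only flags unexpected keys (it assumes `data`, `datashape`, and `qa` are each optional).

```python
# Sketch: report stage entries that deviate from the agreed data/datashape/qa layout.
# The function name and key sets are hypothetical.

STAGE_KEYS = {"data", "datashape", "qa"}
RESERVED = {"context"}  # top-level key holding the checkpoint location, not a stage


def check_context(context: dict) -> dict:
    """Return {stage_name: sorted unexpected keys} for stages that break the layout."""
    problems = {}
    for stage, entry in context.items():
        if stage in RESERVED:
            continue
        if not isinstance(entry, dict):
            problems[stage] = ["<not a dict>"]
            continue
        extra = sorted(set(entry) - STAGE_KEYS)
        if extra:
            problems[stage] = extra
    return problems


# A fragment of the pre-agreement dump from the 8593efc run (values abbreviated).
old = {
    "context": "context.pkl",
    "data_import_and_prep": {"data": {}, "caltables": {}, "caltable": {}, "qa_scores": {}},
    "gaincal": {"qa": {"gaincal_qa_score": 0.6783920189417391}},
}
print(check_context(old))  # {'data_import_and_prep': ['caltable', 'caltables', 'qa_scores']}
```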

krlberry added the Workflow label Feb 5, 2025

krlberry changed the title ~~Develop and Demonstrate a context capacity in the Prefect workflow~~ Develop a consistent Context mechanism for the Prefect workflow Feb 5, 2025

amcnicho mentioned this issue Feb 11, 2025 in Improve design consistency in the Prefect Workflow #31 (Open, 14 tasks)

krlberry self-assigned this Feb 18, 2025
