
DM-47350: Add walltime to resource usage tables #336

Merged (1 commit) Jan 17, 2025
Conversation

@eigerx (Contributor) commented Jan 10, 2025

No description provided.

@eigerx eigerx force-pushed the tickets/DM-47350 branch 3 times, most recently from 1fc600f to 283c1cf Compare January 10, 2025 23:36
@eigerx eigerx requested a review from yalsayyad January 11, 2025 00:10
@yalsayyad (Contributor) left a comment

The addition to the ResourceUsageSummary is missing. For example https://github.com/lsst/analysis_tools/blob/main/python/lsst/analysis/tools/tasks/gatherResourceUsage.py#L155

I'm hitting "request changes" only because I do want to take a look at the modifications to ConsolidateResourceUsageTask when you make them.

Follow the same naming pattern (e.g. "walltime_s_")

python/lsst/analysis/tools/tasks/gatherResourceUsage.py (outdated; resolved)
python/lsst/analysis/tools/tasks/gatherResourceUsage.py (outdated; resolved)
@@ -284,6 +291,8 @@ class GatherResourceUsageTask(PipelineTask):
- ``init_time``: the time spent in task construction;
- ``run_time``: the time spent executing the task's runQuantum
method.
- ``wall_time`` : the total elapsed real time time spent executing the
task.
Contributor:
Remove duplicate "time." As written, it's a little ambiguous whether wall_time is comparable to run_time (as in, it's the elapsed time spent in runQuantum) or to run_time + init_time.

Contributor Author:
Does this new description sound good? (highlighted bit above is outdated)

@@ -456,6 +468,10 @@ def _extract_quantum_timing(self, quantum_metadata):
if getattr(self.config, attr_name)
}

quantum_timing["wall_time"] = (end_wall_time - start_wall_time).sec
Contributor:
Your Task config parameter wall_time does nothing.

Contributor Author:
I think what I added

if self.config.wall_time:
            start_wall_time = Time(quantum_metadata["startUtc"].split("+")[0])
            end_wall_time = Time(quantum_metadata["endUtc"].split("+")[0])
            quantum_timing["wall_time"] = (end_wall_time - start_wall_time).sec

should make it do something?
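The computation in that snippet can be sketched standalone. This is a hypothetical illustration using the stdlib `datetime` module rather than the `astropy.time.Time` class the task actually uses; the metadata keys (`startUtc`, `endUtc`) and the offset-stripping `split("+")` come from the snippet above, while the function name is made up.

```python
from datetime import datetime


def wall_time_seconds(quantum_metadata):
    """Elapsed wall-clock seconds between the start and end timestamps.

    Hypothetical standalone sketch: the task itself uses astropy.time.Time,
    but stdlib datetime parses these ISO-format strings just as well.
    """
    # Strip the "+00:00" UTC offset before parsing, as in the snippet above.
    start = datetime.fromisoformat(quantum_metadata["startUtc"].split("+")[0])
    end = datetime.fromisoformat(quantum_metadata["endUtc"].split("+")[0])
    return (end - start).total_seconds()


metadata = {
    "startUtc": "2024-12-20T18:56:08.843913+00:00",
    "endUtc": "2024-12-20T18:56:10.843913+00:00",
}
elapsed = wall_time_seconds(metadata)  # 2.0 seconds for these timestamps
```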

tests/test_gatherResourceUsage.py (outdated; resolved)
@eigerx eigerx force-pushed the tickets/DM-47350 branch 4 times, most recently from baf7174 to 735b211 Compare January 11, 2025 06:14
@@ -455,6 +471,12 @@ def _extract_quantum_timing(self, quantum_metadata):
)
if getattr(self.config, attr_name)
}
if self.config.wall_time:
    start_wall_time = Time(quantum_metadata["startUtc"].split("+")[0])


Looking at some TaskMetadata for assembleCoadd in DM-48233 runs, I ran into a lot of incomplete quantum_metadata entries. In particular, at least one was missing a startUtc field:

{'prepUtc': '2024-12-20T18:56:08.843913+00:00', 'prepCpuTime': 5.701368117, 'prepUserTime': 4.901877, 'prepSystemTime': 0.799484, 'prepMaxResidentSetSize': 595865600, 'prepMinorPageFaults': 287064, 'prepMajorPageFaults': 0, 'prepBlockInputs': 32, 'prepBlockOutputs': 24, 'prepVoluntaryContextSwitches': 14723, 'prepInvoluntaryContextSwitches': 144, 'endUtc': '2024-12-20T18:56:08.865832+00:00', 'endCpuTime': 5.718414396, 'endUserTime': 4.9187639999999995, 'endSystemTime': 0.7996449999999999, 'endMaxResidentSetSize': 595951616, 'endMinorPageFaults': 287103, 'endMajorPageFaults': 0, 'endBlockInputs': 32, 'endBlockOutputs': 32, 'endVoluntaryContextSwitches': 14748, 'endInvoluntaryContextSwitches': 147}

So I think we need some workarounds here, as for the CPU time values, if we want to avoid the task failing on those missing entries.
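One possible shape for such a workaround, sketched here with stdlib `datetime` rather than the astropy `Time` used in the task: fall back to `prepUtc` when `startUtc` is absent, and skip the value entirely when no usable timestamps exist. The key names come from the metadata sample above; the helper name is hypothetical.

```python
from datetime import datetime


def safe_wall_time(quantum_metadata):
    """Elapsed seconds for a quantum, tolerating missing timestamp fields.

    Hypothetical sketch: fall back to prepUtc when startUtc is absent, and
    return None if no usable pair of timestamps exists at all.
    """
    start_key = "startUtc" if "startUtc" in quantum_metadata else "prepUtc"
    if start_key not in quantum_metadata or "endUtc" not in quantum_metadata:
        return None
    start = datetime.fromisoformat(quantum_metadata[start_key].split("+")[0])
    end = datetime.fromisoformat(quantum_metadata["endUtc"].split("+")[0])
    return (end - start).total_seconds()


# The incomplete assembleCoadd entry quoted above has prepUtc and endUtc
# but no startUtc, so the fallback path is taken.
incomplete = {
    "prepUtc": "2024-12-20T18:56:08.843913+00:00",
    "endUtc": "2024-12-20T18:56:08.865832+00:00",
}
elapsed = safe_wall_time(incomplete)
```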

Member:
I'm confused as to how it's possible for prep and end to exist but start to be missing...


Looking at the log files, it seems that these jobs don't have any input datasets to include in the coadd, e.g.,

lsst.ctrl.mpexec.singleQuantumExecutor 2024-12-20T18:56:08.852388Z No dataset artifact found for deepCoadd_directWarp@{instrument: 'LSSTComCam', skymap: 'lsst_cells_v1', tract: 10221, patch: 90, visit: 2024112600123, band: 'i', day_obs: 20241126, physical_filter: 'i_06'} [sc=ExposureF] (run=LSSTComCam/runs/DRP/20241101_20241211/w_2024_51/DM-48233/20241220T172328Z id=a27c6f6b-c150-4073-8efb-e5acc5ee49e4) 20 INFO singleQuantumExecutor.py /cvmfs/sw.lsst.eu/linux-x86_64/lsst_distrib/w_2024_51/conda/envs/lsst-scipipe-9.0.0-exact/share/eups/Linux64/ctrl_mpexec/ge4484fe4ba+cdc99b1de5/python/lsst/ctrl/mpexec/singleQuantumExecutor.py

So, I think we can just use prepUtc instead of startUtc.

Contributor Author:
Just switched to prepUtc in the task and the unit test. @jchiang87, I pulled in your unit test from your branch and it all looks good. Do you want your commit squashed onto the rest? I think this will be good to go pending a successful Jenkins run.


Sure, please go ahead and squash the commit.

@eigerx eigerx force-pushed the tickets/DM-47350 branch 4 times, most recently from c969edd to f6572b8 Compare January 14, 2025 20:59
@eigerx eigerx requested a review from yalsayyad January 14, 2025 21:28
@eigerx (Contributor Author) commented Jan 14, 2025

I think I addressed everything above. Successful Jenkins run: https://rubin-ci.slac.stanford.edu/blue/organizations/jenkins/stack-os-matrix/detail/stack-os-matrix/3649/pipeline

@@ -165,7 +175,6 @@ def run(self, **kwargs: Any) -> Struct:
.sort_values("task"),
memrun,
)

Contributor:
You don't have to change it, but typically we try to minimize unnecessary whitespace changes.

@@ -284,6 +297,8 @@ class GatherResourceUsageTask(PipelineTask):
- ``init_time``: the time spent in task construction;
- ``run_time``: the time spent executing the task's runQuantum
method.
- ``wall_time`` : the total elapsed real time spent executing, preparing
Contributor:
"executing, preparing and completing" is still not precise enough for a number we're going to be comparing to run_time = endCpuTime - startCpuTime very frequently from now on. It would be great to clarify once and for all here. @timj What's the most precise way you'd describe endUtc - prepUtc? It's elapsed time corresponding to prep_time + init_time + run_time, right? That would make it elapsed time in the pre-initialization step, in task construction, AND in executing the task's runQuantum method.

Contributor Author:
@timj did you have an opinion here? For now, I pushed Yusra's text above, but let me know if that seems good to you.

Member:
Sorry. Somehow I missed the message from 3 days ago.

  • prep: triggers as soon as single-quantum execution begins, but can include some checks and running updatedQuantumInputs.
  • start: means "just before runQuantum is called" (but won't happen if an error happens earlier).
  • end: triggers immediately after runQuantum (and can trigger in an error condition even if start hasn't).
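Given that breakdown, endUtc - prepUtc brackets all three phases, whereas endUtc - startUtc covers only runQuantum; that distinction is what the docstring needs to convey. A minimal illustration with made-up timestamps (stdlib `datetime`; the phase ordering follows the list above, the timestamp values are invented):

```python
from datetime import datetime

# Hypothetical timestamps for one quantum, ordered as described above:
# prep fires when single-quantum execution begins, start fires just
# before runQuantum is called, end fires immediately after runQuantum.
prep_utc = datetime.fromisoformat("2025-01-17T12:00:00.000000")
start_utc = datetime.fromisoformat("2025-01-17T12:00:03.000000")
end_utc = datetime.fromisoformat("2025-01-17T12:00:10.000000")

# endUtc - prepUtc spans preparation, task construction, and runQuantum,
# while endUtc - startUtc covers the runQuantum call alone.
wall_time = (end_utc - prep_utc).total_seconds()
run_elapsed = (end_utc - start_utc).total_seconds()
```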

@eigerx (Contributor Author) commented Jan 17, 2025:
I've added the distinction above into the docs. Thanks!

@eigerx eigerx force-pushed the tickets/DM-47350 branch 2 times, most recently from 9a338d3 to f4c391c Compare January 17, 2025 22:13
@eigerx eigerx merged commit 3711557 into main Jan 17, 2025
8 checks passed
@eigerx eigerx deleted the tickets/DM-47350 branch January 17, 2025 22:17