
Master-side online analyses for scans #389

Open
dnadlinger opened this issue Apr 10, 2024 · 8 comments
Labels: A-applet (Involves the ndscan plot applets) · A-experiment (Involves the experiment-side code) · C-new-feature (This is a request for major new functionality)

Comments

dnadlinger commented Apr 10, 2024

As it says in the title: It would be useful to be able to run online analyses (i.e. while the scan is running) master-side with custom code as well, rather than just the pre-defined fits in the applets.

This came out of a discussion on how to show randomised benchmarking (RBM) data as it comes in. ABaQuS (one of the Oxford experiments) currently have a local patch that (if I understand correctly) allows running an arbitrary callback after points have been completed, which they use to directly update some datasets to show a preliminary fit while the scan is still in progress. It would obviously be nicer to do this in a way that is better integrated with the rest of ndscan. But of course, we already have the concept of online analysis execution in ndscan, so let's just extend this to support running custom code master-side as well!

People (such as @hartytp) have been wanting custom online analyses for a while anyway. The OI implementation appears to allow the experiment to just specify a Python class to run on the client, with the actual implementation still being executed on the client. This works, but has the disadvantage of having to keep the code in sync. (They seem to be using this only to plug in the ionics_fits library, which is fairly static, presumably somewhat circumventing this issue.) (Edit: as clarified below, the OI implementation actually just serialises the entire code and ships that to the applets.) Our approach would be an alternative to that.

In summary:

  • The basic idea is that the (sub)scan runner gains support to execute the online part of default analyses during the scan.
  • This is done on the master, presumably as a background process to avoid slowing down experimental progress (similar to how applets currently execute online fits as new points come in).
    • The subscan can re-use the same analysis for both "live" and "default analysis" phases. This could be automatically handled by ndscan as an option.
  • Both for top-level scans and for subscans that have the "preview" enabled, the respective scan runner loop updates the analysis_results datasets (and plot annotations) as new points keep coming in.
    • This doesn't require any concept of "pushing point previews" to result channels, as the mutability only happens at the top-level analysis_results (and as such doesn't run up against the ndscan "functional" execution model, run into problems with jagged arrays for incomplete subscan points, etc.).
  • We'll need to make sure the applet handles changing analysis results.
    • Currently, the code might make the assumption that non-online analysis results only arrive once at the end – check!

At this point, we should be able to e.g. implement the Lorentzian online fit to a fluorescence scan as a custom online analysis rather than a client-side OnlineFit, without much of a visible difference.
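For concreteness, the analysis code itself would not need to know anything about the applets; a master-side online analysis of this kind would essentially just re-run something like the following over the points acquired so far. This is a minimal, ndscan-agnostic sketch using scipy – how it gets wired into the scan runner and the analysis_results channels is exactly what the points above are about:

```python
from typing import Optional

import numpy as np
from scipy.optimize import curve_fit


def lorentzian(x, x0, fwhm, a, y0):
    """Lorentzian peak with centre x0, width fwhm, amplitude a and offset y0."""
    return y0 + a * (fwhm / 2) ** 2 / ((x - x0) ** 2 + (fwhm / 2) ** 2)


def fit_partial_scan(freqs: np.ndarray, counts: np.ndarray) -> Optional[dict]:
    """Fit the points acquired so far; returns None until a fit is possible."""
    if len(freqs) < 5:
        return None
    p0 = [
        freqs[np.argmax(counts)],          # centre: position of the maximum so far
        (freqs.max() - freqs.min()) / 5,   # width: rough guess from the scan range
        counts.max() - counts.min(),       # amplitude
        counts.min(),                      # offset
    ]
    try:
        popt, _ = curve_fit(lorentzian, freqs, counts, p0=p0)
    except RuntimeError:
        return None  # not converged yet; keep the previous result
    return dict(zip(("x0", "fwhm", "a", "y0"), popt))
```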


@JammyL This is part 2 of the RBM live analysis discussion we had on 2023-05-05 (part 1 being off-to-the-side previews for subscans).

dnadlinger added the C-new-feature, A-applet, and A-experiment labels on Apr 10, 2024
hartytp commented Apr 10, 2024

> People (such as @hartytp) have been wanting custom online analyses for a while anyway. The OI implementation appears to allow the experiment to just specify a Python class to run on the client, with the actual implementation still being executed on the client. This works, but has the disadvantage of having to keep the code in sync. (They seem to be using this only to plug in the ionics_fits library, which is fairly static, presumably somewhat circumventing this issue.) Our approach would be an alternative to that.

Not quite. The implementation I went for base64-encodes a pickle of the analysis class and sends that to the client. This means, for example, that one can define an arbitrary analysis class in the experiment repo and have that work with the client.

dnadlinger (author) commented:
Ah, thanks for the correction – I wrote this almost a year ago, clearly I didn't check ionics-fits too carefully back then!

hartytp commented Apr 10, 2024

Tangentially related: one thing that comes up from time to time is wanting to do a bit of simple post analysis on datasets and use that as a result channel. For example, say I want to calculate an nbar from a red and blue sideband height and then plot / fit that for a heating rate scan.

Right now that's a little cumbersome to implement (it requires an RPC with a bit of self.channel.sink.get_last()). I wonder if there is a nice way of making this kind of thing easier.
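For what it's worth, the arithmetic in that particular example is trivial; the pain is purely the plumbing. A derived nbar channel would boil down to something like this (plain Python, using the standard thermal-state relation between the sideband ratio and the mean phonon number; the RPC/get_last() glue is left out):

```python
def nbar_from_sidebands(p_rsb: float, p_bsb: float) -> float:
    """Mean phonon number from red/blue sideband excitation probabilities.

    For a thermal state, P_rsb / P_bsb = nbar / (nbar + 1), so with
    r = P_rsb / P_bsb we have nbar = r / (1 - r).
    """
    r = p_rsb / p_bsb
    if r >= 1.0:
        raise ValueError("Sideband ratio >= 1 is not consistent with a thermal state")
    return r / (1.0 - r)


# e.g. nbar_from_sidebands(0.1, 0.9) ≈ 0.125
```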

dnadlinger (author) commented:
For that, I think the approach we discussed in #273 (comment) is still the way to go.

taw181 commented Dec 12, 2024

Hi @dnadlinger, just wondering if this is being worked on at all? I was looking at implementing on-the-fly analysis at Imperial and came across these threads, and don't want to duplicate any effort.

hartytp commented Dec 13, 2024

Seconded: the next thing on my list is to submit a PR allowing users to add custom annotation items to plots, so that we can do more sophisticated fitting than is currently possible (also impacts #426). If this is going to happen soon then that approach likely doesn't make sense. I'd be happy to contribute to work on this, but I'm not sure I know enough about what you have in mind @dnadlinger to be able to put together a useful PR.

dnadlinger (author) commented:
Hi @taw181, thanks for your interest. I don't think anybody here has really worked on an implementation so far, but design-wise I am quite happy that what is proposed here is the way to go. Just to make sure we're on the same page: by "on-the-fly analysis", do you mean specifically a way to repeatedly run a piece of analysis code while a scan is still in progress? In that case, it would be very helpful if you could contribute an implementation of a design as roughly laid out above.


@hartytp: Regarding custom annotation items, are you specifically referring to your custom applet-side online fitting extensions? For regular analyses, it is already possible to return a list of whatever custom annotations are desired.

As to what I have in mind, it is pretty much described in the first post (though admittedly a bit vaguely). The experiment-side scan runner runs the "online" components of any analyses matching the scan as points are coming in. To avoid unnecessary slowdowns to the data acquisition portion, this is done on a background process (like online fits in the applet). As usual, the analysis can push to the analysis_results result channels it declares. Once the background analysis run finishes, those new values override the results from the previous run (if any), are broadcast, and the applet updates whatever annotations accordingly.

hartytp commented Dec 22, 2024

There are a few things I don't quite understand, so let's flesh this out a bit and see where we get to.

My assumption is that we modify the ScanRunner to do something like this:

```python
result_batcher.ensure_complete_and_push()
for (sink, value) in zip(self._axis_sinks, axis_values):
    # Now that we know self._fragment successfully produced a
    # complete point, also record the axis coordinates.
    sink.push(value)
self._execute_live_analyses()
```

Where _execute_live_analyses is a new async RPC method we introduce to the ScanRunner.

This RPC needs to be aware of all the analyses, so presumably we move all the filter_default_analyses and handling of analysis result channels / sinks out of the TopLevelRunner and into the ScanRunner.

To be able to execute the analyses, the ScanRunner also needs to be aware of the coordinate and value data. So it needs access to the scan spec and also the coordinate sinks. So maybe we should just move the result sink handling into the scan runner as well?

Then I think the basic RPC would look like this:

```python
    @rpc(flags={"async"})
    def _execute_live_analyses(self):
        live_analyses = [
            analysis for analysis in self.analyses if isinstance(analysis, LiveAnalysis)
        ]

        annotations = []
        axis_data = self._make_coordinate_dict()
        result_data = self._make_value_dict()

        for analysis in live_analyses:
            annotations += analysis.execute(
                axis_data=axis_data,
                result_data=result_data,
                context=self.annotation_context,
            )

        self.set_dataset(
            self.dataset_prefix + "annotations",
            dump_json(annotations),
            broadcast=True,
        )
```
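(The _make_coordinate_dict/_make_value_dict helpers above are assumed; presumably they would just be thin wrappers collecting the values accumulated so far out of the sinks, roughly as below, with the attribute and method names being placeholders rather than the actual ndscan API:)

```python
    def _make_coordinate_dict(self) -> dict:
        # Hypothetical helper: one array of values-so-far per scan axis.
        return {axis: sink.get_all()
                for axis, sink in zip(self._axes, self._axis_sinks)}

    def _make_value_dict(self) -> dict:
        # Hypothetical helper: likewise for the fragment's result channels.
        return {channel: sink.get_all()
                for channel, sink in self._result_sinks.items()}
```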

At that point it feels like it also makes sense to move analyse from TopLevelRunner to ScanRunner.

If we make those changes, we then need to think about what to do for continuous scans, which don't use a scan runner at all. For example, we could add a ContinuousScanRunner to handle them. That might be nicer if it works out, since we would then have a more unified interface that makes the data flow in ndscan easier to understand, rather than "sometimes the scan runner runs the fragment, but other times it's a totally separate code path which does something similar".

Is that roughly what you had in mind?

> To avoid unnecessary slowdowns to the data acquisition portion, this is done on a background process (like online fits in the applet).

I have a few questions / comments about how the implementation here works:

  1. To pass the analysis objects into a separate process we'll have to serialize them in some form or other. Presumably something like a base-64 encoded pickle?
  2. What do we want to do if we get new data coming in while old data is still being analysed? In the applet this is handled relatively cleanly because we control an async event loop, so we can await the analysis process in one task and set a "new data ready" event in another task. This gives us a relatively clean way of handling incoming data which isn't prone to race conditions. It's not clear to me how one would make that work in the context of async RPCs (one possible pattern is sketched just after this list).
  3. Not a big deal, but worth noting that we'd need to think a little about how many points we generate in our analysis. The applet figures out the right number of scan points to generate based on the size of the scan window; here, however, we split the data generation away from the plotting, so the user has to make a somewhat arbitrary decision (balancing smooth plots against not generating too much data).
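On point 2, here is a standalone sketch of one pattern that would work without an event loop: keep at most one analysis run in flight in a worker process, and coalesce any snapshots that arrive in the meantime so only the newest data gets (re-)analysed once the current run finishes. Everything here is illustrative – the class name is made up, and the serialisation from point 1 is plain pickle just for the sake of the example:

```python
import logging
import pickle
import threading
from concurrent.futures import Future, ProcessPoolExecutor
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class OnlineAnalysisScheduler:
    """Keeps at most one analysis run in flight in a worker process.

    If new data arrives while a run is still in progress, only the newest
    snapshot is remembered and re-submitted once the current run completes,
    so intermediate snapshots are simply skipped (similar in spirit to the
    applet online fits).
    """

    def __init__(self, analysis_fn: Callable[[bytes], Any],
                 on_done: Callable[[Any], None]):
        self._executor = ProcessPoolExecutor(max_workers=1)
        self._analysis_fn = analysis_fn  # must be picklable, e.g. a module-level function
        self._on_done = on_done  # e.g. broadcast the updated analysis_results
        self._lock = threading.Lock()
        self._pending: Optional[bytes] = None
        self._busy = False

    def submit(self, axis_data: dict, result_data: dict) -> None:
        # Serialise the data snapshot once; the worker process gets its own copy.
        snapshot = pickle.dumps((axis_data, result_data))
        with self._lock:
            if self._busy:
                self._pending = snapshot  # coalesce: only the newest data survives
                return
            self._busy = True
        self._start(snapshot)

    def _start(self, snapshot: bytes) -> None:
        fut = self._executor.submit(self._analysis_fn, snapshot)
        fut.add_done_callback(self._finished)

    def _finished(self, fut: Future) -> None:
        # Runs in an executor helper thread once the worker has finished.
        try:
            self._on_done(fut.result())
        except Exception:
            logger.exception("Online analysis run failed; keeping previous result")
        with self._lock:
            snapshot, self._pending = self._pending, None
            if snapshot is None:
                self._busy = False
        if snapshot is not None:
            self._start(snapshot)
```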

Taking a step back though... it's not obvious to me that this is really the right design. If we need to serialize the analysis objects anyway and execute them in a separate process, what is the benefit of doing this on the master as opposed to just doing it in the applets, which are a more natural vehicle for this kind of thing?

> @hartytp: Regarding custom annotation items, are you specifically referring to your custom applet-side online fitting extensions? For regular analyses, it is already possible to return a list of whatever custom annotations are desired.

That's true, but the applet can't render them. What I'm proposing as an alternative design is something like this:

```python
import base64
from typing import Any, Type

import dill


def dumps(obj: Any) -> str:
    try:
        return base64.b64encode(dill.dumps(obj)).decode()
    except dill.PicklingError:
        raise RuntimeError(
            "Cannot pickle model. Model defined within the ARTIQ experiment file?"
        )


# CustomAnalysis already has a meaning!
class UserDefinedAnalysis(DefaultAnalysis):
    # Applet-side class used to execute the analysis; we need to tell the
    # applet which class to use for plotting this analysis.
    applet_analysis_class: Type["UserDefinedAnalysisItem"]

    def describe_online_analyses(
        self, context: AnnotationContext
    ) -> tuple[list[dict[str, Any]], dict[str, dict[str, Any]]]:
        if not hasattr(self, "applet_analysis_class"):
            raise ValueError("UserDefinedAnalysis must define an analysis class")

        return [], {"kind": "user_defined",
                    "class": dumps(self.applet_analysis_class)}


class UserDefinedAnalysisApplet:
    ...
```

and something similar for annotation items.
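(For completeness, the ScanModel sketch below assumes a loads counterpart to the dumps helper above, which would just be the inverse:)

```python
import base64
from typing import Any

import dill


def loads(payload: str) -> Any:
    """Inverse of dumps(): decode the base64 string and unpickle the object."""
    return dill.loads(base64.b64decode(payload))
```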

Then we modify the ScanModel to support these:

```python
        for name, schema in analysis_schemata.items():
            kind = schema["kind"]
            if kind == "named_fit":
                self._online_analyses[name] = OnlineNamedFitAnalysis(schema, self)
            elif kind == "user_defined":
                self._online_analyses[name] = loads(schema["class"])(
                    schema=schema, scan_model=self)
            else:
                logger.warning("Ignoring unsupported online analysis type: '%s'", kind)
```

Ultimately these two approaches accomplish something pretty similar. I think it's really a question of whether we want to execute custom analysis processes in the applet or find a way of making it work with ARTIQ. The applet-based approach feels more natural to me, but maybe I'm missing something?
