Add a metric label template mechanism #137
Draft: dbutenhof wants to merge 19 commits into cloud-bulldozer:main from dbutenhof:title
Conversation
The latest changes look good.
This encapsulates substantial logic for interpreting the Crucible Common Data Model (CDM) OpenSearch schema for use by the CPT dashboard API components. By itself, it does nothing.
This uses `black`, `isort`, and `flake8` to check code quality, although failure is ignored until we've cleaned the code up (which has begun in PR cloud-bulldozer#139 against the `revamp` branch). Minimal unit testing is introduced, generating a code coverage report. The text summary is added to the Action summary page, and the more detailed HTML report is stored as an artifact for download. NOTE: The GitHub Action environment is unhappy with `uvicorn` 0.15; upgrading to the latest 0.32.x seems to work and hasn't obviously broken anything else.
`crucible_svc.py` test coverage is now at 97%. While the remaining 3% is worth some effort later, the law of diminishing returns means it would take significant additional effort; and since subsequent ILAB PRs will change some of the service code anyway, this is good enough for now.
Provide the `api/v1/ilab` API endpoint to allow a client to query collected data on a Crucible CDM OpenSearch instance through the `crucible_svc` service layer. It is backed by the Crucible layer added in cloud-bulldozer#122, so only the final commit represents changes in this PR.
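As a rough illustration of the shape of such an endpoint, here is a minimal sketch of an `ilab` router delegating to a Crucible service layer. The import paths, route names, query parameters, and `CrucibleService` method are assumptions for illustration, not the PR's actual signatures.

```python
# Sketch: an ilab API router backed by a Crucible service layer.
# All names below (module paths, routes, parameters) are hypothetical.
from typing import Optional

from fastapi import APIRouter, Depends, Query

from app.services.crucible_svc import CrucibleService  # hypothetical import path

router = APIRouter(prefix="/api/v1/ilab", tags=["ilab"])


def service() -> CrucibleService:
    # In real code this would be constructed from configuration.
    return CrucibleService("ilab")


@router.get("/runs")
async def get_runs(
    start: Optional[str] = Query(None, description="Start of date range"),
    end: Optional[str] = Query(None, description="End of date range"),
    crucible: CrucibleService = Depends(service),
):
    # Interpretation of the CDM OpenSearch schema stays in the service layer.
    return crucible.get_runs(start=start, end=end)
```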
This covers 100% of the `ilab.py` API module using FastAPI's `TestClient`. This proved interesting: the FastAPI and Starlette versions we use are incompatible with the underlying `httpx` version, and `TestClient` initialization fails in a way that can't be worked around (Starlette passes an unknown keyword parameter). After some experimentation, I ended up "unlocking" all the API-related packages in `pyproject.toml` to `"*"`, letting `poetry update` resolve them, and then "re-locking" them to the resolved versions. The resulting combination of modules works for unit testing, and appears to work in a real `./local-compose.sh` deployment as well.
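For context, a minimal sketch of the kind of `TestClient` test involved; the application module, endpoint path, and patched service method are assumptions for illustration.

```python
# Sketch: exercising an ilab endpoint with FastAPI's TestClient while
# stubbing out the Crucible service so no OpenSearch instance is needed.
# The module paths and route below are hypothetical.
from unittest.mock import patch

from fastapi.testclient import TestClient

from app.main import app  # hypothetical application module

client = TestClient(app)


def test_get_runs_empty():
    with patch(
        "app.services.crucible_svc.CrucibleService.get_runs", return_value=[]
    ):
        response = client.get("/api/v1/ilab/runs")
    assert response.status_code == 200
    assert response.json() == []
```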
This adds a mechanism to "can" and restore a small prototype ILAB (Crucible CDM) OpenSearch database in a pod along with the dashboard back end, front end, and functional tests. The functional tests run entirely within the pod, with no exposed ports and with unique container and pod names, allowing for the possibility of simultaneous runs (e.g., in CI) on the same system. This also adds utilities for diagnosing a CDM (v7) datastore and cloning a limited subset of it, along with creating an OpenSearch snapshot from that data to bootstrap the functional test pod. Only a few functional test cases are implemented here as a demonstration; more will be added separately.
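A rough sketch of the "canning" step, assuming the `opensearch-py` client, a filesystem snapshot repository, and an illustrative CDM index pattern; none of these reflect the PR's actual scripts.

```python
# Sketch: snapshot a cloned subset of a CDM datastore so it can seed the
# functional-test pod.  Host, repository location, and index pattern are
# assumptions for illustration only.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=["http://localhost:9200"])

# Register a filesystem snapshot repository (the path must be listed in
# the server's path.repo setting).
client.snapshot.create_repository(
    repository="ilab-bootstrap",
    body={"type": "fs", "settings": {"location": "/mnt/snapshots/ilab"}},
)

# Snapshot only the cloned CDM indices for the prototype database.
client.snapshot.create(
    repository="ilab-bootstrap",
    snapshot="ilab-prototype",
    body={"indices": "cdmv7dev-*", "include_global_state": False},
    wait_for_completion=True,
)
```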
This relies on the ilab API in cloud-bulldozer#123, which in turn builds on the crucible service in cloud-bulldozer#122.
The `fetchILabJobs` action wasn't updating the date picker values from the API response unless a non-empty list of jobs is returned. This means that on the initial load, if the default API date range (1 month) doesn't find any jobs, the displayed list is empty and the date range isn't updated to tell the user what we've done. I've seen no ill effects in local testing from simply removing the length check, and now the date picker is updated correctly.
When graphing metrics from two runs, the timestamps rarely align; so we add a `relative` option to convert the absolute metric timestamps into relative delta seconds from each run's start.
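The conversion itself is simple; here is a minimal sketch, assuming epoch-millisecond timestamps and illustrative field names.

```python
# Sketch: shift each run's absolute metric timestamps (epoch milliseconds)
# to delta seconds from that run's start, so two runs share an x axis.
# Field names are assumptions for illustration.
def to_relative(points: list, run_start_ms: int) -> list:
    return [
        {**p, "timestamp": (p["timestamp"] - run_start_ms) / 1000.0}
        for p in points
    ]


# Example: a sample 60 seconds into the run becomes x = 60.0.
points = [{"timestamp": 1_700_000_060_000, "value": 42.0}]
print(to_relative(points, 1_700_000_000_000))
# [{'timestamp': 60.0, 'value': 42.0}]
```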
This adds the basic UI to support comparison of the metrics of two InstructLab runs. This compares only the primary metrics of the two runs, in a relative timeline graph. This is backed by cloud-bulldozer#125, which is backed by cloud-bulldozer#124, which is backed by cloud-bulldozer#123, which is backed by cloud-bulldozer#122. These represent a series of steps towards a complete InstructLab UI and API, and will be reviewed and merged from cloud-bulldozer#122 forward.
This PR is primarily CPT dashboard backend API (and Crucible service) changes to support pulling and displaying multiple Crucible metric statistics. Only minor UI changes are included to support API changes. The remaining UI changes to pull and display statistics will be pushed separately.
Add statistics charts for the selected metric in the row expansion and comparison views.
Extract the "Metadata" into a separate component, which allows it to be reused as an info flyover on the comparison page to help in identifying target runs to be compared.
Modify the metrics pulldown to allow multiple selection. The statistical summary chart and graph will show all selected metrics in addition to the benchmark's inherent primary metric (for the primary period).
Support selection of multiple metrics using the pulldown in the comparison page. The update occurs when the pulldown closes. To simplify the management of "available metrics" across multiple selected runs, which might have entirely different metrics, the reducer no longer tries to store separate metric selection lists for each run. This also means that the "default" metrics selection remains when adding another comparison run, or expanding another row.
The Plotly graphing package doesn't directly support a "delta time" type, and in the comparison view we want to use delta time to compare two runs that will generally have different absolute timestamps. (It turns out that the native PatternFly graphing package, Victory, has the same limitation.) Initially, this just reported numeric delta seconds, but that's unnatural for a reader. This PR adds support for an `absolute_relative` option which reports the delta times as small absolute timestamps, like `1970-01-01 00:01:00` for 60 seconds, formatting ticks using `"%H:%M:%S"` ("00:01:00") for readability.

I also made the X axis title appear, which necessitated some refactoring of the layout to avoid overlaying the legend on the axis label; in the process I moved the "presentation specific" width parameter into the UI and the others into the API so they don't have to be duplicated in the two action calls. This requires a minor change in unit tests, but avoids doing more than necessary for success. (Coverage has dropped to 94%, but at this point I choose to avoid more extensive churn than required, and 94% remains "acceptable".)
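To make the trick concrete, here is a minimal sketch of the idea, assuming delta values in seconds; the function name and the layout fragment are illustrative, not the PR's actual code.

```python
# Sketch: since Plotly has no native delta-time axis type, represent a
# delta of N seconds as an absolute timestamp N seconds after the Unix
# epoch, then format the axis ticks as elapsed %H:%M:%S.
from datetime import datetime, timezone


def delta_to_epoch_timestamp(delta_seconds: float) -> str:
    return datetime.fromtimestamp(delta_seconds, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S"
    )


print(delta_to_epoch_timestamp(60))  # 1970-01-01 00:01:00

# Illustrative Plotly layout fragment: ticks read as durations, not wall
# clock times.
layout = {"xaxis": {"type": "date", "tickformat": "%H:%M:%S", "title": "duration"}}
```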
This fixes two issues with the ILAB tab, improving table scroll behavior and exposing more URL query parameters to retain pagination across reload.
The UI is currently hardcoded to support two environments: a development deployment from `localhost:3000` to `localhost:8000`, where the API explicitly allows `http://localhost:3000` as a cross-site origin; and an OpenShift deployment, where cross-origin requests are unnecessary because the cluster's API reverse proxy hides the port numbers. Partly for more general testing and deployment, but specifically because the RHEL AI InstructLab project requires CPT dashboard access now, before our code has been integrated into the production OpenShift deployment, it's convenient to support a third "bare metal" mode where the containerized UI and backend are hosted at ports 3000 and 8000 on some host (e.g., in the RDU3 Performance Lab).

For this, the UI needs to recognize that a non-`localhost` `window.location` with a `3000` port should call the API at port `8000` on the same host (our "bare metal" deployment), while an empty port indicates we're using the OpenShift API reverse proxy routing. Similarly, the backend's cross-origin (CORS) protection needs to allow port 3000 on the same host as a valid origin.
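A minimal sketch of the backend half of this, assuming FastAPI's `CORSMiddleware` and using `socket.getfqdn()` to stand in for "the host the API runs on"; the exact origin strings and how the host is determined are assumptions for illustration.

```python
# Sketch: extend the CORS allow-list for "bare metal" mode so the UI
# served from port 3000 on the same host is an accepted origin, in
# addition to the localhost development origin.
import socket

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

origins = [
    "http://localhost:3000",            # local development UI
    f"http://{socket.getfqdn()}:3000",  # containerized UI on the same host
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```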
The API makes an effort to create unique metric labels when comparing data across multiple runs, but without human direction the constructed names aren't necessarily ideal to help a human focus on the comparisons they want to make. This PR adds a UI template mechanism to allow the human to format names that are more useful -- for example, to focus on the differences between software releases or hardware configurations expressed by Crucible tags or params. For example, a template of `<metric> <iteration>:<period> <tag:accelerator>` identifies metrics between two runs based on the hardware accelerator type used for each. This introduces a new action to fetch and store the available filters, which exposes the param and tag values currently in use. Usability of the template mechanism will depend on consistent application of specific tags, which we expect to be increasingly supplied automatically by Crucible discovery rather than relying on the (current) ad hoc user definition in endpoint files.
Type of change
Description
The API makes an effort to create unique metric labels when comparing data across multiple runs, but without human direction the constructed names aren't necessarily ideal to help a human focus on the comparisons they want to make.
This PR adds a UI template mechanism to allow the human to format names that are more useful: for example, to focus on the differences between software releases or hardware configurations expressed by Crucible tags or params.
For example, a template of `<metric> <iteration>:<period> <tag:accelerator>` identifies metrics between two runs based on the hardware accelerator type used for each, with labels like `ilab::actual-train-seconds 2:measurement L40S 1.2`.

This introduces a new action to fetch and store the available filters, which exposes the param and tag values currently in use. Usability of the template mechanism will depend on consistent application of specific tags, which we expect to be increasingly supplied automatically by Crucible discovery rather than relying on the (current) ad hoc user definition in endpoint files.
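A rough sketch of how such a template might be expanded against a run's context, written in Python for illustration (the actual mechanism lives in the JavaScript UI); the context keys and the `tag:`/`param:` prefixes follow the example above but are otherwise assumptions.

```python
# Sketch: expand "<...>" placeholders in a label template from a run's
# metric/iteration/period fields and its Crucible tags and params.
import re


def expand_template(template: str, context: dict) -> str:
    def substitute(match: "re.Match") -> str:
        key = match.group(1)
        if key.startswith("tag:"):
            return str(context.get("tags", {}).get(key[4:], ""))
        if key.startswith("param:"):
            return str(context.get("params", {}).get(key[6:], ""))
        return str(context.get(key, ""))

    return re.sub(r"<([^<>]+)>", substitute, template)


context = {
    "metric": "ilab::actual-train-seconds",
    "iteration": 2,
    "period": "measurement",
    "tags": {"accelerator": "L40S"},
}
print(expand_template("<metric> <iteration>:<period> <tag:accelerator>", context))
# ilab::actual-train-seconds 2:measurement L40S
```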
This is chained from #122 (Crucible service) -> #140 (unit test framework) -> #146 (crucible unit tests) -> #123 (ilab API) -> #155 (API unit tests) -> #158 (functional test framework) -> #124 (ilab UI) -> #153 (date picker) -> #125 (multi-run graphing API) -> #127 (multi-run graphing UI) -> #129 (statistics aggregation) -> #131 (metadata flyover) -> #132 (multiple metrics selection) -> #133 (compare multiple metrics) -> #134 (improve delta time x axis labels) -> #136 (bare metal CORS) -> #137 (custom titles)
Related Tickets & Documents
PANDA-659 Create custom metric titles
Checklist before requesting a review
Testing
Tested on a local deployment.