Merge branch 'feature_branch/remove-experiment-tracking' of github.com:kedro-org/kedro-viz into feature_branch/remove-experiment-tracking
Huong Nguyen committed Jan 15, 2025
2 parents 006e861 + d08dd12 commit e864d76
Showing 7 changed files with 3 additions and 427 deletions.
Binary file modified .github/img/frontend-architecture.png
20 changes: 2 additions & 18 deletions ARCHITECTURE.md
@@ -65,18 +65,14 @@ The `localStorage` state is updated automatically on every Redux store update, v

![Kedro-Viz data flow diagram](/.github/img/frontend-architecture.png)

Kedro-Viz currently utilizes two different methods of data ingestion: the Redux setup for the pipeline and flowchart-view related components, and GraphQL via Apollo Client for the experiment tracking components.
Kedro-Viz currently utilizes one method of data ingestion: the Redux setup for the pipeline and flowchart-view related components.

On initialisation for the Redux setup, Kedro-Viz [manually normalises pipeline data](/src/store/normalize-data.js), in order to [make immutable state updates as performant as possible](https://redux.js.org/recipes/structuring-reducers/normalizing-state-shape).

Next, it [initialises the Redux data store](https://github.com/kedro-org/kedro-viz/blob/main/src/store/initial-state.js), by merging this normalised state with other data sources such as saved user preferences from `localStorage`, URL flags, and default values.

During preparation, the initial state is separated into two parts: pipeline and non-pipeline state. This is because the non-pipeline state should persist for the session duration, even if the pipeline state is reset/overwritten - i.e. if the user selects a new top-level pipeline.

Kedro run data used for experiment tracking are stored in a SQLite database that is generated automatically once [experiment tracking is enabled in your Kedro project](https://kedro.readthedocs.io/en/stable/08_logging/02_experiment_tracking.html). By default, the session store database sits under the `/data` directory of your Kedro project as `session_store.db`. When Kedro-Viz is launched from the Kedro project, the Kedro-Viz backend reads the run data stored in the database and serves it through the GraphQL endpoint in response to GraphQL queries sent by the Apollo Client on the frontend.

The server also allows updates to the database for certain fields of the run data (name, notes, etc.) via mutations.

## React components

React components are all to be found in `/src/components/`. The top-level React component for the standalone app is `Container`, which includes some extra code (e.g. global styles and data loading) that aren't included in the component library. The entry-point component for the library (as set by the `main` property in package.json) is `App`.
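The unchanged context in this hunk describes how Kedro-Viz prepares its Redux store: pipeline data is normalised, merged with `localStorage` preferences, URL flags and defaults, and split into pipeline and non-pipeline state. The real code is JavaScript (`/src/store/normalize-data.js` and `/src/store/initial-state.js`). What follows is only a Python sketch of the same idea. Every function name, every state key, and the merge precedence (defaults, then saved preferences, then URL flags) are assumptions for illustration, not the Kedro-Viz implementation.

```python
from copy import deepcopy


def normalize_pipeline(raw: dict) -> dict:
    """Flatten a list of node dicts into id-keyed lookup tables (hypothetical shape)."""
    node = {"ids": [], "name": {}, "type": {}}
    for record in raw.get("nodes", []):
        node_id = record["id"]
        node["ids"].append(node_id)  # keeps the original node ordering
        node["name"][node_id] = record["name"]
        node["type"][node_id] = record["type"]
    return {"node": node}


def prepare_state(raw, stored_prefs, url_flags, defaults):
    """Build initial state; ASSUMED precedence: defaults < stored prefs < URL flags."""
    non_pipeline = {**defaults, **stored_prefs, **url_flags}
    return {"pipeline": normalize_pipeline(raw), "non_pipeline": non_pipeline}


def reset_pipeline(state, new_raw):
    """Overwrite only pipeline state, e.g. when a new top-level pipeline is selected."""
    return {
        "pipeline": normalize_pipeline(new_raw),
        "non_pipeline": deepcopy(state["non_pipeline"]),  # session settings survive
    }


def rename_node(state, node_id, new_name):
    """Immutable update: rebuild the path to the changed table; other tables are shared."""
    node = state["pipeline"]["node"]
    new_node = {**node, "name": {**node["name"], node_id: new_name}}
    return {**state, "pipeline": {**state["pipeline"], "node": new_node}}
```

Because each attribute lives in its own id-keyed table, renaming one node copies a single small dictionary and reuses everything else. This is the kind of cheap immutable update that the linked Redux guidance on normalising state is aiming for.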
@@ -107,18 +103,6 @@ Selectors can be found in `/src/selectors/`. We use [Reselect](https://github.co

We have used Kedro-Viz to visualize the selector dependency graph - [visit the demo to see it in action](https://demo.kedro.org/?data=selectors).

## Apollo

The `src/apollo` directory contains all the related setup for ingesting data from the GraphQL endpoint for the experiment tracking features. This includes the schema that defines all query and mutation types, the config that sets up the Apollo Client to be used within React components, and other files containing helper functions, such as mocks to generate random data for the mock server.

The GraphQL schema is defined on the backend with Strawberry and automatically converted to GraphQL SDL (schema definition language) with `make schema-fix`. A CI check ensures that the resulting `schema.graphql` and the visualization below always stay in sync with the backend definition.

![Kedro-Viz GraphQL schema](.github/img/schema.graphql.png)

You can see documentation for the schema and run mock queries using GraphiQL, the GraphQL integrated development environment. This is possible without launching the full backend server: run `make strawberry-server` and then go to [http://127.0.0.1:8000/graphql](http://127.0.0.1:8000/graphql).

⚠️ When a query supplies an ordered argument, the backend response must maintain the same ordering. For example, the response to a query that calls for `runIds = [2, 3, 1]` should return the runs in that same order.

## Utils

The `/src/utils/` directory contains miscellaneous reusable utility functions.
@@ -153,4 +137,4 @@ The app uses [redux-watch](https://github.com/ExodusMovement/redux-watch) with a

![Kedro-Viz backend architecture](/.github/img/backend-architecture.png)

The backend of Kedro-Viz serves as the data provider and API layer that interacts with Kedro projects and manages data access for visualisations in the frontend. It offers both REST and GraphQL APIs to support data retrieval for the frontend, allowing access to pipeline structures, node-specific details, and experiment tracking data. Key components include the `DataAccessManager`, which interfaces with data `Repositories` to fetch and structure data. The CLI enables users to launch Kedro-Viz from the command line, while the deploy and build options enable seamless sharing of pipeline visualisations on any static website hosting platform.
The backend of Kedro-Viz serves as the data provider and API layer that interacts with Kedro projects and manages data access for visualisations in the frontend. It offers a REST API to support data retrieval for the frontend, allowing access to pipeline structures and node-specific details. Key components include the `DataAccessManager`, which interfaces with data `Repositories` to fetch and structure data. The CLI enables users to launch Kedro-Viz from the command line, while the deploy and build options enable seamless sharing of pipeline visualisations on any static website hosting platform.
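The backend paragraph above names the `DataAccessManager` and data `Repositories` but does not show how they relate. Here is a small sketch of the general repository/facade pattern that the description suggests. API handlers ask a single manager for data, and the manager reads from small per-entity repositories and structures the result. All class and method names (`ToyDataAccessManager`, `ToyNodesRepository`, `ToyEdgesRepository`, `add_node`, `pipeline_payload`, `node_payload`) are hypothetical. They deliberately differ from the real classes in the `kedro_viz` package, whose interfaces are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNodesRepository:
    """Hypothetical in-memory repository of node records keyed by id."""

    records: dict = field(default_factory=dict)

    def add(self, record: dict) -> None:
        self.records[record["id"]] = record

    def get(self, node_id: str):
        return self.records.get(node_id)

    def list_all(self) -> list:
        return list(self.records.values())  # insertion order


@dataclass
class ToyEdgesRepository:
    """Hypothetical repository of (source, target) id pairs."""

    pairs: set = field(default_factory=set)

    def add(self, source: str, target: str) -> None:
        self.pairs.add((source, target))

    def list_all(self) -> list:
        return sorted(self.pairs)  # deterministic order for responses


class ToyDataAccessManager:
    """Hypothetical facade: API handlers talk to this, never to repositories directly."""

    def __init__(self) -> None:
        self.nodes = ToyNodesRepository()
        self.edges = ToyEdgesRepository()

    def add_node(self, node_id: str, name: str, inputs=()) -> None:
        """Register a node plus one edge from each of its input ids."""
        self.nodes.add({"id": node_id, "name": name})
        for source in inputs:
            self.edges.add(source, node_id)

    def pipeline_payload(self) -> dict:
        """Structure repository contents as a JSON-serialisable response body."""
        return {
            "nodes": self.nodes.list_all(),
            "edges": [{"source": s, "target": t} for s, t in self.edges.list_all()],
        }

    def node_payload(self, node_id: str):
        """Return details for one node, or None if the id is unknown."""
        return self.nodes.get(node_id)
```

Keeping response formatting out of the repositories means a REST endpoint only has to serialise whatever the manager returns. Other consumers, such as a build step that writes static files, could in principle reuse the same repositories.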
17 changes: 0 additions & 17 deletions CONTRIBUTING.md
@@ -210,23 +210,6 @@ make run PROJECT_PATH=<path-to-your-test-project>/new-kedro-project

> **Note**: Once the backend development server is launched on port 4142, the local app will always pull data from that server. To prevent this, you can comment out the proxy setting in `package.json` and restart the dev server on port 4141.

#### Launch the development server with the `SQLiteSessionStore`

Kedro-Viz provides a `SQLiteSessionStore` that users can use in their project to enable experiment tracking functionality. If you want to use this session store with the development server, make sure you don't use a relative path when specifying the store's location in `settings.py`. For example, `demo-project` specifies the local `data` directory within a project as the session store's location as follows:

```python
from pathlib import Path
from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore
SESSION_STORE_CLASS = SQLiteStore
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")}  # absolute path to <project root>/data
```

Owing to this coupling between the project settings and Kedro-Viz, if you wish to execute any Kedro commands on `demo-project` (including `kedro run`), you will need to install the Kedro-Viz Python package. To install your local development version of the package, run:

```bash
pip3 install -e package
```

Since Kedro 0.18, a session can only contain one run. In Kedro-Viz, once a session has been retrieved from the store we always use the terminology "run" rather than "session", e.g. `run_id` rather than `session_id`.

## Testing guidelines

- Scope out major journeys from the ticket's acceptance criteria for manual end-to-end testing
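The note above says that once a backend development server is running on port 4142, the local app always pulls its data from it. Before you start the frontend dev server on port 4141, you can check which situation applies. This is a minimal standard-library sketch. The function name `backend_is_listening` is hypothetical. It only tests whether something accepts TCP connections on the port, not whether that listener is actually Kedro-Viz.

```python
import socket


def backend_is_listening(host: str = "127.0.0.1", port: int = 4142,
                         timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only shows that *something* accepts connections on the port; it does
    not verify that the listener is the Kedro-Viz backend.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    if backend_is_listening():
        print("Port 4142 is in use; the dev app will likely pull data from it.")
    else:
        print("Nothing is listening on port 4142.")
```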