Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditionally move session store and stats file to .viz directory #1915

Merged
merged 22 commits into from
Jun 25, 2024

Conversation

ravi-kumar-pilla
Copy link
Contributor

@ravi-kumar-pilla ravi-kumar-pilla commented May 21, 2024

Description

Resolves #1891

Development notes

  • Created 2 constants at package/kedro_viz/constants.py -
VIZ_SESSION_STORE_ARGS = {"path": ".viz"}
VIZ_METADATA_ARGS = {"path": ".viz"}
  • If the user provides SESSION_STORE_ARGS, Kedro-Viz will use it to store the session_store.db file. If not, it will use the VIZ_SESSION_STORE_ARGS["path"] .
  • Started writing/reading stats.json to/from VIZ_METADATA_ARGS["path"] with the new release.
  • Updated document about the default VIZ_SESSION_STORE_ARGS
  • Updated document about stats.json being created under .viz if they have viz installed and did not disable hooks for kedro-viz plugin.

QA notes

  • From your kedro project, run kedro run by defining SESSION_STORE_ARGS in your settings.py file. session_store.db file will be modified/created at the location specified in settings.py file.
  • kedro viz run should read sessions data for experiment tracking from the location specified above
  • From your kedro project, run kedro run without defining SESSION_STORE_ARGS in your settings.py file. session_store.db file will be modified/created at <kedro_project_root>/.viz directory.
  • kedro viz run should read sessions data for experiment tracking from <kedro_project_root>/.viz/session_store.db
  • With kedro run stats.json file will be created under <kedro_project_root>/.viz directory
  • kedro viz run should read data of stats.json from the new directory created above

NOTE:

  1. Kedro starters will not be modified in this PR.
  2. Users need to copy their session_store.db file manually to .viz folder in-case they want to remove SESSION_STORE_ARGS from their settings.py file and would like to have the session_store.db file created under .viz folder for future kedro runs. They can ignore this if they do not wish to either use .viz folder or do not want historic kedro run details in experiment tracking.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: ravi-kumar-pilla <[email protected]>
Signed-off-by: ravi-kumar-pilla <[email protected]>
Signed-off-by: ravi-kumar-pilla <[email protected]>
Signed-off-by: ravi-kumar-pilla <[email protected]>
Signed-off-by: ravi-kumar-pilla <[email protected]>
…st/spike-viz-folder

Signed-off-by: ravi-kumar-pilla <[email protected]>
Signed-off-by: ravi-kumar-pilla <[email protected]>
Signed-off-by: ravi-kumar-pilla <[email protected]>
Signed-off-by: ravi-kumar-pilla <[email protected]>
Signed-off-by: ravi-kumar-pilla <[email protected]>
@ravi-kumar-pilla ravi-kumar-pilla changed the title Testing .viz folder creation/migration Move session store and stats file to .viz directory Jun 17, 2024
Signed-off-by: ravi-kumar-pilla <[email protected]>
@ravi-kumar-pilla ravi-kumar-pilla changed the title Move session store and stats file to .viz directory Conditionally move session store and stats file to .viz directory Jun 17, 2024
@ravi-kumar-pilla ravi-kumar-pilla marked this pull request as ready for review June 17, 2024 01:38
@ravi-kumar-pilla ravi-kumar-pilla requested a review from noklam June 17, 2024 01:38
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a breaking change? how should this release with the change of #1951 in starters?

We need to consider the scenarios below:

  1. Old Kedro starter user ( i.e. 0.19.0) + kedro-viz 9.2.0
  2. Old Kedro starter user(i.e. 0.19.0) + kedro-viz 9.1.0 (no change, this is the current sitution)
  3. New Kedro starter user(i.e. 0.19.7) + Kedro-viz 9.2.0 (it should work nicely)
  4. New Kedro starter user(ie.. 0.19.7) + old Kedro-viz user (maybe the solution is we need to set a lower bound in the starter kedro-viz>=9.2.0?

Other questions I have, I see that path is currently set as .viz, what happens when viz is not run from the root directory, would it produce the artifact in the correct directory?

Signed-off-by: ravi-kumar-pilla <[email protected]>
Signed-off-by: ravi-kumar-pilla <[email protected]>
@ravi-kumar-pilla
Copy link
Contributor Author

Hi @noklam ,

Is this a breaking change? how should this release with the change of #1951 in starters?

The changes in this PR will not break anything for the starters. These are defaults that are introduced in-case users do not provide them in settings.py file. I am only concerned with the loss of historic ET data in the below case (might be a minor thing) -

  1. Upgraded kedro-viz==9.2.0
  2. Removed SESSION_STORE_ARGS from settings.py
  3. Do a kedro run

This will create a session_store.db file in <root>/.viz which does not contain the previous runs.

We need to consider the scenarios below:

  1. Old Kedro starter user ( i.e. 0.19.0) + kedro-viz 9.2.0 - Pass
  2. Old Kedro starter user(i.e. 0.19.0) + kedro-viz 9.1.0 (no change, this is the current sitution)
  3. New Kedro starter user(i.e. 0.19.7) + Kedro-viz 9.2.0 (it should work nicely) - Pass
  4. New Kedro starter user(ie.. 0.19.7) + old Kedro-viz user (maybe the solution is we need to set a lower bound in the starter kedro-viz>=9.2.0? - Yes, with new Kedro starter which removes SESSION_STORE_ARGS, kedro run will fail with the below error -
ValueError: 
(sqlite3.OperationalError) unable to open database file

The reason for the above error might be because we did not create a parent folder for the default path if the folder does not exist.

Other questions I have, I see that path is currently set as .viz, what happens when viz is not run from the root directory, would it produce the artifact in the correct directory?

Yes, I appended the _find_kedro_project() return value to the path. If you see the code, the returned location is -
f"{kedro_project_path}/{VIZ_SESSION_STORE_ARGS['path']}/session_store.db"

However lets say the user goes into the sub-directory and has also provided SESSION_STORE_ARGS , this PR will not create the file at the root, as it relies on the path sent from Kedro. Let me know if you need more information.

Thank you

@noklam
Copy link
Contributor

noklam commented Jun 21, 2024

The changes in this PR will not break anything for the starters. These are defaults that are introduced in-case users do not provide them in settings.py file. I am only concerned with the loss of historic ET data in the below case (might be a minor thing) -

Yes bu

ValueError:
(sqlite3.OperationalError) unable to open database file

Is this a bug? I don't expect this to happen.

Yes, I appended the _find_kedro_project() return value to the path. If you see the code, the returned location is -
f"{kedro_project_path}/{VIZ_SESSION_STORE_ARGS['path']}/session_store.db"

However lets say the user goes into the sub-directory and has also provided SESSION_STORE_ARGS , this PR will not create the file at the root, as it relies on the path sent from Kedro. Let me know if you need more information.

They should have the same behavior, what makes the user define argument different from the default? Maybe we could make this clear that it should be relative to the root directory. In caes user provide an absolute path, then we should respect it and read/create the session.db in the user-defined location.

@noklam
Copy link
Contributor

noklam commented Jun 21, 2024

We also need some migration documentations.

For existing viz user, this PR should not change anything. If they want to move it to .viz, we can advise them to remove the settings.py or specify it in .viz.

For new user - we will pin kedro-viz >=9.2.0 in kedro-starter to make sure the default will be .viz. What happen if settings is empty and kedro-viz <9.2.0? Is there a default location or kedro-viz will simply error out?

@ravi-kumar-pilla
Copy link
Contributor Author

ValueError:
(sqlite3.OperationalError) unable to open database file

Is this a bug? I don't expect this to happen.

Seems like it. We never got across this because
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")} the parent folder was always present

They should have the same behavior, what makes the user define argument different from the default? Maybe we could make this clear that it should be relative to the root directory. In caes user provide an absolute path, then we should respect it and read/create the session.db in the user-defined location.

Yes, we do respect what user is sending. Please have a look at the code in package/kedro_viz/integrations/kedro/sqlite_store.py

Maybe we could make this clear that it should be relative to the root directory.

This seems a bit contradictory to the absolute path you mentioned. You can have a quick look at the code -

    @property
    def location(self) -> str:
        """Returns location of the sqlite_store database"""
        if "path" not in settings.SESSION_STORE_ARGS:
            kedro_project_path = _find_kedro_project(Path.cwd()) or self._path
            return _get_session_path(
                f"{kedro_project_path}/{VIZ_SESSION_STORE_ARGS['path']}/session_store.db"
            )

        return _get_session_path(f"{self._path}/session_store.db")

Thank you

@ravi-kumar-pilla
Copy link
Contributor Author

ravi-kumar-pilla commented Jun 21, 2024

We also need some migration documentations.

For existing viz user, this PR should not change anything. If they want to move it to .viz, we can advise them to remove the settings.py or specify it in .viz.

  1. stats.json will be moved to .viz by default (should we do something like delete stats.json if present at root as we created it before, so like a clean up ) ?
  2. session_store.db is an opt in change which can be documented. I added some information in the docs. But let me know if you feel we need to add it in other places.

For new user - we will pin kedro-viz >=9.2.0 in kedro-starter to make sure the default will be .viz. What happen if settings is empty and kedro-viz <9.2.0? Is there a default location or kedro-viz will simply error out?

If you mean settings.py does not contain SESSION_STORE_CLASS, there won't be any error and viz do not have a session store initialized, so ET will be empty. But lets say, if the user has provided the below in settings.py -

from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore
SESSION_STORE_CLASS = SQLiteStore

without mentioning SESSION_STORE_ARGS , then it will error out.

In BaseSessionStore, we have - store_args.setdefault("path", (self._project_path / "sessions").as_posix()) so the default path is ./sessions and if the user does not have the dir sessions at project path, kedro run will error out (Indeed a bug)

As I mentioned in this comment we did not had mkdir - session_file_path.parent.mkdir(parents=True, exist_ok=True) . Lets say sessions/session_store.db is the path, if we did not had sessions dir, it will error out.

Lets connect if this is not clear. Thank you

@@ -97,6 +97,9 @@ This specifies the creation of the `SQLiteStore` under the `data` subfolder, usi

This step is crucial to enable experiment tracking features on Kedro-Viz, as it is the database used to serve all run data to the Kedro-Viz front-end. Once this step is complete, you can either proceed to [set up the tracking datasets](#set-up-experiment-tracking-datasets) or [set up your nodes and pipelines to log metrics](#modify-your-nodes-and-pipelines-to-log-metrics); these two activities are interchangeable, but both should be completed to get a working experiment tracking setup.

```{note}
Starting from Kedro-Viz 9.2.0, if the user does not provide `SESSION_STORE_ARGS`, a default directory `.viz` will be created at the root of your Kedro project and will be used for `SQLiteStore`.
Copy link
Contributor

@rashidakanchwala rashidakanchwala Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Starting from Kedro-Viz 9.2.0, if the user does not provide `SESSION_STORE_ARGS`, a default directory `.viz` will be created at the root of your Kedro project and will be used for `SQLiteStore`.
Starting from Kedro-Viz 9.2.0, if the user does not provide `SESSION_STORE_ARGS` in the project settings, a default directory `.viz` will be created at the root of your Kedro project and used for `SQLiteStore`.

@@ -30,6 +30,10 @@ The next step is optional, but useful to check that all is working. Run the full
kedro run
```

```{note}
Copy link
Contributor

@rashidakanchwala rashidakanchwala Jun 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't mention dataset stats feature anywhere in the docs, maybe we first briefly describe the feature - what it is and then say from Kedro-viz 9.2.0 stats are saved in .viz folder? something like below?

Kedro-Viz provides stats related to datasets under the metadata panel

(insert image).

These stats are generated by hooks. If you have Kedro-Viz installed and run kedro run, the hooks will generate the stats by default. To disable this, you can disable Kedro-Viz hooks in the settings. Starting from version 9.2.0, Kedro-Viz stats are saved in the .viz folder.

@@ -25,3 +25,6 @@
"for users with kedro-datasets >= 2.1.0",
},
}

VIZ_SESSION_STORE_ARGS = {"path": ".viz"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just have one VIZ_PATH_ARGS since both are same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I too thought of this but since both the constants are for different purposes, though they have the same value for now, it would be nice to keep them separate. Lets say we introduce more keys for VIZ_METADATA_ARGS in future other than path.

Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks!

@ravi-kumar-pilla ravi-kumar-pilla merged commit f82eb87 into main Jun 25, 2024
39 checks passed
@ravi-kumar-pilla ravi-kumar-pilla deleted the test/spike-viz-folder branch June 25, 2024 18:09
@SajidAlamQB SajidAlamQB mentioned this pull request Jul 25, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Kedro new --tools=viz produce files like stats.json in the root directory
3 participants