-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX #66 #64 #54 #30 #31 #72 #29 #62 - context access - kedro 16.5 - hook auto registration #86
Conversation
…Galilei#30 Galileo-Galilei#31 Galileo-Galilei#72 Galileo-Galilei#29 Galileo-Galilei#62 - context access - kedro 16.5 - hook auto registration
Codecov Report
@@ Coverage Diff @@
## develop #86 +/- ##
===========================================
+ Coverage 96.83% 97.13% +0.30%
===========================================
Files 32 31 -1
Lines 1199 1151 -48
===========================================
- Hits 1161 1118 -43
+ Misses 38 33 -5
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is really great and adds a lot of new features, but I think we should separate thingsinto at least 3 PR:
- context access is a whole PR in itself, and tests must be added (for instance, test that mlflow config now works with TemplatedConfigLoader or with an environment different from "local") -> regarding this, all your changes are ok for me and I think we can merge them soon once tests are added.
- migration to 0.16.5 modifies a lot of things, with low retro compatibility, especially for kedro mlflow init. We can accept this, but is must be very clear in the documentation (especially this section: https://github.com/Galileo-Galilei/kedro-mlflow/blob/develop/docs/source/03_tutorial/02_setup.md#automatic-template-update-recommended). I think the switch must be thoroughly thought and made clear before deprecation. For now, we can modify the
kedro.yml
file (orpyproject.toml
) to register the hook automatically without modifying therun.py
file. - plugin auto registration is a no go while kedro viz bug is not fixed. If people have to chose, I hardly believe they wil choose kedro-mlflow upon kedro-viz :) I don't think it is necessary to make the PR right now, but you can have a separated branch and we'll merge it once it's good to go.
EDIT: I don't think this PR can go in the release 0.3.0 as is, but we can do a patch release soon (before 0.4.0) if no breaking change is introduced since it is a big improvement!
@@ -4,12 +4,15 @@ | |||
|
|||
### Added | |||
|
|||
- kedro-mlflow hooks are now auto-registred (#29) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not do this in this PR, since it breaks kedro viz (see kedro-org/kedro-viz#260). Can you remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kedro-viz just work fine with kedro 0.16.5 and auto registred hooks. I'll test it with previous kedro versions
@@ -21,6 +24,9 @@ | |||
|
|||
### Changed | |||
|
|||
- Bump kedro requirement to <=0.16.5 | |||
- `kedro mlflow init` is no longer touching the run.py file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should likely add a test for the kedro version to ensure retro compatibility, or make it very clear in the docs that if you have kedro ">=0.16,<0.16.4".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we should discuss this particular point in a separate PR.
@@ -21,6 +24,9 @@ | |||
|
|||
### Changed | |||
|
|||
- Bump kedro requirement to <=0.16.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should check that the tests are not broken with kedro<=0.16, because the CI only tests for the last version of kedro, and wa can't be sure that nothing is broken.
import subprocess | ||
from pathlib import Path | ||
|
||
import click | ||
from kedro import __file__ as KEDRO_PATH | ||
from kedro.framework.context import get_static_project_data, load_context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this breaks retrocompatibilty with kedro<=0.16.4. It should definitely be a separated PR in case we need to rollback. But it's a great idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will definitely break kedro <=0.16.4 compatibility as i just saw that get_static_project_data
is introduced in kedro 0.16.5. I'll keep the _get_project_globals
utils
mlflow_yml = "mlflow.yml" | ||
write_jinja_template( | ||
src=TEMPLATE_FOLDER_PATH / mlflow_yml, | ||
is_cookiecutter=False, | ||
dst=project_path / "conf" / "base" / mlflow_yml, | ||
python_package=project_globals["python_package"], | ||
dst=project_path / conf_root / "base" / mlflow_yml, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not remember this was hardcoded, well spotted 👍
@@ -51,6 +90,9 @@ def before_node_run( | |||
mlflow.log_params(params_inputs) | |||
|
|||
|
|||
mlflow_node_hooks = MlflowNodeHook() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this necessary? Is it for auto registration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's for registring hooks in .kedro.yml or in setuptools entrypoints.
@@ -62,9 +66,13 @@ def before_pipeline_run( | |||
pipeline: The ``Pipeline`` that will be run. | |||
catalog: The ``DataCatalog`` to be used during the run. | |||
""" | |||
mlflow_conf = get_mlflow_config( | |||
project_path=run_params["project_path"], env=run_params["env"] | |||
context = load_context( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context -> self.context (in case we need it elsewhere, for now it should have no impact)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spoted, it's a self.context
"kedro_mlflow = kedro_mlflow.framework.cli.cli:commands" | ||
"kedro_mlflow = kedro_mlflow.framework.cli.cli:commands", | ||
], | ||
"kedro.hooks": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be removed because of the kedro-viz bug. We can eventually modify mlflow_init to add what is necessary in the kedro.yml
file.
@@ -4,13 +4,13 @@ | |||
import pytest | |||
import yaml | |||
from kedro.extras.datasets.pickle import PickleDataSet | |||
from kedro.framework.context import KedroContext | |||
from kedro.framework.context import get_static_project_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this only compatible with kedro>=0.16.5
@@ -85,6 +85,9 @@ def test_model_packaging(tmp_path, pipeline_ml_obj): | |||
) | |||
assert loaded_model.predict(1) == {"predictions": 2} | |||
|
|||
if mlflow.active_run(): | |||
mlflow.end_run() | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very likely something we can do for all tests. Pytest have a way to call some cleanup code after each tests, and we must force all active mlflow runs (especially when tests fails)
closes #66 #64 #54 #30 #31 #72 #29 #62
Description
This PR intend to fix mlflow configs related issues. This indirectly made it easier to upgrade the kedro dependency to 0.16.5 and enable hook's auto-registration, modulo some tests fixing.
Development notes
Some important changes :
kedro-mlflow can now access to project context properties. That decouple the plugin from the template.
We used this access to change
get_mlflow_config
signature, the test have been updated accordignately.mlflow init
no longer touch the run.pyI instantiate an MlflowHook objects, as .kedro.yml and setuptools entrypoint declared hooks take hook object (not class)
I add a
.kedro.yml
file,src
folder andrun.py
file to the kedro project fixture for tests running in kedro 0.16.5For more details, see changes.
Checklist
CHANGELOG.md
file. Please respect Keep a Changelog guidelines.Notice
I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.