Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX #66 #64 #54 #30 #31 #72 #29 #62 - context access - kedro 16.5 - hook auto registration #86

Conversation

takikadiri
Copy link
Collaborator

closes #66 #64 #54 #30 #31 #72 #29 #62

Description

This PR intend to fix mlflow configs related issues. This indirectly made it easier to upgrade the kedro dependency to 0.16.5 and enable hook's auto-registration, modulo some tests fixing.

Development notes

Some important changes :

  • kedro-mlflow can now access to project context properties. That decouple the plugin from the template.
    We used this access to change get_mlflow_config signature, the test have been updated accordignately.

  • mlflow init no longer touch the run.py

  • I instantiate an MlflowHook objects, as .kedro.yml and setuptools entrypoint declared hooks take hook object (not class)

  • I add a .kedro.yml file, src folder and run.py file to the kedro project fixture for tests running in kedro 0.16.5

For more details, see changes.

Checklist

  • Read the contributing guidelines
  • Open this PR as a 'Draft Pull Request' if it is work-in-progress
  • Update the documentation to reflect the code changes
  • Add a description of this change and add your name to the list of supporting contributions in the CHANGELOG.md file. Please respect Keep a Changelog guidelines.
  • Add tests to cover your changes

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@codecov-io
Copy link

codecov-io commented Oct 11, 2020

Codecov Report

Merging #86 into develop will increase coverage by 0.30%.
The diff coverage is 98.85%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #86      +/-   ##
===========================================
+ Coverage    96.83%   97.13%   +0.30%     
===========================================
  Files           32       31       -1     
  Lines         1199     1151      -48     
===========================================
- Hits          1161     1118      -43     
+ Misses          38       33       -5     
Impacted Files Coverage Δ
setup.py 0.00% <0.00%> (ø)
tests/framework/cli/test_cli.py 100.00% <ø> (ø)
kedro_mlflow/framework/cli/cli.py 100.00% <100.00%> (ø)
kedro_mlflow/framework/context/mlflow_context.py 100.00% <100.00%> (ø)
kedro_mlflow/framework/hooks/node_hook.py 100.00% <100.00%> (ø)
kedro_mlflow/framework/hooks/pipeline_hook.py 98.79% <100.00%> (+0.07%) ⬆️
kedro_mlflow/utils.py 100.00% <100.00%> (ø)
tests/conftest.py 100.00% <100.00%> (ø)
tests/framework/context/test_mlflow_context.py 100.00% <100.00%> (ø)
tests/framework/hooks/test_node_hook.py 98.41% <100.00%> (+0.97%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4c0b6a7...446af5a. Read the comment docs.

Copy link
Owner

@Galileo-Galilei Galileo-Galilei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is really great and adds a lot of new features, but I think we should separate thingsinto at least 3 PR:

  • context access is a whole PR in itself, and tests must be added (for instance, test that mlflow config now works with TemplatedConfigLoader or with an environment different from "local") -> regarding this, all your changes are ok for me and I think we can merge them soon once tests are added.
  • migration to 0.16.5 modifies a lot of things, with low retro compatibility, especially for kedro mlflow init. We can accept this, but is must be very clear in the documentation (especially this section: https://github.com/Galileo-Galilei/kedro-mlflow/blob/develop/docs/source/03_tutorial/02_setup.md#automatic-template-update-recommended). I think the switch must be thoroughly thought and made clear before deprecation. For now, we can modify the kedro.yml file (or pyproject.toml) to register the hook automatically without modifying the run.py file.
  • plugin auto registration is a no go while kedro viz bug is not fixed. If people have to chose, I hardly believe they wil choose kedro-mlflow upon kedro-viz :) I don't think it is necessary to make the PR right now, but you can have a separated branch and we'll merge it once it's good to go.

EDIT: I don't think this PR can go in the release 0.3.0 as is, but we can do a patch release soon (before 0.4.0) if no breaking change is introduced since it is a big improvement!

@@ -4,12 +4,15 @@

### Added

- kedro-mlflow hooks are now auto-registred (#29)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not do this in this PR, since it breaks kedro viz (see kedro-org/kedro-viz#260). Can you remove it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kedro-viz just work fine with kedro 0.16.5 and auto registred hooks. I'll test it with previous kedro versions

@@ -21,6 +24,9 @@

### Changed

- Bump kedro requirement to <=0.16.5
- `kedro mlflow init` is no longer touching the run.py file
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should likely add a test for the kedro version to ensure retro compatibility, or make it very clear in the docs that if you have kedro ">=0.16,<0.16.4".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we should discuss this particular point in a separate PR.

@@ -21,6 +24,9 @@

### Changed

- Bump kedro requirement to <=0.16.5
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check that the tests are not broken with kedro<=0.16, because the CI only tests for the last version of kedro, and wa can't be sure that nothing is broken.

import subprocess
from pathlib import Path

import click
from kedro import __file__ as KEDRO_PATH
from kedro.framework.context import get_static_project_data, load_context
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this breaks retrocompatibilty with kedro<=0.16.4. It should definitely be a separated PR in case we need to rollback. But it's a great idea!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will definitely break kedro <=0.16.4 compatibility as i just saw that get_static_project_data is introduced in kedro 0.16.5. I'll keep the _get_project_globals utils

mlflow_yml = "mlflow.yml"
write_jinja_template(
src=TEMPLATE_FOLDER_PATH / mlflow_yml,
is_cookiecutter=False,
dst=project_path / "conf" / "base" / mlflow_yml,
python_package=project_globals["python_package"],
dst=project_path / conf_root / "base" / mlflow_yml,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not remember this was hardcoded, well spotted 👍

@@ -51,6 +90,9 @@ def before_node_run(
mlflow.log_params(params_inputs)


mlflow_node_hooks = MlflowNodeHook()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this necessary? Is it for auto registration?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for registring hooks in .kedro.yml or in setuptools entrypoints.

@@ -62,9 +66,13 @@ def before_pipeline_run(
pipeline: The ``Pipeline`` that will be run.
catalog: The ``DataCatalog`` to be used during the run.
"""
mlflow_conf = get_mlflow_config(
project_path=run_params["project_path"], env=run_params["env"]
context = load_context(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

context -> self.context (in case we need it elsewhere, for now it should have no impact)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spoted, it's a self.context

"kedro_mlflow = kedro_mlflow.framework.cli.cli:commands"
"kedro_mlflow = kedro_mlflow.framework.cli.cli:commands",
],
"kedro.hooks": [
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be removed because of the kedro-viz bug. We can eventually modify mlflow_init to add what is necessary in the kedro.yml file.

@@ -4,13 +4,13 @@
import pytest
import yaml
from kedro.extras.datasets.pickle import PickleDataSet
from kedro.framework.context import KedroContext
from kedro.framework.context import get_static_project_data
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this only compatible with kedro>=0.16.5

@@ -85,6 +85,9 @@ def test_model_packaging(tmp_path, pipeline_ml_obj):
)
assert loaded_model.predict(1) == {"predictions": 2}

if mlflow.active_run():
mlflow.end_run()

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very likely something we can do for all tests. Pytest have a way to call some cleanup code after each tests, and we must force all active mlflow runs (especially when tests fails)

@Galileo-Galilei
Copy link
Owner

Close this issue which has been merged as #91 and #94

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accessing the project context inside hooks implementations
3 participants