Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add priority to decorator processors #600

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from
Open

Conversation

lgo
Copy link

@lgo lgo commented Mar 12, 2017

In the documentation, it's quite explicit about there being no prioritization of processors. I believe this would be a great addition for a few reasons. It allows more complex post-processing flows.

For example, the case I need priority is where an abstract schema will post-process into a model object, but before that must be some specialized post-processing in an implementation of the schema. This is less than ideal to implement using the suggested pattern due to the post-processor belonging to the abstract class.

This will retain backward compatibility, given that every processor will default to the same priority and the disclaimer in the documentation.

I'm open to suggestions for an alternative interface for this. Defaulting all processors to 0 would require other processors to run before them to have negative values, so likely a non-zero priority is more appropriate.

ps. love marshmallow, and all of the work that y'all do for it! The documentation is among the best that I've experienced, and every time I read through I discover a new solution to an anti-pattern I've had :).

@sloria
Copy link
Member

sloria commented Mar 14, 2017

Thanks for the PR and the kind words, @xLegoz . Looks potentially useful and the implementation is straightforward. It seems, however, that it would be straightforward to meet your use-case using a BaseSchema that exposes hooks, like so:

from marshmallow import Schema, post_load

class BaseSchema(Schema):
    @post_load
    def post_load(self, data):
        postprocessed = self._postprocess(data)
        return self._make_object(postprocessed)

    def _postprocess(self, data):
        return data

    def _make_object(self, data):
        return data

Would this meet your use case?

@lgo
Copy link
Author

lgo commented Mar 14, 2017

Something like that does work out, but with quite a bit of hacking for the project I have, and we've got a lot (probably too many) post-processors we'd need to change for it. At the moment, this has been the best solution and the project is using the forked version so it's not a big deal :).

@sloria
Copy link
Member

sloria commented Mar 19, 2017

Sounds good. I'm not opposed to merging this, but I'd like to take some time to collect my thoughts and get feedback on this.

For one, I think it might make sense for higher-priority methods to execute first, i.e. execute in descending priority order. This seems more intuitive, and also addresses the issue of having 0 as the default priority, as you mentioned in your original post.

@douglas-treadwell
Copy link

douglas-treadwell commented Jun 9, 2017

Interesting idea, @xLegoz!

I agree that it would be more intuitive for higher priority methods to execute first. This just makes sense in English. "High priority" is more urgent than "low priority".

However, I lean toward @sloria's view that the same behavior could be accomplished by writing one @post_load handler that calls several other functions. Where possible, this would be easier to follow than scattered methods executed in order of priority decorators.

You could even define your own @post_load_priority(priority) decorator that would register any number of methods on the Schema as (for example) self._post_load_methods. Your @post_load function could then call all those methods in sorted order, without needing to know all the methods when the @post_load method is defined. Subclasses could define their own methods and specify their own post_load_priority.

Another thought is, is there any way in your design to provide named hooks rather than numeric priorities? If you have a situation where subclasses are inserting processing steps, I wonder if it'd be easier to read and maintain if you define specific steps that subclasses could hook into (which may be what @sloria meant).

@embray
Copy link

embray commented Dec 3, 2017

This is useful, however, when subclassing other schemas that may already have some processors defined on it (say, some post_load). There are some ugly hacks to get around that, but it would be nice if, given the priorities of existing processors (assuming their priority levels are reasonably well spaced out) that a subclass can add additional processors in the same category in a reasonable order with respect to existing processors.

An alternative to this might be to add something like before= or after= arguments to a processor tag, where an existing processor is given. Then, rather than relying on admittedly shaky priority-levels one could explicitly order processors with respect to each other.

@lgo
Copy link
Author

lgo commented Dec 5, 2017

@embray I do agree with your reasoning for the alternative approach with before= and after=. As it's already been mentioned, defining an ordering with numbers is not very intuitive.

There is some ambiguity about how you might specify the functions, so there definitely needs to be some thought there for the most approachable interface. e.g. a few different ways the interface could work could be:

  • @post_processor(before=ThisClass.processor_that_runs_after)
  • @post_processor(before="processor_that_runs_after")

(Of course, if this could functionally be provided as a library that would also be pretty wicked for an experimental interface!)

@embray
Copy link

embray commented Dec 11, 2017

Yeah, I might give that a shot. There are definitely areas where one can run into trouble with this (e.g. cycles). Normally that wouldn't happen but it would probably be good to check.

@vgavro
Copy link
Contributor

vgavro commented Feb 10, 2018

Greetings. I really think that latter method is too complicated for such task. Pros - it's more explicit. Cons - more work to support and implement (cycles that mentioned by @embray, for example), if processor you're depending on is renamed or not exist you should throw an error? And what if processor is applied dynamically on schema instance creation? You need priorities (or order) only for schema sub-classing and mixins. It's really not such a big job to determine integer for it (and you need it in your own code, in most usecases). But on using "after/before" it would not be obvious when two processors without "after/before" is executed (at least I can't get straight picture right now). So in the end - we would get more explicit(+) but more complicated(-) behavior. About third-party libraries with several processors - in such rare cases - they may use "order" with step 10 to allow user-code insert their processors when needed (for example popper.js use step 50 to allow user-code insert "modifiers"). So my vote - to merge this pull request as is (maybe rename "priority" to "order"). Thank you guys for all your work!

@lafrech
Copy link
Member

lafrech commented Nov 22, 2018

The more we answer various use cases with

this is a job for [pre|post]_[dump|load] decorators

the more this is needed.

I'll mark it as 3.0. Feel free to postpone if you think it is not so crucial.

I also agree that using integers has a theoretical limitation that is not a real issue in practice. Python logging module also uses integer logging levels, and defines default levels only for tens, leaving a lot enough space to define custom levels.

At least, integer priorities would be better than any twisted workaround to enforce the order of validators from different code layers or even different libraries.


def _get_priority(self, attr_name, tag):
processor = getattr(self, attr_name)
return processor.__marshmallow_kwargs__[tag].get('priority', 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need that default value for priority?

Since a default is set above, this should work:

    return processor.__marshmallow_kwargs__[tag]['priority']

@lafrech lafrech added this to the 3.0 milestone Dec 11, 2018
@lafrech
Copy link
Member

lafrech commented Dec 11, 2018

Setting 3.0 milestone on this so as not to forget it as it seems feasible. Guys, feel free to postpone.

@deckar01
Copy link
Member

Try overriding the abstract hook with the specialized hook and explicitly declare the dependency order by choosing where to call super.

from marshmallow import Schema, fields, post_load


class BaseModel:
    def __init__(self, data):
        self.data = data

class Base(Schema):
    id = fields.Str()
    
    @post_load
    def make_model(self, data):
        return BaseModel(data)

class Special(Base):
    name = fields.Str()
    
    @post_load
    def make_model(self, data):
        data['name'] = data['name'].title()
        return super().make_model(data)


result = Special().load(dict(id='1', name='test'))
print(result, result.data)

Note: I couldn't actually reproduce the reported issue when building this minimal example. It just demonstrates that inherited hooks can can control the execution order of their base class's hooks, because the base hook is only called once.

@lafrech
Copy link
Member

lafrech commented Dec 11, 2018

This does not solve the case of having a make_model hook in base class and an unrelated post_load hook in a child class. Well, it does if you ensure to name the unrelated hook make_model, but maybe you don't want to do that.

Also, priorities would allow marshmallow-based framework to use hook while application code could use hooks as well. Application would just have to watch the priority numbers in the framework. Hence the 10-step idea described above by @vgavro.

If most people are satisfied with current implementation (where base class needs to define hooks for the application and children classes override them), I won't struggle for this feature to be integrated.

@deckar01
Copy link
Member

deckar01 commented Dec 11, 2018

Python has a builtin way to solve method resolution order in inherited classes. Naming the methods the same thing has the benefit of explicitly declaring the dependency relationship. This is much more explicit than a number. You can keep the special method name, just don't add the decorator to it and you can call it from inside the overridden hook.

class Special(Base):
    name = fields.Str()
    
    def fancy_name(self, data):
        data['name'] = data['name'].title()
        return data

    @post_load
    def make_model(self, data):
        data = self.fancy_name(data)
        return super().make_model(data)

@lafrech
Copy link
Member

lafrech commented Dec 16, 2018

Good point, @deckar01. Pushing this logic even further, the whole decoration mechanism could be removed, the feature being accomplished by hooks with fixed names in the base schema.

class BaseSchema(base.SchemaABC):

    def pre_load(data):
        """Override this to add pre-load logic"""
        return data

The invocation of the hooks would be greatly simplified.

This kind of hook is quite common, simpler than current decoration mechanism, and as you say, it relies on usual Python method resolution order.

Maybe it's not so obvious, due to the many / pass_many combinations, I'd need to investigate a little more, but I like the idea.

It would be a little regression for people using multiple decorated methods with no care about the order (e.g. using two independent post_load methods), as they'd have to merge their processors into a common method, but I think this is acceptable. At least we'd have predictable order.

@vgavro
Copy link
Contributor

vgavro commented Dec 20, 2018

I agree that using python MRO and decorators at same time is counterintuitive, because you should rely on parent custom method names in that case, which may looks good in one-parent-one-child example, but can be real pain if you're defining complex schemes with mixins.
But I like the whole idea to remove decorators and use MRO instead.
Problem with many/pass_many can be resolved with special methods name: post_load_many, which will just apply self.post_load for every item in collection by default.
And if we agree on this #1034 (comment) (always passing **kwargs with original, partial, etc. instead of requesting arguments in decorator) - we could drop current complex implementation and have clear priority with only disadvantage of changed api, which is expected in new major release anyway.
validates_schema option skip_on_field_errors=False can by achieved by explicitly processing ValidationError exception.
@validate(field_name) decorator should be replaced by validate_field(name, value, field=field_obj) then not to break consistency and have order control! (if name == 'my_field': do_something)
So, this looks like a big change. As I can see it - or we're relying on MRO and drop decorators, or we're adding "priority" to keep current decorators API.
Argument that "you can define custom names and then use one post_load, and if post_load already defined in parents - you should override all of them to keep things under control" just don't looks good to me.

@deckar01
Copy link
Member

Decorators are nice, because they explicitly communicate that the method is part of the schema life cycle and not just another method. It is also convenient for use cases that are order independent to be able to declare hooks without having to declare a dependency order. Feel free to open an issue to discuss removing the decorators, but I don't think it is necessary to satisfy this original use case.

This pattern is very similar to overriding methods with static/class decorators. We could document this pattern to make it more intuitive and cover it with tests to ensure we maintain support.

@sloria sloria removed this from the 3.0 milestone Dec 26, 2018
@sloria
Copy link
Member

sloria commented Sep 11, 2019

Is there still interest in this? Seems like the OP can be met using the methods above. Unless there are strong feelings about adding this API, I'll plan to close this in a few days.

@lafrech
Copy link
Member

lafrech commented Sep 12, 2019

I still think things would be easier if we had hooks instead of decorators. easier to maintain, easier to understand. Simple Python and inheritance. The code in this lib to support hooks would be very minimal.

In my app, I'm sometimes declaring post_load methods then overriding them in child classes and I'm reaching a point where it would be saner to define hooks in my base Schema using decorators and then don't use the decorators in rest of the code to avoid messing things up.

I don't have the time to draft an implementation right now.

@vgavro
Copy link
Contributor

vgavro commented Sep 18, 2019

This is response to @deckar01 comment, which happened to be last for a long time, and now I see that it looks like everyone is happy about it, which is not :-)
Also, in last @lafrech comment I believe when he say "hooks" he means MRO, and in my and @deckar01 comment when I say "hooks" I mean current decorator implementation (just to be clear).

[In short - I'm completely support idea to remove decorators in favor of clean MRO, but if it's not to be done - at least add "order" to hooks, because for hooks that modify returned data and data which passes to next hooks - order is crucial]

  • "Decorators are nice, because they explicitly communicate that the method is part of the schema life cycle and not just another method."
    -- special method name is not the same as "just another method". Again, when you're overriding some parent method in python you don't mark it special way - and seems like python community ok with it.

  • "It is also convenient for use cases that are order independent to be able to declare hooks without having to declare a dependency order."
    -- Well, it is, but when "hooks" can return some data, on which other hooks can depend - like pre_load, post_load, pre_dump, post_dump - order is crucial, and this decorators are used mostly for that. So - hooks are cool. for validation which is not changing value. but for input/output changes - hooks are evil.

  • "Feel free to open an issue to discuss removing the decorators, but I don't think it is necessary to satisfy this original use case."
    -- That's not my point. my point is - IF we use decorators (hooks) - we SHOULD allow them order, because it's normal practice and it's crucial even more, because other hooks depends on returned data. If we use MRO - then api must act like MRO, without messing things with MRO/hooks. "There should be one-- and preferably only one --obvious way to do it."

  • "This pattern is very similar to overriding methods with static/class decorators. We could document this pattern to make it more intuitive and cover it with tests to ensure we maintain support."
    -- Really, I'm also using this pattern not to mess things up..

class BaseModel:
  @post_load()
  def post_load(self, data, **kwargs):
    return data
  # same pre_load, post_load, pre_dump, post_dump

but if we're writing library which is not used in one small project, but have it's own ecosystem around it - i'm expecting one obvious way to interact with schemas in other libraries without pain.

@lafrech
Copy link
Member

lafrech commented Sep 19, 2019

Thanks @vgavro for clarifying what I meant.

@ringus1
Copy link

ringus1 commented Apr 12, 2023

FYI this is already possible by properly alphabetising processors, since resolve_hooks is using dir: https://github.com/marshmallow-code/marshmallow/blob/dev/src/marshmallow/schema.py#L166

Not sure if it's omitted in the documentation on purpose or initial implementation didn't intend it to work this way ;)

@lafrech
Copy link
Member

lafrech commented Apr 12, 2023

No it wasn't intended, but it's nice to know.

This dates back to when dicts were not ordered, I suppose dir wasn't either.

@ringus1
Copy link

ringus1 commented Apr 12, 2023

Ok. Though dir has always returned list.
https://docs.python.org/2.6/library/functions.html#dir

@redhead
Copy link

redhead commented Oct 17, 2024

Hi, we also stumbled on the need for this.. we do some processing in post_dump hook and it is crucial to have a specific order of execution of hooks (so the final processing hook isactually done in the end). We noticed that hooks are currently ordered alphabetically which doesn't make much sense and it's a very weird way of ordering the hooks (but I know it wasn't intended).
Any chance of moving forward with this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants