-
-
Notifications
You must be signed in to change notification settings - Fork 630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add priority to decorator processors #600
base: dev
Are you sure you want to change the base?
Conversation
Thanks for the PR and the kind words, @xLegoz . Looks potentially useful and the implementation is straightforward. It seems, however, that it would be straightforward to meet your use-case using a BaseSchema that exposes hooks, like so: from marshmallow import Schema, post_load
class BaseSchema(Schema):
@post_load
def post_load(self, data):
postprocessed = self._postprocess(data)
return self._make_object(postprocessed)
def _postprocess(self, data):
return data
def _make_object(self, data):
return data Would this meet your use case? |
Something like that does work out, but with quite a bit of hacking for the project I have, and we've got a lot (probably too many) post-processors we'd need to change for it. At the moment, this has been the best solution and the project is using the forked version so it's not a big deal :). |
Sounds good. I'm not opposed to merging this, but I'd like to take some time to collect my thoughts and get feedback on this. For one, I think it might make sense for higher-priority methods to execute first, i.e. execute in descending priority order. This seems more intuitive, and also addresses the issue of having |
Interesting idea, @xLegoz! I agree that it would be more intuitive for higher priority methods to execute first. This just makes sense in English. "High priority" is more urgent than "low priority". However, I lean toward @sloria's view that the same behavior could be accomplished by writing one @post_load handler that calls several other functions. Where possible, this would be easier to follow than scattered methods executed in order of priority decorators. You could even define your own @post_load_priority(priority) decorator that would register any number of methods on the Schema as (for example) self._post_load_methods. Your @post_load function could then call all those methods in sorted order, without needing to know all the methods when the @post_load method is defined. Subclasses could define their own methods and specify their own post_load_priority. Another thought is, is there any way in your design to provide named hooks rather than numeric priorities? If you have a situation where subclasses are inserting processing steps, I wonder if it'd be easier to read and maintain if you define specific steps that subclasses could hook into (which may be what @sloria meant). |
This is useful, however, when subclassing other schemas that may already have some processors defined on it (say, some An alternative to this might be to add something like |
@embray I do agree with your reasoning for the alternative approach with There is some ambiguity about how you might specify the functions, so there definitely needs to be some thought there for the most approachable interface. e.g. a few different ways the interface could work could be:
(Of course, if this could functionally be provided as a library that would also be pretty wicked for an experimental interface!) |
Yeah, I might give that a shot. There are definitely areas where one can run into trouble with this (e.g. cycles). Normally that wouldn't happen but it would probably be good to check. |
Greetings. I really think that latter method is too complicated for such task. Pros - it's more explicit. Cons - more work to support and implement (cycles that mentioned by @embray, for example), if processor you're depending on is renamed or not exist you should throw an error? And what if processor is applied dynamically on schema instance creation? You need priorities (or order) only for schema sub-classing and mixins. It's really not such a big job to determine integer for it (and you need it in your own code, in most usecases). But on using "after/before" it would not be obvious when two processors without "after/before" is executed (at least I can't get straight picture right now). So in the end - we would get more explicit(+) but more complicated(-) behavior. About third-party libraries with several processors - in such rare cases - they may use "order" with step 10 to allow user-code insert their processors when needed (for example popper.js use step 50 to allow user-code insert "modifiers"). So my vote - to merge this pull request as is (maybe rename "priority" to "order"). Thank you guys for all your work! |
The more we answer various use cases with
the more this is needed. I'll mark it as 3.0. Feel free to postpone if you think it is not so crucial. I also agree that using integers has a theoretical limitation that is not a real issue in practice. Python logging module also uses integer logging levels, and defines default levels only for tens, leaving a lot enough space to define custom levels. At least, integer priorities would be better than any twisted workaround to enforce the order of validators from different code layers or even different libraries. |
|
||
def _get_priority(self, attr_name, tag): | ||
processor = getattr(self, attr_name) | ||
return processor.__marshmallow_kwargs__[tag].get('priority', 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need that default value for priority?
Since a default is set above, this should work:
return processor.__marshmallow_kwargs__[tag]['priority']
Setting 3.0 milestone on this so as not to forget it as it seems feasible. Guys, feel free to postpone. |
Try overriding the abstract hook with the specialized hook and explicitly declare the dependency order by choosing where to call super. from marshmallow import Schema, fields, post_load
class BaseModel:
def __init__(self, data):
self.data = data
class Base(Schema):
id = fields.Str()
@post_load
def make_model(self, data):
return BaseModel(data)
class Special(Base):
name = fields.Str()
@post_load
def make_model(self, data):
data['name'] = data['name'].title()
return super().make_model(data)
result = Special().load(dict(id='1', name='test'))
print(result, result.data) Note: I couldn't actually reproduce the reported issue when building this minimal example. It just demonstrates that inherited hooks can can control the execution order of their base class's hooks, because the base hook is only called once. |
This does not solve the case of having a Also, priorities would allow marshmallow-based framework to use hook while application code could use hooks as well. Application would just have to watch the priority numbers in the framework. Hence the 10-step idea described above by @vgavro. If most people are satisfied with current implementation (where base class needs to define hooks for the application and children classes override them), I won't struggle for this feature to be integrated. |
Python has a builtin way to solve method resolution order in inherited classes. Naming the methods the same thing has the benefit of explicitly declaring the dependency relationship. This is much more explicit than a number. You can keep the special method name, just don't add the decorator to it and you can call it from inside the overridden hook. class Special(Base):
name = fields.Str()
def fancy_name(self, data):
data['name'] = data['name'].title()
return data
@post_load
def make_model(self, data):
data = self.fancy_name(data)
return super().make_model(data) |
Good point, @deckar01. Pushing this logic even further, the whole decoration mechanism could be removed, the feature being accomplished by hooks with fixed names in the base schema. class BaseSchema(base.SchemaABC):
def pre_load(data):
"""Override this to add pre-load logic"""
return data The invocation of the hooks would be greatly simplified. This kind of hook is quite common, simpler than current decoration mechanism, and as you say, it relies on usual Python method resolution order. Maybe it's not so obvious, due to the It would be a little regression for people using multiple decorated methods with no care about the order (e.g. using two independent |
I agree that using python MRO and decorators at same time is counterintuitive, because you should rely on parent custom method names in that case, which may looks good in one-parent-one-child example, but can be real pain if you're defining complex schemes with mixins. |
Decorators are nice, because they explicitly communicate that the method is part of the schema life cycle and not just another method. It is also convenient for use cases that are order independent to be able to declare hooks without having to declare a dependency order. Feel free to open an issue to discuss removing the decorators, but I don't think it is necessary to satisfy this original use case. This pattern is very similar to overriding methods with static/class decorators. We could document this pattern to make it more intuitive and cover it with tests to ensure we maintain support. |
Is there still interest in this? Seems like the OP can be met using the methods above. Unless there are strong feelings about adding this API, I'll plan to close this in a few days. |
I still think things would be easier if we had hooks instead of decorators. easier to maintain, easier to understand. Simple Python and inheritance. The code in this lib to support hooks would be very minimal. In my app, I'm sometimes declaring post_load methods then overriding them in child classes and I'm reaching a point where it would be saner to define hooks in my base Schema using decorators and then don't use the decorators in rest of the code to avoid messing things up. I don't have the time to draft an implementation right now. |
This is response to @deckar01 comment, which happened to be last for a long time, and now I see that it looks like everyone is happy about it, which is not :-) [In short - I'm completely support idea to remove decorators in favor of clean MRO, but if it's not to be done - at least add "order" to hooks, because for hooks that modify returned data and data which passes to next hooks - order is crucial]
but if we're writing library which is not used in one small project, but have it's own ecosystem around it - i'm expecting one obvious way to interact with schemas in other libraries without pain. |
Thanks @vgavro for clarifying what I meant. |
FYI this is already possible by properly alphabetising processors, since Not sure if it's omitted in the documentation on purpose or initial implementation didn't intend it to work this way ;) |
No it wasn't intended, but it's nice to know. This dates back to when dicts were not ordered, I suppose dir wasn't either. |
Ok. Though |
Hi, we also stumbled on the need for this.. we do some processing in post_dump hook and it is crucial to have a specific order of execution of hooks (so the final processing hook isactually done in the end). We noticed that hooks are currently ordered alphabetically which doesn't make much sense and it's a very weird way of ordering the hooks (but I know it wasn't intended). |
In the documentation, it's quite explicit about there being no prioritization of processors. I believe this would be a great addition for a few reasons. It allows more complex post-processing flows.
For example, the case I need
priority
is where an abstract schema will post-process into a model object, but before that must be some specialized post-processing in an implementation of the schema. This is less than ideal to implement using the suggested pattern due to the post-processor belonging to the abstract class.This will retain backward compatibility, given that every processor will default to the same priority and the disclaimer in the documentation.
I'm open to suggestions for an alternative interface for this. Defaulting all processors to
0
would require other processors to run before them to have negative values, so likely a non-zero priority is more appropriate.ps. love marshmallow, and all of the work that y'all do for it! The documentation is among the best that I've experienced, and every time I read through I discover a new solution to an anti-pattern I've had :).