Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make customizing Schema behavior with subclasses easier #1812

Open
sirosen opened this issue May 12, 2021 · 3 comments
Open

Make customizing Schema behavior with subclasses easier #1812

sirosen opened this issue May 12, 2021 · 3 comments

Comments

@sirosen
Copy link
Contributor

sirosen commented May 12, 2021

This was mentioned in #1751:

I think this is an indication that Schema needs to be easier to customize, not that we need to bake more custom functionality into the base schema.

OneOfSchema is just one motivating use-case. More generally, what kind of structures are problematic today for customizing Schema?


Some examples of things which could possibly be improved:

  1. Customizing _serialize and _deserialize?

#1726 is all about making Schema._serialize and Schema._deserialize open to subclasses as a better place to customize behavior. Whether or not extra={...} is a good idea (#1726), is making these steps friendly to overload a good approach?

  1. Hook execution (order matters)

Often, users want a schema base class which has some fixed "pre_load" or "post_load" behavior. But if a base schema adds a @pre_load method, it will get mixed with the potential hooks on subclasses and the user's desired ordering is lost.

One approach would be to let users overload hook processing and rely on super() to call the "real" hooks. The signatures for _invoke_schema_validators, _invoke_load_processors, and _invoke_dump_processors are all slightly different. Can they be unified into a method like Schema.run_hook(...)? Or, alternatively, replace these with methods like Schema.run_pre_load_hooks(...)?

  1. Customizing handling of unknown fields

INCLUDE, EXCLUDE, and RAISE are good out-of-the-box behaviors. Looking at issues like #853 , could this potentially be solved by making the handling of unknown fields settable as a hook?

Mainly, the logic is here:

if unknown != EXCLUDE:
fields = {
field_obj.data_key if field_obj.data_key is not None else field_name
for field_name, field_obj in self.load_fields.items()
}
for key in set(data) - fields:
value = data[key]
if unknown == INCLUDE:
ret_d[key] = value
elif unknown == RAISE:
error_store.store_error(
[self.error_messages["unknown"]],
key,
(index if index_errors else None),
)

What about Schema.handle_unknown_fields(...) or some method like that? So the Schema class implements today's behavior, and users can subclass and override with their own behaviors.

@lafrech
Copy link
Member

lafrech commented May 12, 2021

Regarding hooks order, see #600.

@sirosen
Copy link
Contributor Author

sirosen commented May 12, 2021

Yes! In particular, I like the idea you proposed there of using hook methods with known names instead of decorators. But I don't think it's easy to handle many, pass_many, original_data, partial with that approach, so I'm trying to find a way.

I'm mildly -1 on adding a priority system as a particular solution. The primary use-case is for users to run a hook always-before subclass hooks or always-after subclass hooks, so priority values are potentially over-complicating the situation.

@lafrech
Copy link
Member

lafrech commented May 13, 2021

Just thinking out loud. Wondering if a simple implementation such as this one could be enough.

def post_load(self, item, original_item, partial):
    """Override this to modify single item after load"""
    return item

def post_load_many(self, items, original_items, partial):
    """Override this to modify item list after load

    Called when many = True
    """
     return [
        self.post_load(item, original_item, partial)
        for item, original_item in zip(items, original_items)
    ]

Then in the deserialization method, instead of invoking hooks:

            # Run post processors
            if not errors and postprocess:
                try:
                    if many:
                        result = self.post_load_many(result, data, partial)
                    else:
                        result = self.post_load(result, data, partial)
                except ValidationError as err:
                    errors = err.normalized_messages()

I haven't been giving it too much thought, but although it doesn't look as nice as decorators and it is more manual, it might solve corner case issues. Since we allow the user to do pretty much anything, we don't make assumptions about what the data should look like at each stage.

For instance, this would supports the case where the user's "many" processor modifies the length of the data (#1755), since it allows the user to specify exactly what he wants in the many case.

It makes things more verbose if many processings are involved but... well, better complex than impossible.

This needs more thinking. It could be done in marshmallow 4, in which case I wouldn't worry too much about corner cases in marshmallow 3 and I'd postpone their resolution to marshmallow 4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants