Make customizing Schema behavior with subclasses easier #1812

sirosen · 2021-05-12T19:01:02Z

I think this is an indication that Schema needs to be easier to customize, not that we need to bake more custom functionality into the base schema.

OneOfSchema is just one motivating use-case. More generally, what kind of structures are problematic today for customizing Schema?

Some examples of things which could possibly be improved:

Customizing _serialize and _deserialize?

#1726 is all about making Schema._serialize and Schema._deserialize open to subclasses as a better place to customize behavior. Whether or not extra={...} is a good idea (#1726), is making these steps friendly to overload a good approach?

Hook execution (order matters)

Often, users want a schema base class which has some fixed "pre_load" or "post_load" behavior. But if a base schema adds a @pre_load method, it will get mixed with the potential hooks on subclasses and the user's desired ordering is lost.

One approach would be to let users overload hook processing and rely on super() to call the "real" hooks. The signatures for _invoke_schema_validators, _invoke_load_processors, and _invoke_dump_processors are all slightly different. Can they be unified into a method like Schema.run_hook(...)? Or, alternatively, replace these with methods like Schema.run_pre_load_hooks(...)?

Customizing handling of unknown fields

INCLUDE, EXCLUDE, and RAISE are good out-of-the-box behaviors. Looking at issues like #853 , could this potentially be solved by making the handling of unknown fields settable as a hook?

Mainly, the logic is here:

marshmallow/src/marshmallow/schema.py

Lines 666 to 680 in 8af34a9

    
           if unknown != EXCLUDE: 
        
               fields = { 
        
                   field_obj.data_key if field_obj.data_key is not None else field_name 
        
                   for field_name, field_obj in self.load_fields.items() 
        
               } 
        
               for key in set(data) - fields: 
        
                   value = data[key] 
        
                   if unknown == INCLUDE: 
        
                       ret_d[key] = value 
        
                   elif unknown == RAISE: 
        
                       error_store.store_error( 
        
                           [self.error_messages["unknown"]], 
        
                           key, 
        
                           (index if index_errors else None), 
        
                       )

What about Schema.handle_unknown_fields(...) or some method like that? So the Schema class implements today's behavior, and users can subclass and override with their own behaviors.

The text was updated successfully, but these errors were encountered:

lafrech · 2021-05-12T20:26:02Z

Regarding hooks order, see #600.

sirosen · 2021-05-12T20:40:30Z

Yes! In particular, I like the idea you proposed there of using hook methods with known names instead of decorators. But I don't think it's easy to handle many, pass_many, original_data, partial with that approach, so I'm trying to find a way.

I'm mildly -1 on adding a priority system as a particular solution. The primary use-case is for users to run a hook always-before subclass hooks or always-after subclass hooks, so priority values are potentially over-complicating the situation.

lafrech · 2021-05-13T21:18:55Z

Just thinking out loud. Wondering if a simple implementation such as this one could be enough.

def post_load(self, item, original_item, partial):
    """Override this to modify single item after load"""
    return item

def post_load_many(self, items, original_items, partial):
    """Override this to modify item list after load

    Called when many = True
    """
     return [
        self.post_load(item, original_item, partial)
        for item, original_item in zip(items, original_items)
    ]

Then in the deserialization method, instead of invoking hooks:

            # Run post processors
            if not errors and postprocess:
                try:
                    if many:
                        result = self.post_load_many(result, data, partial)
                    else:
                        result = self.post_load(result, data, partial)
                except ValidationError as err:
                    errors = err.normalized_messages()

I haven't been giving it too much thought, but although it doesn't look as nice as decorators and it is more manual, it might solve corner case issues. Since we allow the user to do pretty much anything, we don't make assumptions about what the data should look like at each stage.

For instance, this would supports the case where the user's "many" processor modifies the length of the data (#1755), since it allows the user to specify exactly what he wants in the many case.

It makes things more verbose if many processings are involved but... well, better complex than impossible.

This needs more thinking. It could be done in marshmallow 4, in which case I wouldn't worry too much about corner cases in marshmallow 3 and I'd postpone their resolution to marshmallow 4.

lafrech mentioned this issue May 29, 2021

Ordered hook execution #1816

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make customizing Schema behavior with subclasses easier #1812

Make customizing Schema behavior with subclasses easier #1812

sirosen commented May 12, 2021

lafrech commented May 12, 2021

sirosen commented May 12, 2021

lafrech commented May 13, 2021 •

edited

Loading

Make customizing Schema behavior with subclasses easier #1812

Make customizing Schema behavior with subclasses easier #1812

Comments

sirosen commented May 12, 2021

lafrech commented May 12, 2021

sirosen commented May 12, 2021

lafrech commented May 13, 2021 • edited Loading

lafrech commented May 13, 2021 •

edited

Loading