diff --git a/src/content/_jupyter/reflections/pickle.jpg b/src/content/_jupyter/reflections/pickle.jpg new file mode 100644 index 00000000..35b2f36d Binary files /dev/null and b/src/content/_jupyter/reflections/pickle.jpg differ diff --git a/src/content/_jupyter/reflections/pydantic_dump.ipynb b/src/content/_jupyter/reflections/pydantic_dump.ipynb new file mode 100644 index 00000000..42794068 --- /dev/null +++ b/src/content/_jupyter/reflections/pydantic_dump.ipynb @@ -0,0 +1,1150 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Pydantic model dump - what's inside?\n", + "\n", + "In this quick example we will create a Pydantic model and dump it to see what's inside.\n", + "Here I will start to make one distinction:\n", + "\n", + "* **Decorated methods** that serve as definitions for validators on various fields will be called straight-up **validators** by me,\n", + "* The limits such as `min_length`, `max_length`, `ge`, `le` etc. will be called **constraints**,\n", + "defined as additional arguments to the `Field` constructor.\n", + "\n", + "This is to make the distinction between the two clearer, which will come in handy when we look at the dumped model\n", + "and, later, create the library which is the main goal of this article.\n", + "\n", + "So first, let's create a simple Pydantic model with one validator and two constraints on the `age` field."
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import pydantic\n", + "import json\n", + "import logging\n", + "\n", + "\n", + "class Nested(pydantic.BaseModel):\n", + " name: str\n", + " age: int = pydantic.Field(ge=0, le=80)\n", + "\n", + " @pydantic.field_validator('age')\n", + " def check_age(cls, value):\n", + " if value < 18:\n", + " raise ValueError('You need to be an adult to use this service.')\n", + " return value" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now check if all bells and whistles are in good order by feeding some data to the model and checking if it's valid.\n", + "Let's do that for a list of differing ages, since we know that anything below $18$ **or** above $80$ is invalid." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ERROR:root:1 validation error for Nested\n", + "age\n", + " Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-1, input_type=int]\n", + " For further information visit https://errors.pydantic.dev/2.7/v/greater_than_equal\n", + "ERROR:root:1 validation error for Nested\n", + "age\n", + " Value error, You need to be an adult to use this service. [type=value_error, input_value=0, input_type=int]\n", + " For further information visit https://errors.pydantic.dev/2.7/v/value_error\n", + "ERROR:root:1 validation error for Nested\n", + "age\n", + " Value error, You need to be an adult to use this service. [type=value_error, input_value=17, input_type=int]\n", + " For further information visit https://errors.pydantic.dev/2.7/v/value_error\n", + "INFO:root:John is 18 years old.\n", + "INFO:root:John is 80 years old.\n", + "ERROR:root:1 validation error for Nested\n", + "age\n", + " Input should be less than or equal to 80 [type=less_than_equal, input_value=81, input_type=int]\n", + " For further information visit https://errors.pydantic.dev/2.7/v/less_than_equal\n" + ] + } + ], + "source": [ + "logging.basicConfig(level=logging.DEBUG)\n", + "\n", + "# We can see if validators/constraints are working\n", + "# by trying to create a model with invalid and valid values.\n", + "# For invalid values, we expect a ValidationError to be raised.\n", + "# For valid values, we expect the model to be created successfully.\n", + "for age in [-1, 0, 17, 18, 80, 81]:\n", + " try:\n", + " Nested(name='John', age=age)\n", + " logging.info(f'John is {age} years old.')\n", + " except pydantic.ValidationError as e:\n", + " logging.error(e)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import datetime\n", + "\n", + "TIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100) # All timestamps will be relative to this one\n", + "\n", + "\n", + "class ModelWithDatetime(pydantic.BaseModel):\n", + " created_at: str\n", + "\n", + " @pydantic.field_validator('created_at')\n", + " def check_created_at(cls, value):\n", + " iso_formatted_value = datetime.datetime.fromisoformat(value)\n", + " if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):\n", + " raise ValueError('The timestamp is too old.')\n", + " return iso_formatted_value" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This class is a simple example of a context-dependent model - it computes the date 100 days before the current date\n", + "and then validates that the passed date is not older than that.\n", + "\n", + "This is a simple example of a validator that uses a global context to validate the field, because:\n", + "\n", + "* the `datetime` module needs to be imported and included in the current `globals()`,\n", + "* `datetime.now()` is a function that is called during the model creation and is not a part of the model itself,\n", + "* the `TIMESTAMP_START` constant is a module-wide constant that is used in the validator.\n", + "\n", + "Let's check how this model will behave when the global context changes." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ERROR:root:Failed to access the datetime module: name 'datetime' is not defined\n", + "ERROR:root:Failed to create an instance of the model: name 'datetime' is not defined\n", + "ERROR:root:1 validation error for ModelWithDatetime\n", + "created_at\n", + " Value error, The timestamp is too old. [type=value_error, input_value='1410-07-15T00:00:00', input_type=str]\n", + " For further information visit https://errors.pydantic.dev/2.7/v/value_error\n" + ] + } + ], + "source": [ + "# Here we will employ a trick to remove the datetime module from the globals() dictionary,\n", + "# so it is not available to the unpickled object.\n", + "if 'datetime' in globals():\n", + " del globals()['datetime']\n", + "\n", + "# We can check it by trying to access the datetime module\n", + "try:\n", + " datetime.datetime.now()\n", + "except NameError as e:\n", + " logging.error(f'Failed to access the datetime module: {e}')\n", + "\n", + "# Now we will try to create an instance of the model\n", + "try:\n", + " ModelWithDatetime(created_at='1410-07-15T00:00:00') # This is the date of the Battle of Grunwald\n", + "except NameError as e:\n", + " logging.error(f'Failed to create an instance of the model: {e}')\n", + "\n", + "# Now we will try to create an instance of the model AFTER we have restored the datetime module\n", + "\n", + "import datetime\n", + "\n", + "try:\n", + " ModelWithDatetime(created_at='1410-07-15T00:00:00')\n", + "except pydantic.ValidationError as e:\n", + " logging.error(e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Aha! We got a `NameError`, because the `datetime` module is not available to the validator function in a clean Python environment.\n", + "After re-importing the missing module, we can see that the model performs the validation as expected, hence the `ValidationError`\n", + "is raised for the date that is older than 100 days.\n", + "\n", + "This means that any dependencies used inside the validator functions need to be installed and re-imported in the new environment\n", + "in order to work properly. One way to fix this would be to move the importing of the `datetime` module into the\n", + "source code of the validator function, but this is not a good practice, because it makes the code less readable and harder to maintain.\n", + "\n", + "However, we will bite the bullet and try this approach to see if it will work."
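As an aside to the notebook cell above: the import-inside-the-function trick is not Pydantic-specific. Below is a minimal sketch (plain Python, hypothetical function names `leaky` and `self_contained`) of why a function carrying its own import survives a wiped-out namespace while one relying on a module-level import does not:

```python
# A function that resolves `datetime` through its module's globals breaks
# once that binding is deleted; one that re-imports locally keeps working.
import datetime


def leaky(value: str):
    # The `datetime` name is looked up at call time, in the module's globals.
    return datetime.datetime.fromisoformat(value)


def self_contained(value: str):
    # The import travels with the function body itself.
    import datetime
    return datetime.datetime.fromisoformat(value)


del datetime  # simulate a "clean" environment without the global import

try:
    leaky('1410-07-15T00:00:00')
    leaky_raised = False
except NameError:
    leaky_raised = True

parsed = self_contained('1410-07-15T00:00:00')  # still parses fine
```

This is the same failure mode the `del globals()['datetime']` experiment demonstrated, reduced to plain functions.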
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# Again, remove datetime\n", + "if 'datetime' in globals():\n", + " del globals()['datetime']\n", + "\n", + "\n", + "class ModelWithDatetimeRedux(pydantic.BaseModel):\n", + " created_at: str\n", + "\n", + " @pydantic.field_validator('created_at')\n", + " def check_created_at(cls, value):\n", + " \"\"\"\n", + " What we do is we basically try to \"pack\" the whole context of the function here\n", + " \"\"\"\n", + " import datetime\n", + " TIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100)\n", + "\n", + " iso_formatted_value = datetime.datetime.fromisoformat(value)\n", + " if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):\n", + " raise ValueError('The timestamp is too old.')\n", + " return value" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ERROR:root:1 validation error for ModelWithDatetimeRedux\n", + "created_at\n", + " Value error, The timestamp is too old. [type=value_error, input_value='1410-07-15T00:00:00', input_type=str]\n", + " For further information visit https://errors.pydantic.dev/2.7/v/value_error\n" + ] + } + ], + "source": [ + "# Gone with the datetime module again\n", + "if 'datetime' in globals():\n", + " del globals()['datetime']\n", + "\n", + "# Now we will try to create an instance of the model, which should pass\n", + "\n", + "try:\n", + " ModelWithDatetimeRedux(created_at='1410-07-15T00:00:00')\n", + "except pydantic.ValidationError as e:\n", + " logging.error(e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Cool, cool, this approach works and may be used to move around our Pydantic models from one environment to another,\n", + "since the validators are now self-contained and do not depend on any global context. 
We need to check a couple of things.\n", + "\n", + "First - how this works out for nested models, since we can have fields that are Pydantic models themselves." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:root:{\n", + " \"properties\": {\n", + " \"name\": {\n", + " \"title\": \"Name\",\n", + " \"type\": \"string\"\n", + " },\n", + " \"age\": {\n", + " \"maximum\": 80,\n", + " \"minimum\": 0,\n", + " \"title\": \"Age\",\n", + " \"type\": \"integer\"\n", + " }\n", + " },\n", + " \"required\": [\n", + " \"name\",\n", + " \"age\"\n", + " ],\n", + " \"title\": \"Nested\",\n", + " \"type\": \"object\"\n", + "}\n" + ] + } + ], + "source": [ + "class Nested(pydantic.BaseModel):\n", + " name: str\n", + " age: int = pydantic.Field(ge=0, le=80)\n", + "\n", + " @pydantic.field_validator('age')\n", + " def check_age(cls, value):\n", + " if value < 18:\n", + " raise ValueError('You need to be an adult to use this service.')\n", + " return value\n", + "\n", + "\n", + "class Root(pydantic.BaseModel):\n", + " description: str\n", + " nested: Nested\n", + "\n", + "# Let's see what information is available in the JSON dump of our model\n", + "nested_model = Root(\n", + " description='A model with a nested model',\n", + " nested=Nested(\n", + " name='John',\n", + " age=18\n", + " )\n", + ")\n", + "\n", + "# We will confirm that the model is created successfully,\n", + "# and nested model is also created successfully as one of the fields\n", + "logging.info(\n", + " json.dumps( # This method is just for indentation only\n", + " Nested.model_json_schema(), # This is a V2 version of the schema dump\n", + " indent=2\n", + " )\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As expected, validation for $-1$, $0$, $17$ and $81$ failed, while $18$ and $80$ passed.\n", + "This means that both our validator and constraints are working as expected.\n", + "\n", + "## Serializing the model\n", + "\n", + "Now, let's serialize the model to see what's inside. Pydantic allows us to dump the model to a dictionary, which we can then print out,\n", + "using the `model_json_schema` (previously it was `schema_json`) method. As we can see, the model is serialized to a dictionary which\n", + "contains only information about the **constraints** applied to the fields, but no mention is made of the **validators**." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:root:{\n", + " \"$defs\": {\n", + " \"Nested\": {\n", + " \"properties\": {\n", + " \"name\": {\n", + " \"title\": \"Name\",\n", + " \"type\": \"string\"\n", + " },\n", + " \"age\": {\n", + " \"maximum\": 80,\n", + " \"minimum\": 0,\n", + " \"title\": \"Age\",\n", + " \"type\": \"integer\"\n", + " }\n", + " },\n", + " \"required\": [\n", + " \"name\",\n", + " \"age\"\n", + " ],\n", + " \"title\": \"Nested\",\n", + " \"type\": \"object\"\n", + " }\n", + " },\n", + " \"properties\": {\n", + " \"description\": {\n", + " \"title\": \"Description\",\n", + " \"type\": \"string\"\n", + " },\n", + " \"nested\": {\n", + " \"$ref\": \"#/$defs/Nested\"\n", + " }\n", + " },\n", + " \"required\": [\n", + " \"description\",\n", + " \"nested\"\n", + " ],\n", + " \"title\": \"Root\",\n", + " \"type\": \"object\"\n", + "}\n", + "INFO:root:{\n", + " \"$defs\": {\n", + " \"Nested\": {\n", + " \"properties\": {\n", + " \"name\": {\n", + " \"title\": \"Name\",\n", + " \"type\": \"string\"\n", + " },\n", + " \"age\": {\n", + " \"maximum\": 80,\n", + " \"minimum\": 0,\n", + " \"title\": \"Age\",\n", + " \"type\": \"integer\"\n", + " }\n", + " },\n", + " \"required\": [\n", + " \"name\",\n", + " \"age\"\n", + " ],\n", + " \"title\": \"Nested\",\n", + " \"type\": \"object\"\n", + " }\n", + " },\n", + " \"properties\": {\n", + " \"description\": {\n", + " \"title\": \"Description\",\n", + " \"type\": \"string\"\n", + " },\n", + " \"nested\": {\n", + " \"$ref\": \"#/$defs/Nested\"\n", + " }\n", + " },\n", + " \"required\": [\n", + " \"description\",\n", + " \"nested\"\n", + " ],\n", + " \"title\": \"Root\",\n", + " \"type\": \"object\"\n", + "}\n" + ] + } + ], + "source": [ + "# Let's see what information is available in the JSON dump of our model\n", + "logging.info(\n", + " json.dumps( # This method is just for indentation only\n", + " Root.model_json_schema(), # This is a V2 version of the schema dump\n", + " indent=2\n", + " )\n", + ")\n", + "logging.info(\n", + " Root.schema_json(indent=2)\n", + ") # This is a V1 version of the schema dump\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try the lower-level `dict()` method on the model instance to see if it will give us more information." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:root:{'name': 'John', 'age': 18}\n" + ] + } + ], + "source": [ + "model_instance = Nested(name='John', age=18)\n", + "logging.info(\n", + " Root.dict(model_instance)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So far no hit on the validators. Let's just use the Python built-in `__dict__` attribute to see if we can find anything useful there."
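Before digging through the raw `__dict__`, it is worth knowing where Pydantic v2 actually keeps validator metadata: in the `__pydantic_decorators__` attribute of the model class. A short probe (a sketch relying on Pydantic v2 internals, which are not a stable public API and may change between releases):

```python
import pydantic


class Nested(pydantic.BaseModel):
    name: str
    age: int = pydantic.Field(ge=0, le=80)

    @pydantic.field_validator('age')
    def check_age(cls, value):
        if value < 18:
            raise ValueError('You need to be an adult to use this service.')
        return value


# Methods registered with @field_validator land in an internal
# DecoratorInfos object attached to the class, keyed by method name.
validators = Nested.__pydantic_decorators__.field_validators
print(list(validators))  # names of the validators defined on the model
print(validators['check_age'].info.fields)  # fields the validator is attached to
```

The `DecoratorInfos(...)` entry visible in the raw `__dict__` dump below is this very object.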
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:root:{'__module__': '__main__', '__annotations__': {'description': , 'nested': }, 'model_config': {}, '__class_vars__': set(), '__private_attributes__': {}, '__weakref__': , '__doc__': None, '__abstractmethods__': frozenset(), '_abc_impl': <_abc._abc_data object at 0x7fb27211e9c0>, '__pydantic_custom_init__': False, '__pydantic_post_init__': None, '__pydantic_decorators__': DecoratorInfos(validators={}, field_validators={}, root_validators={}, field_serializers={}, model_serializers={}, model_validators={}, computed_fields={}), '__pydantic_generic_metadata__': {'origin': None, 'args': (), 'parameters': ()}, '__pydantic_complete__': True, '__pydantic_parent_namespace__': {'__name__': '__main__', '__doc__': 'Automatically created module for IPython interactive environment', '__package__': , '__loader__': , '__spec__': , '__builtin__': , '__builtins__': , '_ih': ['', \"import pydantic\\nimport json\\nimport logging\\n\\n\\nclass Nested(pydantic.BaseModel):\\n name: str\\n age: int = pydantic.Field(ge=0, le=80)\\n\\n @pydantic.field_validator('age')\\n def check_age(cls, value):\\n if value < 18:\\n raise ValueError('You need to be an adult to use this service.')\\n return value\", \"logging.basicConfig(level=logging.DEBUG)\\n\\n# We can see if validators/constraints are working\\n# by trying to create a model with invalid and valid values.\\n# For invalid values, we expect a ValidationError to be raised.\\n# For valid values, we expect the model to be created successfully.\\nfor age in [-1, 0, 17, 18, 80, 81]:\\n try:\\n Nested(name='John', age=age)\\n logging.info(f'John is {age} years old.')\\n except pydantic.ValidationError as e:\\n logging.error(e)\", \"import datetime\\n\\nTIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100) # All timestamps will be relative to this 
one\\n\\n\\nclass ModelWithDatetime(pydantic.BaseModel):\\n created_at: str\\n\\n @pydantic.field_validator('created_at')\\n def check_created_at(cls, value):\\n iso_formatted_value = datetime.datetime.fromisoformat(value)\\n if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):\\n raise ValueError('The timestamp is too old.')\\n return iso_formatted_value\", \"# Here we will employ a trick to remove the datetime module from the globals() dictionary,\\n# so it is not available to the unpickled object.\\nif 'datetime' in globals():\\n del globals()['datetime']\\n\\n# We can check it by trying to access the datetime module\\ntry:\\n datetime.datetime.now()\\nexcept NameError as e:\\n logging.error(f'Failed to access the datetime module: {e}')\\n\\n# Now we will try to create an instance of the model\\ntry:\\n ModelWithDatetime(created_at='1410-07-15T00:00:00') # This is the date of the Battle of Grunwald\\nexcept NameError as e:\\n logging.error(f'Failed to create an instance of the model: {e}')\\n\\n# Now we will try to create an instance of the model AFTER we have restored the datetime module\\n\\nimport datetime\\n\\ntry:\\n ModelWithDatetime(created_at='1410-07-15T00:00:00')\\nexcept pydantic.ValidationError as e:\\n logging.error(e)\", '# Again, remove datetime\\nif \\'datetime\\' in globals():\\n del globals()[\\'datetime\\']\\n\\n\\nclass ModelWithDatetimeRedux(pydantic.BaseModel):\\n created_at: str\\n\\n @pydantic.field_validator(\\'created_at\\')\\n def check_created_at(cls, value):\\n \"\"\"\\n What we do is we basically try to \"pack\" the whole context of the function here\\n \"\"\"\\n import datetime\\n TIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100)\\n\\n iso_formatted_value = datetime.datetime.fromisoformat(value)\\n if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):\\n raise ValueError(\\'The timestamp is too old.\\')\\n return value', \"# Gone with the datetime module again\\nif 
'datetime' in globals():\\n del globals()['datetime']\\n\\n# Now we will try to create an instance of the model, which should pass\\n\\ntry:\\n ModelWithDatetimeRedux(created_at='1410-07-15T00:00:00')\\nexcept pydantic.ValidationError as e:\\n logging.error(e)\", \"class Nested(pydantic.BaseModel):\\n name: str\\n age: int = pydantic.Field(ge=0, le=80)\\n\\n @pydantic.field_validator('age')\\n def check_age(cls, value):\\n if value < 18:\\n raise ValueError('You need to be an adult to use this service.')\\n return value\\n\\n\\nclass Root(pydantic.BaseModel):\\n description: str\\n nested: Nested\\n\\n# Let's see what information is available in the JSON dump of our model\\nnested_model = Root(\\n description='A model with a nested model',\\n nested=Nested(\\n name='John',\\n age=18\\n )\\n)\\n\\n# We will confirm that the model is created successfully,\\n# and nested model is also created successfully as one of the fields\\nlogging.info(\\n json.dumps( # This method is just for indentation only\\n Nested.model_json_schema(), # This is a V2 version of the schema dump\\n indent=2\\n )\\n)\", \"# Let's see what information is available in the JSON dump of our model\\nlogging.info(\\n json.dumps( # This method is just for indentation only\\n Root.model_json_schema(), # This is a V2 version of the schema dump\\n indent=2\\n )\\n)\\nlogging.info(\\n Root.schema_json(indent=2)\\n) # This is a V1 version of the schema dump\", \"model_instance = Nested(name='John', age=18)\\nlogging.info(\\n Root.dict(model_instance)\\n)\", 'whole_dict_of_model = Root.__dict__\\nlogging.info(whole_dict_of_model)'], '_oh': {}, '_dh': [PosixPath('/home/krybacki/Repozytoria/Prywatne/kamilrybacki.github.io')], 'In': ['', \"import pydantic\\nimport json\\nimport logging\\n\\n\\nclass Nested(pydantic.BaseModel):\\n name: str\\n age: int = pydantic.Field(ge=0, le=80)\\n\\n @pydantic.field_validator('age')\\n def check_age(cls, value):\\n if value < 18:\\n raise ValueError('You need to be an adult 
to use this service.')\\n return value\", \"logging.basicConfig(level=logging.DEBUG)\\n\\n# We can see if validators/constraints are working\\n# by trying to create a model with invalid and valid values.\\n# For invalid values, we expect a ValidationError to be raised.\\n# For valid values, we expect the model to be created successfully.\\nfor age in [-1, 0, 17, 18, 80, 81]:\\n try:\\n Nested(name='John', age=age)\\n logging.info(f'John is {age} years old.')\\n except pydantic.ValidationError as e:\\n logging.error(e)\", \"import datetime\\n\\nTIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100) # All timestamps will be relative to this one\\n\\n\\nclass ModelWithDatetime(pydantic.BaseModel):\\n created_at: str\\n\\n @pydantic.field_validator('created_at')\\n def check_created_at(cls, value):\\n iso_formatted_value = datetime.datetime.fromisoformat(value)\\n if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):\\n raise ValueError('The timestamp is too old.')\\n return iso_formatted_value\", \"# Here we will employ a trick to remove the datetime module from the globals() dictionary,\\n# so it is not available to the unpickled object.\\nif 'datetime' in globals():\\n del globals()['datetime']\\n\\n# We can check it by trying to access the datetime module\\ntry:\\n datetime.datetime.now()\\nexcept NameError as e:\\n logging.error(f'Failed to access the datetime module: {e}')\\n\\n# Now we will try to create an instance of the model\\ntry:\\n ModelWithDatetime(created_at='1410-07-15T00:00:00') # This is the date of the Battle of Grunwald\\nexcept NameError as e:\\n logging.error(f'Failed to create an instance of the model: {e}')\\n\\n# Now we will try to create an instance of the model AFTER we have restored the datetime module\\n\\nimport datetime\\n\\ntry:\\n ModelWithDatetime(created_at='1410-07-15T00:00:00')\\nexcept pydantic.ValidationError as e:\\n logging.error(e)\", '# Again, remove datetime\\nif \\'datetime\\' in 
globals():\\n del globals()[\\'datetime\\']\\n\\n\\nclass ModelWithDatetimeRedux(pydantic.BaseModel):\\n created_at: str\\n\\n @pydantic.field_validator(\\'created_at\\')\\n def check_created_at(cls, value):\\n \"\"\"\\n What we do is we basically try to \"pack\" the whole context of the function here\\n \"\"\"\\n import datetime\\n TIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100)\\n\\n iso_formatted_value = datetime.datetime.fromisoformat(value)\\n if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):\\n raise ValueError(\\'The timestamp is too old.\\')\\n return value', \"# Gone with the datetime module again\\nif 'datetime' in globals():\\n del globals()['datetime']\\n\\n# Now we will try to create an instance of the model, which should pass\\n\\ntry:\\n ModelWithDatetimeRedux(created_at='1410-07-15T00:00:00')\\nexcept pydantic.ValidationError as e:\\n logging.error(e)\", \"class Nested(pydantic.BaseModel):\\n name: str\\n age: int = pydantic.Field(ge=0, le=80)\\n\\n @pydantic.field_validator('age')\\n def check_age(cls, value):\\n if value < 18:\\n raise ValueError('You need to be an adult to use this service.')\\n return value\\n\\n\\nclass Root(pydantic.BaseModel):\\n description: str\\n nested: Nested\\n\\n# Let's see what information is available in the JSON dump of our model\\nnested_model = Root(\\n description='A model with a nested model',\\n nested=Nested(\\n name='John',\\n age=18\\n )\\n)\\n\\n# We will confirm that the model is created successfully,\\n# and nested model is also created successfully as one of the fields\\nlogging.info(\\n json.dumps( # This method is just for indentation only\\n Nested.model_json_schema(), # This is a V2 version of the schema dump\\n indent=2\\n )\\n)\", \"# Let's see what information is available in the JSON dump of our model\\nlogging.info(\\n json.dumps( # This method is just for indentation only\\n Root.model_json_schema(), # This is a V2 version of the schema dump\\n 
indent=2\\n )\\n)\\nlogging.info(\\n Root.schema_json(indent=2)\\n) # This is a V1 version of the schema dump\", \"model_instance = Nested(name='John', age=18)\\nlogging.info(\\n Root.dict(model_instance)\\n)\", 'whole_dict_of_model = Root.__dict__\\nlogging.info(whole_dict_of_model)'], 'Out': {}, 'get_ipython': , 'exit': , 'quit': , 'open': , '_': '', '__': '', '___': '', '__vsc_ipynb_file__': '/home/krybacki/Repozytoria/Prywatne/kamilrybacki.github.io/src/content/_jupyter/reflections/pydantic_dump.ipynb', '_i': \"# Gone with the datetime module again\\nif 'datetime' in globals():\\n del globals()['datetime']\\n\\n# Now we will try to create an instance of the model, which should pass\\n\\ntry:\\n ModelWithDatetimeRedux(created_at='1410-07-15T00:00:00')\\nexcept pydantic.ValidationError as e:\\n logging.error(e)\", '_ii': '# Again, remove datetime\\nif \\'datetime\\' in globals():\\n del globals()[\\'datetime\\']\\n\\n\\nclass ModelWithDatetimeRedux(pydantic.BaseModel):\\n created_at: str\\n\\n @pydantic.field_validator(\\'created_at\\')\\n def check_created_at(cls, value):\\n \"\"\"\\n What we do is we basically try to \"pack\" the whole context of the function here\\n \"\"\"\\n import datetime\\n TIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100)\\n\\n iso_formatted_value = datetime.datetime.fromisoformat(value)\\n if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):\\n raise ValueError(\\'The timestamp is too old.\\')\\n return value', '_iii': \"# Here we will employ a trick to remove the datetime module from the globals() dictionary,\\n# so it is not available to the unpickled object.\\nif 'datetime' in globals():\\n del globals()['datetime']\\n\\n# We can check it by trying to access the datetime module\\ntry:\\n datetime.datetime.now()\\nexcept NameError as e:\\n logging.error(f'Failed to access the datetime module: {e}')\\n\\n# Now we will try to create an instance of the model\\ntry:\\n 
ModelWithDatetime(created_at='1410-07-15T00:00:00') # This is the date of the Battle of Grunwald\\nexcept NameError as e:\\n logging.error(f'Failed to create an instance of the model: {e}')\\n\\n# Now we will try to create an instance of the model AFTER we have restored the datetime module\\n\\nimport datetime\\n\\ntry:\\n ModelWithDatetime(created_at='1410-07-15T00:00:00')\\nexcept pydantic.ValidationError as e:\\n logging.error(e)\", '_i1': \"import pydantic\\nimport json\\nimport logging\\n\\n\\nclass Nested(pydantic.BaseModel):\\n name: str\\n age: int = pydantic.Field(ge=0, le=80)\\n\\n @pydantic.field_validator('age')\\n def check_age(cls, value):\\n if value < 18:\\n raise ValueError('You need to be an adult to use this service.')\\n return value\", 'pydantic': , 'json': , 'logging': , 'Nested': , '_i2': \"logging.basicConfig(level=logging.DEBUG)\\n\\n# We can see if validators/constraints are working\\n# by trying to create a model with invalid and valid values.\\n# For invalid values, we expect a ValidationError to be raised.\\n# For valid values, we expect the model to be created successfully.\\nfor age in [-1, 0, 17, 18, 80, 81]:\\n try:\\n Nested(name='John', age=age)\\n logging.info(f'John is {age} years old.')\\n except pydantic.ValidationError as e:\\n logging.error(e)\", 'age': 81, '_i3': \"import datetime\\n\\nTIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100) # All timestamps will be relative to this one\\n\\n\\nclass ModelWithDatetime(pydantic.BaseModel):\\n created_at: str\\n\\n @pydantic.field_validator('created_at')\\n def check_created_at(cls, value):\\n iso_formatted_value = datetime.datetime.fromisoformat(value)\\n if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):\\n raise ValueError('The timestamp is too old.')\\n return iso_formatted_value\", 'TIMESTAMP_START': datetime.datetime(2024, 3, 28, 17, 20, 20, 548319), 'ModelWithDatetime': , '_i4': \"# Here we will employ a trick to remove the datetime 
module from the globals() dictionary,\\n# so it is not available to the unpickled object.\\nif 'datetime' in globals():\\n del globals()['datetime']\\n\\n# We can check it by trying to access the datetime module\\ntry:\\n datetime.datetime.now()\\nexcept NameError as e:\\n logging.error(f'Failed to access the datetime module: {e}')\\n\\n# Now we will try to create an instance of the model\\ntry:\\n ModelWithDatetime(created_at='1410-07-15T00:00:00') # This is the date of the Battle of Grunwald\\nexcept NameError as e:\\n logging.error(f'Failed to create an instance of the model: {e}')\\n\\n# Now we will try to create an instance of the model AFTER we have restored the datetime module\\n\\nimport datetime\\n\\ntry:\\n ModelWithDatetime(created_at='1410-07-15T00:00:00')\\nexcept pydantic.ValidationError as e:\\n logging.error(e)\", '_i5': '# Again, remove datetime\\nif \\'datetime\\' in globals():\\n del globals()[\\'datetime\\']\\n\\n\\nclass ModelWithDatetimeRedux(pydantic.BaseModel):\\n created_at: str\\n\\n @pydantic.field_validator(\\'created_at\\')\\n def check_created_at(cls, value):\\n \"\"\"\\n What we do is we basically try to \"pack\" the whole context of the function here\\n \"\"\"\\n import datetime\\n TIMESTAMP_START = datetime.datetime.now() - datetime.timedelta(days=100)\\n\\n iso_formatted_value = datetime.datetime.fromisoformat(value)\\n if iso_formatted_value - TIMESTAMP_START < datetime.timedelta(days=0):\\n raise ValueError(\\'The timestamp is too old.\\')\\n return value', 'ModelWithDatetimeRedux': , '_i6': \"# Gone with the datetime module again\\nif 'datetime' in globals():\\n del globals()['datetime']\\n\\n# Now we will try to create an instance of the model, which should pass\\n\\ntry:\\n ModelWithDatetimeRedux(created_at='1410-07-15T00:00:00')\\nexcept pydantic.ValidationError as e:\\n logging.error(e)\", '_i7': \"class Nested(pydantic.BaseModel):\\n name: str\\n age: int = pydantic.Field(ge=0, le=80)\\n\\n 
@pydantic.field_validator('age')\\n def check_age(cls, value):\\n if value < 18:\\n raise ValueError('You need to be an adult to use this service.')\\n return value\\n\\n\\nclass Root(pydantic.BaseModel):\\n description: str\\n nested: Nested\\n\\n# Let's see what information is available in the JSON dump of our model\\nnested_model = Root(\\n description='A model with a nested model',\\n nested=Nested(\\n name='John',\\n age=18\\n )\\n)\\n\\n# We will confirm that the model is created successfully,\\n# and nested model is also created successfully as one of the fields\\nlogging.info(\\n json.dumps( # This method is just for indentation only\\n Nested.model_json_schema(), # This is a V2 version of the schema dump\\n indent=2\\n )\\n)\"}, 'model_fields': {'description': FieldInfo(annotation=str, required=True), 'nested': FieldInfo(annotation=Nested, required=True)}, '__pydantic_core_schema__': {'type': 'model', 'cls': , 'schema': {'type': 'model-fields', 'fields': {'description': {'type': 'model-field', 'schema': {'type': 'str', 'metadata': {}}, 'metadata': {'pydantic_js_functions': [], 'pydantic_js_annotation_functions': [.json_schema_update_func at 0x7fb2723f8900>]}}, 'nested': {'type': 'model-field', 'schema': {'type': 'model', 'cls': , 'schema': {'type': 'model-fields', 'fields': {'name': {'type': 'model-field', 'schema': {'type': 'str', 'metadata': {}}, 'metadata': {'pydantic_js_functions': [], 'pydantic_js_annotation_functions': [.json_schema_update_func at 0x7fb2723f8400>]}}, 'age': {'type': 'model-field', 'schema': {'function': {'type': 'no-info', 'function': >}, 'schema': {'type': 'int', 'le': 80, 'ge': 0, 'metadata': {}}, 'type': 'function-after', 'metadata': {}}, 'metadata': {'pydantic_js_functions': [], 'pydantic_js_annotation_functions': [.json_schema_update_func at 0x7fb2723f87c0>]}}}, 'model_name': 'Nested', 'computed_fields': [], 'metadata': {}}, 'custom_init': False, 'root_model': False, 'config': {'title': 'Nested'}, 'ref': 
'__main__.Nested:94551602161136', 'metadata': {'pydantic_js_functions': [functools.partial(, cls=), >], 'pydantic_js_annotation_functions': []}}, 'metadata': {'pydantic_js_functions': [], 'pydantic_js_annotation_functions': [.json_schema_update_func at 0x7fb2723f8a40>]}}}, 'model_name': 'Root', 'computed_fields': [], 'metadata': {}}, 'custom_init': False, 'root_model': False, 'config': {'title': 'Root'}, 'ref': '__main__.Root:94551601194704', 'metadata': {'pydantic_js_functions': [functools.partial(, cls=), >], 'pydantic_js_annotation_functions': []}}, '__pydantic_validator__': SchemaValidator(title=\"Root\", validator=Model(\n", + " ModelValidator {\n", + " revalidate: Never,\n", + " validator: ModelFields(\n", + " ModelFieldsValidator {\n", + " fields: [\n", + " Field {\n", + " name: \"description\",\n", + " lookup_key: Simple {\n", + " key: \"description\",\n", + " py_key: Py(\n", + " 0x00007fb27211f530,\n", + " ),\n", + " path: LookupPath(\n", + " [\n", + " S(\n", + " \"description\",\n", + " Py(\n", + " 0x00007fb27211f0b0,\n", + " ),\n", + " ),\n", + " ],\n", + " ),\n", + " },\n", + " name_py: Py(\n", + " 0x00007fb28e6cca30,\n", + " ),\n", + " validator: Str(\n", + " StrValidator {\n", + " strict: false,\n", + " coerce_numbers_to_str: false,\n", + " },\n", + " ),\n", + " frozen: false,\n", + " },\n", + " Field {\n", + " name: \"nested\",\n", + " lookup_key: Simple {\n", + " key: \"nested\",\n", + " py_key: Py(\n", + " 0x00007fb2723de040,\n", + " ),\n", + " path: LookupPath(\n", + " [\n", + " S(\n", + " \"nested\",\n", + " Py(\n", + " 0x00007fb2723de010,\n", + " ),\n", + " ),\n", + " ],\n", + " ),\n", + " },\n", + " name_py: Py(\n", + " 0x00007fb28e9091d0,\n", + " ),\n", + " validator: Model(\n", + " ModelValidator {\n", + " revalidate: Never,\n", + " validator: ModelFields(\n", + " ModelFieldsValidator {\n", + " fields: [\n", + " Field {\n", + " name: \"name\",\n", + " lookup_key: Simple {\n", + " key: \"name\",\n", + " py_key: Py(\n", + " 
0x00007fb2723ddf80,\n", + " ),\n", + " path: LookupPath(\n", + " [\n", + " S(\n", + " \"name\",\n", + " Py(\n", + " 0x00007fb2723ddfb0,\n", + " ),\n", + " ),\n", + " ],\n", + " ),\n", + " },\n", + " name_py: Py(\n", + " 0x00007fb28f84f370,\n", + " ),\n", + " validator: Str(\n", + " StrValidator {\n", + " strict: false,\n", + " coerce_numbers_to_str: false,\n", + " },\n", + " ),\n", + " frozen: false,\n", + " },\n", + " Field {\n", + " name: \"age\",\n", + " lookup_key: Simple {\n", + " key: \"age\",\n", + " py_key: Py(\n", + " 0x00007fb2723de0d0,\n", + " ),\n", + " path: LookupPath(\n", + " [\n", + " S(\n", + " \"age\",\n", + " Py(\n", + " 0x00007fb2723de0a0,\n", + " ),\n", + " ),\n", + " ],\n", + " ),\n", + " },\n", + " name_py: Py(\n", + " 0x00007fb272a81890,\n", + " ),\n", + " validator: FunctionAfter(\n", + " FunctionAfterValidator {\n", + " validator: ConstrainedInt(\n", + " ConstrainedIntValidator {\n", + " strict: false,\n", + " multiple_of: None,\n", + " le: Some(\n", + " I64(\n", + " 80,\n", + " ),\n", + " ),\n", + " lt: None,\n", + " ge: Some(\n", + " I64(\n", + " 0,\n", + " ),\n", + " ),\n", + " gt: None,\n", + " },\n", + " ),\n", + " func: Py(\n", + " 0x00007fb27211ff40,\n", + " ),\n", + " config: Py(\n", + " 0x00007fb27212aac0,\n", + " ),\n", + " name: \"function-after[check_age(), constrained-int]\",\n", + " field_name: None,\n", + " info_arg: false,\n", + " },\n", + " ),\n", + " frozen: false,\n", + " },\n", + " ],\n", + " model_name: \"Nested\",\n", + " extra_behavior: Ignore,\n", + " extras_validator: None,\n", + " strict: false,\n", + " from_attributes: false,\n", + " loc_by_alias: true,\n", + " },\n", + " ),\n", + " class: Py(\n", + " 0x000055fe82a8e5f0,\n", + " ),\n", + " post_init: None,\n", + " frozen: false,\n", + " custom_init: false,\n", + " root_model: false,\n", + " undefined: Py(\n", + " 0x00007fb27350d290,\n", + " ),\n", + " name: \"Nested\",\n", + " },\n", + " ),\n", + " frozen: false,\n", + " },\n", + " ],\n", + " model_name: 
\"Root\",\n", + " extra_behavior: Ignore,\n", + " extras_validator: None,\n", + " strict: false,\n", + " from_attributes: false,\n", + " loc_by_alias: true,\n", + " },\n", + " ),\n", + " class: Py(\n", + " 0x000055fe829a26d0,\n", + " ),\n", + " post_init: None,\n", + " frozen: false,\n", + " custom_init: false,\n", + " root_model: false,\n", + " undefined: Py(\n", + " 0x00007fb27350d290,\n", + " ),\n", + " name: \"Root\",\n", + " },\n", + "), definitions=[], cache_strings=True), '__pydantic_serializer__': SchemaSerializer(serializer=Model(\n", + " ModelSerializer {\n", + " class: Py(\n", + " 0x000055fe829a26d0,\n", + " ),\n", + " serializer: Fields(\n", + " GeneralFieldsSerializer {\n", + " fields: {\n", + " \"nested\": SerField {\n", + " key_py: Py(\n", + " 0x00007fb28e9091d0,\n", + " ),\n", + " alias: None,\n", + " alias_py: None,\n", + " serializer: Some(\n", + " Model(\n", + " ModelSerializer {\n", + " class: Py(\n", + " 0x000055fe82a8e5f0,\n", + " ),\n", + " serializer: Fields(\n", + " GeneralFieldsSerializer {\n", + " fields: {\n", + " \"name\": SerField {\n", + " key_py: Py(\n", + " 0x00007fb28f84f370,\n", + " ),\n", + " alias: None,\n", + " alias_py: None,\n", + " serializer: Some(\n", + " Str(\n", + " StrSerializer,\n", + " ),\n", + " ),\n", + " required: true,\n", + " },\n", + " \"age\": SerField {\n", + " key_py: Py(\n", + " 0x00007fb272a81890,\n", + " ),\n", + " alias: None,\n", + " alias_py: None,\n", + " serializer: Some(\n", + " Int(\n", + " IntSerializer,\n", + " ),\n", + " ),\n", + " required: true,\n", + " },\n", + " },\n", + " computed_fields: Some(\n", + " ComputedFields(\n", + " [],\n", + " ),\n", + " ),\n", + " mode: SimpleDict,\n", + " extra_serializer: None,\n", + " filter: SchemaFilter {\n", + " include: None,\n", + " exclude: None,\n", + " },\n", + " required_fields: 2,\n", + " },\n", + " ),\n", + " has_extra: false,\n", + " root_model: false,\n", + " name: \"Nested\",\n", + " },\n", + " ),\n", + " ),\n", + " required: true,\n", + " },\n", 
+ " \"description\": SerField {\n", + " key_py: Py(\n", + " 0x00007fb28e6cca30,\n", + " ),\n", + " alias: None,\n", + " alias_py: None,\n", + " serializer: Some(\n", + " Str(\n", + " StrSerializer,\n", + " ),\n", + " ),\n", + " required: true,\n", + " },\n", + " },\n", + " computed_fields: Some(\n", + " ComputedFields(\n", + " [],\n", + " ),\n", + " ),\n", + " mode: SimpleDict,\n", + " extra_serializer: None,\n", + " filter: SchemaFilter {\n", + " include: None,\n", + " exclude: None,\n", + " },\n", + " required_fields: 2,\n", + " },\n", + " ),\n", + " has_extra: false,\n", + " root_model: false,\n", + " name: \"Root\",\n", + " },\n", + "), definitions=[]), '__signature__': , 'model_computed_fields': {}}\n" + ] + } + ], + "source": [ + "whole_dict_of_model = Root.__dict__\n", + "logging.info(whole_dict_of_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### What do we get out of this?\n", + "\n", + "First of all, we can see that the `__dict__` attribute of the model class contains all the fields that we defined in the model,\n", + "**together** with the definitions of validator functions linked to named fields. This means that we can try to **programmatically**\n", + "create a class inheriting from `BaseModel` and add all the fields and validators to it by accessing the correct private attributes of the model class.\n", + "\n", + "But why did the JSON dumps of our model not contain this information? The answer is simple - Pydantic does not serialize the validators,\n", + "because the underlying serializers do not know how to handle them. They are **functions** with specific **closures** that need to be\n", + "**reconstructed** in order to be used.
Similarly, reading the documentation for JSON schemas in Pydantic, we can see that there\n", + "is no straightforward way to serialize models and load them back via the library's API.\n", + "\n", + "Let's check if we can reconstruct our nested model from the serialized form of the `__dict__` attribute." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ERROR:root:Failed to reconstruct the model: A non-annotated attribute was detected: `model_fields = {'description': FieldInfo(annotation=str, required=True), 'nested': FieldInfo(annotation=Nested, required=True)}`. All model fields require a type annotation; if `model_fields` is not meant to be a field, you may be able to resolve this error by annotating it as a `ClassVar` or updating `model_config['ignored_types']`.\n", + "\n", + "For further information visit https://errors.pydantic.dev/2.7/u/model-field-missing-annotation\n", + "INFO:root:{'description': 'A nested model', 'nested': {'name': 'John', 'age': 18}}\n" + ] + } + ], + "source": [ + "model_dump = dict(**Root.__dict__)\n", + "\n", + "# Start with the built-in type function to create a new class\n", + "try:\n", + " reconstructed_model = type(\n", + " 'NestedModel',\n", + " (pydantic.BaseModel,),\n", + " model_dump\n", + " )\n", + "except pydantic.PydanticUserError as error:\n", + " logging.error(f'Failed to reconstruct the model: {error}')\n", + "\n", + "# Maybe we can try to filter out the object's attributes from the model's dictionary,\n", + "# and then try to reconstruct the model using the Pydantic API?\n", + "\n", + "object_dict = object.__dict__ # Attribute dictionary of `object`, the base class of all Python classes\n", + "filtered_model_dump = {k: model_dump[k] for k in model_dump if k not in object_dict} # Drops the generic object attributes, keeping the model-specific ones\n", + "\n", + "validators_from_dict = 
filtered_model_dump.get('__validators__', {})\n", + "reconstructed_model = pydantic.create_model(\n", + " 'NestedModel',\n", + " __base__=pydantic.BaseModel,\n", + " __validators__=validators_from_dict,\n", + " **{\n", + " annotation: (filtered_model_dump['__annotations__'][annotation], ...)\n", + " for annotation in filtered_model_dump['__annotations__']\n", + " }\n", + ")\n", + "reconstructed_model_instance = reconstructed_model(\n", + " description='A nested model',\n", + " nested=Nested(\n", + " name='John',\n", + " age=18\n", + " )\n", + ")\n", + "\n", + "logging.info(reconstructed_model_instance.dict())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well, it seems like we have found a way to reconstruct the model. It may not be as straightforward as we would like it to be, but it clearly works.\n", + "The validators seem to be taken into account, since the validation of the nested model works as expected. But what if we start from scratch,\n", + "meaning that there is no `Nested` class defined in the current environment?"
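As an aside, the earlier `type()` failure was caused by the un-annotated `BaseModel` machinery left inside `__dict__`, not by `type()` itself. A minimal sketch (with hypothetical field names) showing that `type()` happily builds a working Pydantic model when given a clean `__annotations__` mapping:

```python
import pydantic

# Hypothetical field names, for illustration only: supplying a proper
# __annotations__ dict lets the BaseModel metaclass pick up the fields.
Dynamic = type(
    'Dynamic',
    (pydantic.BaseModel,),
    {'__annotations__': {'name': str, 'age': int}},
)

# The dynamically built class validates input like a hand-written model
instance = Dynamic(name='John', age=18)
```

So the hard part is not class creation itself, but carrying the validators and nested model definitions along.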
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ERROR:root:Failed to create an instance of the model: name 'Nested' is not defined\n" + ] + } + ], + "source": [ + "if 'Nested' in globals():\n", + " del globals()['Nested']\n", + "if 'Root' in globals():\n", + " del globals()['Root']\n", + "\n", + "# Now we can try to reconstruct the Root model, having ONLY the filtered dictionary available to us\n", + "\n", + "reconstructed_model = pydantic.create_model(\n", + " 'NestedModel',\n", + " __base__=pydantic.BaseModel,\n", + " __validators__=validators_from_dict,\n", + " **{\n", + " annotation: (filtered_model_dump['__annotations__'][annotation], ...)\n", + " for annotation in filtered_model_dump['__annotations__']\n", + " }\n", + ")\n", + "\n", + "try:\n", + " reconstructed_model_instance = reconstructed_model(\n", + " description='A nested model',\n", + " nested=Nested(\n", + " name='John',\n", + " age=18\n", + " )\n", + " )\n", + "except NameError as e:\n", + " logging.error(f'Failed to create an instance of the model: {e}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And here is the main pitfall we encounter in this case. The `Nested` class is not defined in the current environment, so we cannot\n", + "reconstruct the model from the serialized form of the `__dict__` attribute. In other words, the `Nested` class must already be\n", + "defined in the current environment before we can rebuild the `Root` model.\n", + "\n", + "The second pitfall is actually easy to show - if we take the output (even filtered) of the `__dict__` attribute of the model class,\n", + "then we will be unable to serialize it to a form suitable for exporting, such as JSON or YAML. 
This is because the\n", + "`__dict__` attribute contains references to the functions that are not serializable.\n", + "\n", + "Let's quickly define another nested model and try to serialize it to see if we can reconstruct it from the serialized form." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "class Address(pydantic.BaseModel):\n", + " street: str\n", + " city: str\n", + " zip: str\n", + "\n", + " @pydantic.field_validator('zip')\n", + " def check_zip(cls, value):\n", + " if len(value) != 5:\n", + " raise ValueError('ZIP code must be exactly 5 characters long.')\n", + " return value\n", + "\n", + "class WorkInfo(pydantic.BaseModel):\n", + " company: str\n", + " position: str\n", + " salary: float = pydantic.Field(ge=0)\n", + "\n", + "# This will be our new Root model\n", + "class Person(pydantic.BaseModel):\n", + " name: str\n", + " age: int = pydantic.Field(ge=0)\n", + " address: Address\n", + " occupation: WorkInfo" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is time to JSON dump this bad boi to see if it can be exported." + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "ERROR:root:Failed to serialize the model: Object of type type is not JSON serializable\n", + "ERROR:root:Failed to pickle the model: cannot pickle 'getset_descriptor' object\n", + "INFO:root:Model pickled successfully!\n" + ] + } + ], + "source": [ + "filtered_person_dump = {k: Person.__dict__[k] for k in Person.__dict__ if k not in object_dict}\n", + "\n", + "# Dump the model to JSON\n", + "try:\n", + " person_as_json = json.dumps(filtered_person_dump, indent=2)\n", + "except TypeError as e:\n", + " logging.error(f'Failed to serialize the model: {e}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Okay, we can always `pickle` our data and go from there." 
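Before relying on it, though, it is worth knowing what `pickle` actually stores for a class object: only a reference (the defining module plus the qualified name), never the class body itself. A minimal sketch with a hypothetical class name:

```python
import pickle


class Marker:  # hypothetical stand-in for a real model class
    """Defined at module level so pickle can find it by its qualified name."""


# Pickling a class serializes a reference, not the definition
payload = pickle.dumps(Marker)

# The payload is tiny: essentially just the module path and the name 'Marker',
# which pickle.loads resolves by importing the module and looking the name up
restored = pickle.loads(payload)
```

This is exactly why the round-trip only works while the class is still importable from somewhere.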
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pickle\n", + "\n", + "# Let's pickle the filtered dictionary\n", + "try:\n", + " pickled_model = pickle.dumps(filtered_person_dump)\n", + "except TypeError as e:\n", + " logging.error(f'Failed to pickle the model: {e}')\n", + "\n", + "# Maybe pickling the whole model class will work?\n", + "try:\n", + " pickled_model = pickle.dumps(Person)\n", + " logging.info('Model pickled successfully!')\n", + "except TypeError as e:\n", + " logging.error(f'Failed to pickle the model: {e}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Yes! The answer was so simple, we only needed to use the good old `pickle` module to serialize the model to a bytes object. Let's see what it looks like, so we can try to come up with a way to maintain the model in a serialized form." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:root:b'\\x80\\x04\\x95\\x17\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x8c\\x08__main__\\x94\\x8c\\x06Person\\x94\\x93\\x94.'\n" + ] + } + ], + "source": [ + "logging.info(pickled_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Exactly how a random jumble of bytes would look. 
Moreover, the **first** pitfall is still in place - we need to have the nested models defined in the current environment in order to reconstruct the model from the serialized form." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "ename": "AttributeError", + "evalue": "Can't get attribute 'Person' on ", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[20], line 7\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[38;5;66;03m# Now we will try to unpickle the model\u001b[39;00m\n\u001b[1;32m 6\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m----> 7\u001b[0m unpickled_model \u001b[38;5;241m=\u001b[39m \u001b[43mpickle\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mloads\u001b[49m\u001b[43m(\u001b[49m\u001b[43mpickled_model\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 8\u001b[0m logging\u001b[38;5;241m.\u001b[39minfo(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mModel unpickled successfully!\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m 9\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mTypeError\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n", + "\u001b[0;31mAttributeError\u001b[0m: Can't get attribute 'Person' on " + ] + } + ], + "source": [ + "for model in [Person, Address, WorkInfo]:\n", + " if model.__name__ in globals():\n", + " del globals()[model.__name__]\n", + "\n", + "# Now we will try to unpickle the model\n", + "try:\n", + " unpickled_model = pickle.loads(pickled_model)\n", + " logging.info('Model unpickled successfully!')\n", + "except TypeError as e:\n", + " logging.error(f'Failed to unpickle the model: {e}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## A need" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These ramblings 
show that the available solutions always come with some kind of trade-off. We can either:\n", + "\n", + "* serialize the model to a dictionary and lose the validators,\n", + "* serialize the model to a bytes object and lose the ability to easily analyze the model's structure.\n", + "\n", + "In both cases, we need to be **aware** that any nested models need to be defined in the current environment in order to reconstruct the model from the serialized form.\n", + "So, if Your model depends on some other models - You've got two pickles to pass around, and the complexity grows with each nested model.\n", + "\n", + "This is a clear sign of a need for a better way to serialize and deserialize Pydantic models, so we can easily export and import them to and from different environments.\n", + "\n", + "Back to the drawing board..." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/src/content/articles/_reflections.mdx b/src/content/articles/_reflections.mdx index ea04ed22..ac1f51da 100644 --- a/src/content/articles/_reflections.mdx +++ b/src/content/articles/_reflections.mdx @@ -13,8 +13,55 @@ image: Multi-paradigm programming languages can sometimes be thought of as a funny social experiment that tests the boundaries of human tendencies to hack around problems that they have created themselves. Throw in the absence of static typing and you -obtain a perfect whirlwind of chaos from which both masterpieces and disasters can emerge. Take a look at Python, -which over time absorbed more and more features that allow us to approach a multitude of tasks in a multitude of ways.
+obtain a perfect whirlwind of chaos from which both masterpieces and disasters can emerge. + +The complete singularity of entropy happens when You can leverage aspects of metaprogramming +thanks to the interpreted nature of the language. Not knowing the exact behavior of the code at runtime +can be a little scary, but when proper measures are taken - such as isolating the closures of dynamically +compiled code - it becomes a powerful tool for creating flexible frameworks that adapt to the data they are fed. + +Python ticks all of those boxes and, given that it is the main language I use in my professional work, +I have decided to take a stab at using reflection and introspection to create dynamic, self-modifying code. +That being said, for projects like this You first need to come up with a problem, since You already know the "tools" +(or in this case, techniques) that You want to use. So ... + +## Wild problem appears! 🦄 + +Kubernetes. I know, it seems random, but one of the main things that I like about how objects are +defined and maintained within it is the declarative nature of the configuration files. You start with a **manifest** +that says which resources are to be present in Your cluster, in the form of YAML files, and then You apply them. + +Today, a majority of technological stacks based on Kubernetes use Helm charts +to manage the deployment of applications. In short, Helm allows us to define dynamically spawned **objects** according to +some **template** that is then populated by **values** that must adhere to the resource **schema**. These collections of templates +can then be **versioned** as **charts** and maintained in a **repository** by developers.
+ +Here, the pattern that I want to shamelessly copy and advertise as a burning issue in my Python tooling is +some kind of module/tool that allows You to define **schemas** for data structures, such as configuration files and +data models, by use of **YAML manifests** (or any other format that can be easily parsed) that can be easily +**versioned** and **maintained** in a **repository**. This hypothetical tool should be able to **generate** Python +**classes** that can be used to create instances of those data structures and **validate** the input data against +those pre-defined rules. + +### Which is Pydantic, right? 🤔 + +Yes and no. Pydantic is a great library that allows You to define data models in Python and validate input data +against them. It also allows You to serialize and deserialize those models to and from JSON. + +However, these JSON schemas do not contain information about one of the most flexible aspects of data validation - the **custom validators**, +which can be defined **programmatically** in Python code. That is one of the issues I want to tackle. +This can be easily seen in the following example: + + +While on the topic of allowing custom validators, I want their definitions to be as **flexible** as possible, +while remembering that somebody can use a cheeky `eval` to inject some malicious code into the system or +`shutil.rmtree` to delete the whole filesystem. + +Also, when You dump a Pydantic model to JSON, the schema is often not very human-readable, and it is not easy to +maintain it in a repository. I want to structure my schema definitions in such a way that a maintainer can easily +see which fields are required, which are optional, which have default values, which are of a specific type and which +are basically nested schemas themselves. On top of that, YAML has a couple of nice tricks up its sleeve, such as **anchors** and +**references**, that can be used to define a schema in a more [DRY way]. 
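To make the validator issue concrete, here is a small Pydantic v2 sketch (hypothetical model and field names): the `ge`/`le` constraints survive the JSON Schema export as `minimum`/`maximum`, while the custom validator vanishes without a trace.

```python
import json

import pydantic


class User(pydantic.BaseModel):
    age: int = pydantic.Field(ge=0, le=80)

    @pydantic.field_validator('age')
    def check_age(cls, value):
        if value < 18:
            raise ValueError('You need to be an adult to use this service.')
        return value


schema = User.model_json_schema()
print(json.dumps(schema, indent=2))
# The 'age' property carries 'minimum': 0 and 'maximum': 80,
# but nothing in the schema mentions check_age or the 18+ rule.
```

Constraints are declarative data, so they survive; validators are arbitrary functions, so they are silently dropped.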
{/* */} + +[DRY way]: https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
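For illustration, a hypothetical manifest for such a tool could lean on anchors like this - the `&short_string` definition is written once and reused through its `*short_string` alias:

```yaml
# Hypothetical manifest format - keys and field names are illustrative only
definitions:
  short_string: &short_string
    type: str
    min_length: 1
    max_length: 64

schema:
  name: *short_string        # reused via alias - no copy-pasting
  city: *short_string
  age:
    type: int
    ge: 0
    le: 80
```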