-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Immutability and usability #9
Comments
Thanks for the write up, @mikibonacci! I think in the end the position of not having any methods that mutate the structure or its properties will not be maintainable (even though I semi-argued for it in this discussion). Two arguments are:
|
After useful discussions with @GeigerJ2 and @khsrali, we think that a I think, the user will complain more about the immutability of an unstored node, with respect to him doing something wrong in modifying a In conclusion, I think we should let the user change unstored StructureData nodes. We just document the rule that, after being stored, the node will be immutable. Of course this is already in the AiiDA documentation , however @GeigerJ2 was suggesting to have like a concise but complete page where we collect all these AiiDA rules, that sounds as a very good idea. Pinning @sphuber and @giovannipizzi for useful comments on this. |
As mentioned during the group meeting, it would be great to get some concrete code snippets that show were a mutable class would be significantly better compared to an immutable variant. We have had many discussions concerning this issue over the last ten years or so as a result of confusion arising from having data classes be mutable when unstored. I think that merits having strong evidence on the other side to counteract this. |
In principle, now that I think about it, I don't see much speaking against making sd_instance.append_atom(...)
sd_instance.initialize_onsites_hubbard(...) doing instead (similar to pandas dataframes): sd_instance = sd_instance.append_atom(...)
sd_instance = sd_instance.initialize_onsites_hubbard(...) seems like a very acceptable change to me. When preparing some structure, I would not care about the intermediate nodes. Just the final one that I pass as an input to the process, which becomes stored anyway, and doesn't have to be the same one as I started out with. On the other hand, as noted by @mbercx:
it's not feasible to be consistent for all AiiDA classes anyway. Then is it worth to apply this behavior only for the I'll go ahead and implement an immutable version of the current new |
Why do we even need this |
Alright, fair point. I think this ties in with the general question: If However, I'm thinking now of a use case that was very common in my PhD: Now, from this optimized slab, one wants to create an input There is also the general question of how to conserve (pass through) the Having an When starting from scratch, I'd also never generate my |
@sphuber Maybe I'm missing something, but honestly, I don't understand why it hurts to have |
The objection has nothing to do with the preservation of provenance.
The reason is exactly to ensure the interface is as user-friendly as possible and has as little surprising behavior as possible. Experience over the last decade has learned that having mutating methods on a data class often leads to suprising behavior for users and can easily lead to bugs. Note that this is specific to AiiDA, compared to other tools like ASE and pymatgen, exactly because AiiDA has this concept of immutability once stored. Take the structure = StructureData()
structure.append_atom(...)
structure.append_atom(...) Now the user loads an existing structure from the database and wants to add some atoms as per @GeigerJ2 's example: structure = load_node(...)
structure.append_atom(...)
structure.append_atom(...) only to find that his structure = load_node(...)
new_structure = structure.append_atom(...)
new_structure.append_atom(...) Of course the This is just one example of course, but the more mutating methods you add, the more edge-cases there will be. I admit that there will not be a perfect solution and there will be trade-offs. I just wanted to make sure we don't make the same mistake again without having really thought hard about it. The relatively short comment stating that you guys had decided that a mutable variant was better, with little supporting evidence of concrete examples, made me worried that the historic counter arguments were perhaps not sufficiently taken into account. And, with all due respect, the fact that you seemed to not even be aware of the main counter argument seems to lend credence to that suspicion. This is the reason why I asked for some concrete examples so we can evaluate the pros/cons of both solutions and actually compare. Finally, the argument
I don't find very convincing to be honest. The whole reason we are redesigning the |
I appreciate your explanation; this is a good discussion. In the example you are raising, I would still find it user-friendly to do: structure = load_node(..., copy=True) # returns unstored copy
structure.append_atom(..) I'll wait for Miki's example, as you requested. So we can continue discussion on pros/cons based on that. |
First thing to say, I agree that returning a new modified instance of the node every time we invoke the Second thing: trying to ModificationNotAllowed: ...the node is already stored. If you want to modify it, you should copy it via the node.clone() method, which returns an unstored copy of the node. This means that the user will know how to proceed (i.e. cloning the node). Apology for the long comment below. (1) Simple example: selective dynamics, starting from ASE Atoms objectLet's first consider the following initial conditions:
from ase import Atoms
ase_atoms = Atoms('N2', [(0, 0, 0), (1, 0, 0)])
ase_atoms.cell = [2,2,2]
ase_atoms.set_initial_charges([0,0.5])
from aiida_atomistic.data.structure import StructureData
structure = StructureData.from_ase(ase_atoms)
structure.attributes The attributes of the structure are: {'cell': [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]],
'pbc1': False,
'pbc2': False,
'pbc3': False,
'sites': [
{'symbol': 'N',
'mass': 14.007,
'kind_name': 'N',
'position': (0.0, 0.0, 0.0),
'charge': 0.0,
'magnetization': 0.0},
{'symbol': 'N',
'mass': 14.007,
'kind_name': 'N',
'position': (1.0, 0.0, 0.0),
'charge': 0.5,
'magnetization': 0.0},
],
} Which is exactly the set of inputs that we should provide in the Let's say I want to keep fixed the first N atom. In the new Mutable version:structure.sites[0].selective_dynamics = (False, False, False) # no allowed movements in (x, y, z)
structure.attributes This gives: {'cell': [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]],
...
'sites': [
{'symbol': 'N',
'mass': 14.007,
'kind_name': 'N',
'position': (0.0, 0.0, 0.0),
'charge': 0.0,
'magnetization': 0.0,
'selective_dynamics': (False,False,False)}, # <=== Added a new property in the first site!
...
],
} Let's say that I stored the structure in AiiDA, by using structure = load_node(stored_structure_pk)
structure.sites[0].selective_dynamics = (True, True, True) # allowed movements in (x, y, z) raises (or will raise, but this is how I see the API in the mutable version) ModificationNotAllowed: ...the node is already stored. If you want to modify it, you should copy it via the node.clone() method, which returns an unstored copy of the node. the same will happen if we try to invoke structure = load_node(stored_structure_pk).clone()
structure.sites[0].selective_dynamics = (True, True, True) # allowed movements in (x, y, z) This is slightly involved to me. But still acceptable. Immutable version:suppose we already initialized the structure This gives: new_structure_dict = structure.to_dict() # or just structure.serialize() in pydantic PR way
new_structure_dict["sites"][0]["selective_dynamics"] = (False, False, False)
new_structure = StructureData(**new_structure_dict) Now the user should deal with a dictionary, modify it, using it to initialize a new StructureData. I find this more involved from a user perspective than the mutable version of doing things. We let the user deal with a dictionary which somehow should have a 1-to-1 correspondence to the a crystal structure. (2) More critical example: adding the Hubbard property, then deleting one atom: how to reorganize data?Let's assume we have our StructureData initialized to 3 atoms and including the Hubbard property, which can be seen as a list like this: hubbard_list = [
(0, '3d', 0, '3d', 7.2362, (0, 0, 0), 'V'), # atom_index, atom_manifold, neighbour_index, neighbour_manifold, value, translation, hubbard_type
(0, '3d', 2, '2p', 0.2999, (-1, 0, -1), 'V'),
(0, '3d', 1, '2p', 0.2999, (0, 0, -1), 'V'),
(0, '3d', 1, '2p', 0.2999, (-1, 0, 0), 'V'),
(0, '3d', 2, '2p', 0.2999, (0, -1, -1), 'V'),
(0, '3d', 2, '2p', 0.2999, (-1, -1, 0), 'V'),
(0, '3d', 1, '2p', 0.2999, (0, -1, 0), 'V')
] and is stored in the structure.delete_atom(index = 2)
structure.hubbard which gives: [
(0, '3d', 0, '3d', 7.2362, (0, 0, 0), 'V'),
(0, '3d', 1, '2p', 0.2999, (0, 0, -1), 'V'),
(0, '3d', 1, '2p', 0.2999, (-1, 0, 0), 'V'),
(0, '3d', 1, '2p', 0.2999, (0, -1, 0), 'V')
] we see that, in the hubbard property, all the elements involving the third site (i.e. of index=2) are removed automatically. This thanks to the new_structure_dict = structure.to_dict() # or just structure.serialize() in pydantic PR way
new_structure_dict["sites"].pop(2) # remove the third site/atom
new_hubbard = [interaction for interaction in new_structure_dict["hubbard"] if (interaction[0]!=2 or interaction[2] != 2)]
new_structure_dict["hubbard"]=new_hubbard
new_structure = StructureData(**new_structure_dict) To do this, the user needs to know how the hubbard property is structured. The user working with hubbard will know for sure, but the user loading a structure from a database and then trying to just remove atoms may accidentally ignore the hubbard property and so encounter issues in the run (concrete cases being removing termination from ribbons, splitting heterostructures, or removing adsorbed from slabs). Comment:I am not sure we are comparing the same kind of user friendliness, where one is about not letting the user make mistakes, one is on letting the user make mistakes but easily access methods to do what he wants, avoiding dumping dictionaries and looping on them. I have the feeling that a very concise documentation on the behavior of AiiDA nodes can help if we go for mutability. Why for StructureData is so difficult to accept immutability?The thing that I see is that, at variance with other AiiDA Data type as structure_python_class = StructurePython.from_ase(ase_atoms)
structure_python_class.append_atom(symbol="N", position=(1,1,1),...)
structure = StructureData.from_python_structure(structure_python_class)
structure.store()
new_structure_to_be_modified = structure.to_structure_python_class() we can even think to let WorkChains and Calcjobs to automatically serialize I see this concept of having the python counterpart class also for other atomistic data types, like |
Thanks a lot for the detailed writeup @mikibonacci this is exactly the thing we need. Let me first start with your selective dynamics example. You mention that setter-methods, for example You claim the mutable version would provide for a better user experience, for example nested properties, such as selective dynamics can be accesses through properties: structure.sites[0].selective_dynamics = (True, True, True) # allowed movements in (x, y, z) and then when the node was stored, this would simply raise again. This is exactly where the problem lies though. How do you plan on having this raise a What probably would happen is that the line that changes the selective dynamics is executed without an exception, but the structure will not actually have been changed at all. And this happens without any warning or error. The user will be left scratching their head why the structure wasn't changed. This is exactly the confusing behavior that I was warning about. Now I do agree that if you limit the interface to provide no mutating methods, it probably does make it less straightforward to use. The example about selective dynamics, I don't find the most convincing though. The variant structure.sites[0].selective_dynamics = (True, True, True) is not all that much more straightforward than new_structure_dict["sites"][0]["selective_dynamics"] = (False, False, False) The Hubbard example is more convincing. I do agree that having a And was there not the concept of allowing custom properties? Is that still in the plans? And if so, how do you plan that the |
Thanks @sphuber for the reply. I see the point of modifying This is indeed the same problem you mentioned for the class Site:
def __init__(self, parent, **kwargs):
...
self.parent = parent # StructureData
...
@selective_dynamics.setter
def selective_dynamics(self, value):
if self.parent.is_stored:
raise ModificationNotAllowed
self._selective_dynamics = value
self.parent.base.attributes.set("sites", self.parent.sites) I see this providing the expected behavior for the user. Instead, for the About the Hubbard example, I think the user will provide exactly a list as I did; I followed this tutorial as written by @bastonero. Other methods (as you can see here) can be used to append/remove Hubbard parameters, but they still require a mutable Maybe, just providing a For what concerns the custom properties, I think we should just let the user know that we don't provide helper methods for them, as indeed there will be no strict schema. I guess this is fair and will not be a big problem, as the user that wants to store custom properties will usually also be "expert" one, and he will take care to modify itself the dictionary. |
Your
Surely this is then more evidence that this design is perhaps not the best, whether it is for mutable or immutable?
This is the part I don't get. If you do want to support custom properties, surely you would have to make sure that the methods that are supposed to coherently manage the internal state (e.g. Again, I see the point of wanting to make a class that provides useful methods to mutate the internal state, however, how useful can it really be if there is a ton of caveats. It would be like saying:
Anyway, I think the point is clear. Maybe this is still a more user-friendly solution than the immutable variant without properties/utility methods. It is clear there is no perfect solution and it is a question of tradeoffs. I would personally err on the safe side and keep the interface minimal and treat the class just as a data container, but if others think the mutable variant is better, that is fine as well. |
Hey guys, thank you for these great comments! I agree with @sphuber, that this really seems like we have to accept some tradeoffs eventually. To put some more of my thoughts out there for discussion:
structure.sites[0].selective_dynamics = (True, True, True) though. I'm wondering now how one would do that without the workaround? Possibly by returning a custom class (e.g.
Adding or removing a
|
Hi @GeigerJ2 and @sphuber , I finally got the time to go back again on StructureData. On the utility functionsAbout the utility functions proposed by @GeigerJ2 , I think this will be sub-optimal as then the user has to know which utility functions we provide (i.e. no tab completion), for example looking at a documentation. The fact of returning a new AiiDA node each time is something I don't like as then the user may have to write some code like this: structure_old = StructureData(...)
for new_atom in atoms:
structure_old = append_atom(structure_old, new_atom) at least this what I would write to append more than one atom in my structure. structure_old = StructureData(...)
structure_old = append_several_atoms(structure_old, atom_list=...)
structure_old = append_hubbard_property(structure_old, hubbard_list=...) Still on mutable non-AiiDA classesOn the other hand, having a python object containing such utilities as methods can provide the beloved tab completion and mutability of itself (it's not AiiDA). The user indeed will end up always using this mutable class, and it make sense in some way: the StructureData will be just the container and only used in the Calcjobs/WorkChains. Do you see some issue in this @sphuber , @GeigerJ2 ? ASE just releasedAnother option would be to contribute to ASE (they just released) and add methods there. Or even build a new class on top of it which just extend with the properties we need to add, like Hubbard. Bit on custom propertiesFor the custom properties, I don't see providing much support for modifying them consistently as indeed a custom property can be something new, e.g. some parameter concerning three sites together (like a plane in the cell?) and so we cannot know a priori how to deal with this. Comment:I think the solutions are then: 1 - immutability with detached utilities (so basically the immutability) |
Happy to join this hot discussion! 😆 Based on my experiences with WorkGraph (especially PythonJob and REST API), I would like to give my two cents.
|
@mikibonacci thanks for your the good example
Maybe this is a naive suggestion. But, to be fair, that could also be done if the structure is immutable, just need to do everything behind the scenes and return a new instance. We could either: a) Allow mutable when unstored,but then we should always raise new_structure = structure.delete_atom(index = 2, copy=True)
new_structure.hubbard and if they wrongly do structure.delete_atom(index = 2)
structure.hubbard This would raise: b) Keep it immutable,then return a new instance for all operations like this delete, append, etc.. new_structure = structure.delete_atom(index = 2)
new_structure.hubbard |
As design principle, we want to have the
StructureData
as immutable even if it is not stored. This is done in order to avoid the user to attempt to modify the stored node afterwards, as one usually would do in standard python.However, this blocks also flexibility on how we produce our
StructureData
instance as ready to be used in calculation. For example, it will be no more possible to use methods likeappend_atom()
for the unstored node, as well as theappend_hubbard_parameter
in case of theHubbard
property.To change/update the instance, the only possibility is to generate a new instance, and this is made easier by having the
to_dict
method, which returns a python dictionary with everything we need to initialise a new object.However, I think that then the user has to deal with a rather complex object:
(we can always dump this as an
AttributeDict
).The main difficulty here is that, to change a property, we have to go into this nested structure every time. Moreover, there are not method to change multiple properties at the same time (e.g., a python dictionary has no
remove_atom
method) - the user should implements its own function to do it.*I am thinking that maybe returning a different object, can make the modification easier for the user. *
For example, we can think to return an instance of the
Properties
class (property
attribute of the structure node), which actually contains all the information on the StructureData node. This contains the instances of the properties as attributes, so we can call methods of the properties to modify them if we need it.The modified
Properties
instance can be used to initialise a newStructureData
(consistency checks are done in the__init__
).One issue that I see is that right now the properties are connected to the
StructureData
instance.We can think on making the properties mutable only if they are not attached to any
StructureData
instance, i.e. if are only attributes of theProperties
instance:we can then even think to provide methods to append/remove atoms and properties:
It seems that in this way we are providing a new class, which is not AiiDA datatype. It seems that we have then to maintain two classes: actually we are somehow transferring all the modification methods (append, remove atoms) to the
Property
class.***I am really not sure this is the way to go, it is just an idea on how we can try to improve user experience. ***
In the end we can somehow achieve a similar thing (limited wrt the properties which are supported), by using ASE or Pymatgen combined with
StructureData.from/to_ase/pymatgen
.The text was updated successfully, but these errors were encountered: