Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New atom selection mechanism #672

Draft
wants to merge 26 commits into
base: protos
Choose a base branch
from

Conversation

MBartkowiakSTFC
Copy link
Collaborator

Description of work
The intention behind this PR is to implement a mechanism of selecting atoms which is:

  1. Not attached to a specific trajectory file,
  2. Transferable,
  3. Easy to extend.

We can still discuss if the implementation in this PR meets the requirements.

This screenshot shows the carbon dioxide trajectory in which two out of three atoms have been selected in the molecules from the first half of the index range. This is not scientifically meaningful, but illustrates how the selection operations can be combined.
selection_helper

Fixes

  1. New selection functions have been implemented to replace the classes we have been using so far.
  2. The output of each selection function is a set of indices.
  3. Selections are applied sequentially, and can be combined using set operations (union, intersection, difference).
  4. ReusableSelection class stores a sequence of selection functions and uses JSON for saving and loading a sequence.
  5. ReusableSelection.select_in_trajectory method takes a Trajectory instance as input, applies the selection and returns the resulting set of atom indices.
  6. Analysis jobs save the JSON selection sequence as one of the job inputs, and an array of 0/1 values to mark the indices which got selected.

The JSON representation is saved as str:
saved_input
The indices are saved as a mask array:
saved_selection_array

To test
All tests must pass.

@MBartkowiakSTFC MBartkowiakSTFC marked this pull request as draft February 14, 2025 17:12
@ChiCheng45
Copy link
Collaborator

ChiCheng45 commented Feb 18, 2025

Looks good. Just one thing about the SMARTS pattern matching. In the RDKit molecule object we use the Chem.rdchem.BondType.UNSPECIFIED for all of the bondtypes. This means that most SMARTS patterns you'd think should work won't and that is why the SMARTS patterns we use are a bit unusual.

The correct thing would be to set all the correct bond types and atom properties but I think this might get difficult to get working for a general molecule e.g. setting single/double/aromatic bonds for a benzene or cyclooctatetraene or some big protein.

Comment on lines 51 to 53
VALID_SELECTION = "Valid selection"
USELESS_SELECTION = "Selection did not change. This operation is not needed."
MALFORMED_SELECTION = "This is not a valid JSON string."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider making these an Enum


def current_steps(self) -> str:
result = {}
for row in range(self.rowCount()):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you not iterate over the children directly?

Comment on lines +147 to +148
self.molecule_names = [str(x) for x in self.system._clusters.keys()]
self.labels = [str(x) for x in self.system._labels.keys()]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be:

Suggested change
self.molecule_names = [str(x) for x in self.system._clusters.keys()]
self.labels = [str(x) for x in self.system._labels.keys()]
self.molecule_names = list(map(str, self.system._clusters))
self.labels = list(map(str, self.system._labels))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will note you keep externally accessing these variables which are marked _private, suggest making them public.

Comment on lines +65 to +70
if self.mode_box.currentIndex() == 2:
return "difference"
elif self.mode_box.currentIndex() == 1:
return "intersection"
else:
return "union"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better as an Enum

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One complication here is that these values have to be JSON encoded and decoded (and remain human-readable when saved as a text string). I am sure that it can be done. Do you think it is worth the effort?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually referring to the 1, 2...
If you had:

class Modes(Enum):
    DIFFERENCE = auto()
    INTERSECTION = auto()
    UNION = auto()
...

        if self.mode_box.currentIndex() == Modes.DIFFERENCE:
            return "difference"
        elif self.mode_box.currentIndex() == Modes.INTERSECTION:
            return "intersection"
        elif self.mode_box.currentIndex() == Modes.UNION:
            return "union"

It removes magic numbers and ensures internal consistency a bit better if we ever add other operations.

As an aside, Enums can act as strings

self.selection_field.setPlaceholderText("0,1,2")
self.selection_keyword = "index_list"
self.selection_separator = ","
if new_mode == "range":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also probably Enum

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants