-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mol2 parser within MoSDeF ecosystem #562
Conversation
Codecov Report
@@ Coverage Diff @@
## master #562 +/- ##
==========================================
+ Coverage 90.49% 90.65% +0.16%
==========================================
Files 55 57 +2
Lines 4410 4507 +97
==========================================
+ Hits 3991 4086 +95
- Misses 419 421 +2
Continue to review full report at Codecov.
|
Is this supposed to be a PR or an issue @CalCraven ? |
@justinGilmer |
* Make sure unyt values are input as Angstroms to proper conversion are done to mBuild * Throw an OSError if the filepath doesn't exit that is used for from_mol2
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
* read in box information from CRYSIN record type indicator * assume box lenghts in angstroms and box angles in degrees
for more information, see https://pre-commit.ci
… space at the end of the rti
for more information, see https://pre-commit.ci
…dentally start setting elements
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some changes along the way, but this is looking great @CalCraven !
gmso/formats/mol2.py
Outdated
Supported record type indicators include Atom, Bond, FF_PBC, and CRYSIN. | ||
""" | ||
supported_rti = { | ||
"@<TRIPOS>ATOM\n": load_top_sites, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In favor of this too. If the specification does not allow anything else, then this is probably fine. Can people add comments or anything in this file? If so, what umesh suggests will make it robust enough
* Add .strip() method to strip lines with RTI in order to increase robustness * Move a msg error statement for missing file path into conditional statement * Use with open as f instead of f=open(path,"r") * Add docstring for load_top_sites * Add information about skipping unsupported RTI * Use regex match for checking for warnings in test-mol2.py
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…ol2 file, last bits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. From what I see this should read in proteins, now. Very excited!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small question and once you resolve the merge conflicts, this should be g2g!
gmso/formats/mol2.py
Outdated
line = f.readline() | ||
while f: | ||
# check for header character in line | ||
if line.strip().startswith("@<TRIPOS>"): | ||
# if header character in line, send to a function that will direct it properly | ||
line = _parse_record_type_indicator( | ||
f, line, topology, site_type | ||
) | ||
elif line == "": | ||
# check for the end of file | ||
break | ||
else: | ||
# else, skip to next line | ||
line = f.readline() | ||
topology.update_topology() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason for the internal while loop?
line = f.readline() | |
while f: | |
# check for header character in line | |
if line.strip().startswith("@<TRIPOS>"): | |
# if header character in line, send to a function that will direct it properly | |
line = _parse_record_type_indicator( | |
f, line, topology, site_type | |
) | |
elif line == "": | |
# check for the end of file | |
break | |
else: | |
# else, skip to next line | |
line = f.readline() | |
topology.update_topology() | |
for line in f: | |
# check for header character in line | |
if line.strip().startswith("@<TRIPOS>"): | |
# if header character in line, send to a function that will direct it properly | |
line = _parse_record_type_indicator( | |
f, line, topology, site_type | |
) | |
elif line == "": | |
continue | |
else: | |
continue | |
topology.update_topology() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried my best to write something that could use a for loop instead of a while loop. It starts to get tricky again when it finds a header in the line. That passes it to the _parse_record_type_indicator
function, which reads through the rest of that heading. In the case of a record type indicator (rti) starting a new section without a space after the previous section, that line must be passed back to the parser in order to continue forwards and identify which rti you are dealing with. With a for loop, every time through you must read the next line. In the instance as discussed above:
"""
@ATOM
Atom information
@BOND
bond information
"""
The next line cannot be read automatically because it will already be read when looking to find the end of the ATOM rti section.
Does that make sense? There may be a way to accomplish this while still using a for loops, but I tried exactly what you suggested and I run into this error still.
… conventions of residue_number and residue_name
Co-authored-by: Umesh Timalsina <umesh.timalsina@vanderbilt.edu>
Co-authored-by: Umesh Timalsina <umesh.timalsina@vanderbilt.edu>
for more information, see https://pre-commit.ci
Co-authored-by: Umesh Timalsina <umesh.timalsina@vanderbilt.edu>
…r b to be a nonetype and not return TypeError
Overview
We have some support for mol2 file formats, but currently use the load_mol2 from mdtraj. We have run into some issues because of various assumptions they make in their readers, as well as the variety implemented in the TRIPOS mol2 file format. See issue #921 and #893. This is also a relevant thread for the rdkit package's issues with their own implementation PR#415.
Details
Current Issues to Address:
Compound
, mBuildLattice
, gmsoTopology
)Sources of mol2 file formats to act as references
(please note if any important ones are missing)
rdkit - (read/write)
TRIPOS Standard Record Type Indicators (RTI's) Source
Can be supported with the current state of MoSDeF
Atom – mBuild compound
Bond – mBuild compound
CRYSIN – mBuild lattice
FF_PBC – mBuild compound box
FFCON_DISTANCE – topology
FFCON_ANGLE – topology
FFCON_TORSION – topology
FFCON_RANGE – topology
FFCON_MULTI – topology
SUBSTRUCTURE – subtopology/separate compounds
Could be supported in the future
CENTROID – virtual sites
CENTER_OF_MASS – virtual sites
EXTENSION_POINT – virtual sites
LINE – virtual sites
LSPLANE – virtual sites
NORMAL – virtual sites
MOLECULE – mBuild compound (residue)
Probably will not support
ALT_TYPE
ANCHOR_ATOM
ASSOCIATED_ANNOTATION
COMMENT
DICT
DATA_FILE
RING_CLOSURE
ROTABLE_BOND
SEARCH_DIST – setting constraints
SEARCH_OPTIONS – Running a SYBYL sim
SET – Color schemes
U_FEAT - Unity support
UNITY_ATOM_ATTR - Unity support
UNITY_BOND_ATTR - Unity support
Sample Tripos File Outline Shortened from
@MOLECULE
1ABE.pdb
4651 4696 305 0 0
PROTEIN
AMBER99
@ATOM
1 N 18.3590 26.8690 52.9550 N.pl3 1 GLY2 -0.4157
2 CA 19.6350 26.6320 53.6710 C.3 1 GLY2 -0.0252
3 C 19.9620 27.8960 54.5040 C.2 1 GLY2 0.5973
4 O 19.2860 28.0330 55.5390 O.2 1 GLY2 -0.5679
5 N 20.9160 28.7080 54.0680 N.am 2 LEU3 -0.4157
6 CA 21.3040 29.9430 54.7530 C.3 2 LEU3 -0.0518
@BOND
1 1 4650 1
2 2 1 1
3 2 4638 1
4 2 4636 1
5 3 2 1
6 3 5 am
@SUBSTRUCTURE
1 GLY2 2 RESIDUE 4 A GLY 1 ROOT
2 LEU3 6 RESIDUE 4 A LEU 2
3 LYS4 14 RESIDUE 4 A LYS 2
4 LEU5 23 RESIDUE 4 A LEU 2
Sample mol2 file in mBuild
Betacristobalite