Skip to content
This repository has been archived by the owner on Jun 7, 2024. It is now read-only.

Polycyclics thoughts: on-the-fly vs group additivity #273

Open
nickvandewiele opened this issue Nov 5, 2012 · 9 comments
Open

Polycyclics thoughts: on-the-fly vs group additivity #273

nickvandewiele opened this issue Nov 5, 2012 · 9 comments

Comments

@nickvandewiele
Copy link
Contributor

Just a post to share my experiences with thermochemistry for polycyclics.

Greg implemented the semi-empirical PM3 , molecular mechanics MM4 features. I extended Benson for polycyclics by creating a new polycyclic ring correction library. Both were fueled by the sense that something was very wrong with the thermo for polycyclic molecules.

Both have weaknesses and strengths:

  • my polycyclics library has more than 100 entries, mostly based on computed or experimentally determined ring strain energies, and thus is very accurate for the enthalpy correction. S, Cp corrections for these polycyclics are far less reported, because they are not as easily determined as strain energies. Thus, i copied the S, Cp correction from the library entry that closely resembled the given polycyclic ring system. (e.g. cyclopentane is used for norbornane S, Cp correction)
  • PM3: while S, Cp values for polycyclic compounds seem to be always in the right ballpark, enthalpies sometimes do not. I have seen errors of up to 10 kcal/ mol for some molecules compared to my best estimates. The error is not systematic in the sense that many typical well-known compounds have quite accurate values. The tendency is that PM3 underestimates the enthalpy rather than overestimates it. My feeling is that PM3 sometimes goes wrong with crazy structures, created by the execution of e.g. intra_addition_exocyclic reaction family, where a cycle is created in molecules that already had some cycles.
  • the PM3 implementation is very robust; RMG never hangs on a single PM3 calculation and always comes up with "something" Murat Keceli observed that PM3 sometimes gives different results for identical RMG simulations, suggesting that the PM3 calc does not always perform the same number of iterations. This resulted in significant differences.
  • MM4 is an implementation, that despite its promised accuracy gain over PM3, i seldomly use. The reason is that MM4 tends to "hang" on particular molecules and completely interrupts RMG therefore. Manually killing the 1st attempt leads to the 2nd attempt (with different keywords), 3rd attempt, but these new keywords do not seem to circumvent the problem.
  • MM4 not only suffers from incidental hanging, it also "silently" continues with crazy values. E.g.:
Could not find a non trivial ring correction!Trying QMTP...
*****Warning: zero freqencies found (values lower than 7.7 cm^-1 are rounded to zero in MM4 output); CanTherm should hopefully correct this:
Pre-existing successful MM4 result for REXBDNBUYGXXMZ-UHFFFAOYAU (InChI=1/C8H12/c1-2-5-8-6-3-4-7-8/h2H,3-4,6-7H2,1H3) has been found. This log file will be used.
Thermo for REXBDNBUYGXXMZ-UHFFFAOYAU: 323.25    129.02  36.22   46.55   55.81   63.64   75.79   84.59   97.81   
Thermo for REXBDNBUYGXXMZ-UHFFFAOYAU_RRHO: 323.25   129.02  36.22   46.55   55.81   63.64   75.79   84.59   97.81   
Created new species: SPC(4304) InChI=1/C8H12/c1-2-5-8-6-3-4-7-8/h2H,3-4,6-7H2,1H3
ChemFormula: C8H12
1  H 0 {7,S}
2  C 0 {3,S} {9,S} {10,S} {11,S}
3  C 0 {2,S} {4,S} {12,S} {13,S}
4  C 0 {5,D} {3,S} {8,S}
5  C 0 {4,D} {6,D}
6  C 0 {5,D} {7,S} {14,S}
7  C 0 {6,S} {1,S} {15,S} {16,S}
8  C 0 {4,S} {9,S} {17,S} {18,S}
9  C 0 {2,S} {8,S} {19,S} {20,S}
10  H 0 {2,S}
11  H 0 {2,S}
12  H 0 {3,S}
13  H 0 {3,S}
14  H 0 {6,S}
15  H 0 {7,S}
16  H 0 {7,S}
17  H 0 {8,S}
18  H 0 {8,S}
19  H 0 {9,S}
20  H 0 {9,S}

The number of resonance isomers is 0
The NASA data is 
!Group:Cs-CsCsHH    Group:Cs-(Cds-Cdd-Cd)CsHH   Group:Cds-(Cdd-Cd)CsCs  Group:Cdd-CdsCds    Group:Cds-(Cdd-Cd)CsH   Group:Cs-(Cds-Cdd-Cd)HHH    Group:Cs-(Cds-Cdd-Cd)CsHH   Group:Cs-CsCsHH 
!MM4 calculation with CanTherm analysis
! [_ SMILES="InChI=1/C8H12/c1-2-5-8-6-3-4-7-8/h2H,3-4,6-7H2,1H3" _]
SPC(168)                C   8H  12          G   250.000  5000.000   995.043    1
 1.71088574E+01 3.64686093E-02-1.30374883E-05 2.10165108E-09-1.25939495E-13    2
 1.54107129E+05-4.74289947E+01 3.22989036E+00 3.56896401E-02 7.34166570E-05    3
-1.12958069E-07 4.31375788E-11 1.59669759E+05 3.35327699E+01                   4

ThermoData is 
323.25  129.02  36.22   46.55   55.81   63.64   75.79   84.59   97.81   

For some reason, unknown to me, the enthalpy by MM4 ends up being 323 kcal/mol while 19 kcal/ mol is a reasonable estimate.

In conclusion:
I am thinking of creating a new thermochemistry for polycyclics strategy where enthalpies are preferrably based on polcycylic library corrections while S, and Cps are calculated on-the-fly, and thus benefitting from the strengths of both approaches.

this should definitely improve the accuracy but will lead to longer simulation times, compared to the current strategy, where on-the-fly calcs are only done on species that did not have library entries.

Let me know what you think about this.

@keceli
Copy link
Contributor

keceli commented Nov 5, 2012

Here is a sample Mopac file that can give two different outputs based on the machine it was run.

pm3  precise nosym gnorm=0.0 bfgs
InChI=1/C8H10/c1-3-8-6-4-5-7(8)2/h3-4,6H,1,5H2,2H3

C   1.65870 1 -0.67670 1  0.19900 1
C   0.42230 1  0.13240 1  0.20490 1
C   0.33840 1  1.58310 1  0.31480 1
C  -0.93230 1  1.94420 1  0.28390 1
C  -1.82610 1  0.75820 1  0.14920 1
C  -0.81740 1 -0.34660 1  0.10950 1
C  -1.19930 1 -1.78890 1 -0.02140 1
C   2.88300 1 -0.16430 1  0.29630 1
H  -0.84030 1 -2.36120 1  0.86350 1
H   3.74640 1 -0.82220 1  0.28490 1
H   1.57790 1 -1.75620 1  0.11060 1
H   1.16210 1  2.28270 1  0.40800 1
H  -2.30490 1 -1.90590 1 -0.08210 1
H  -1.27360 1  2.97230 1  0.34860 1
H  -0.75590 1 -2.21910 1 -0.94760 1
H  -2.49530 1  0.66210 1  1.03420 1
H   3.06650 1  0.90050 1  0.38840 1
H  -2.41000 1  0.80580 1 -0.79800 1

oldgeo thermo nosym precise pm3

RMG generates one of these thermo values based on Mopac output (where the difference originated):

Thermo for FBVAULMWQCXHBR-UHFFFAOYAF:
37.32 96.33 32.48 42.08 50.48 57.45 68.1 75.77 87.42

Thermo for FBVAULMWQCXHBR-UHFFFAOYAF:
37.11 90.41 32.48 42.11 50.52 57.5 68.16 75.83 87.47

There is no fault on RMG side for this problem. As Nick mentioned, this is related to the number of cycles in Mopac optimization step. In machines with different floating point handlings (e.g. 8 core nodes vs 48 core nodes on pharos) you can observe that one can achieve convergence earlier than other.

Possible solutions are either recompiling Mopac (or maybe updating to a newer version) with flags that ensure numerics are independent of the machine (Ray's suggestion) or setting a tighter convergence (change first line as: pm3 let precise nosym gnorm=0.0001 SCFCRT=1.D-10) to minimize the difference.

If we can find an open source (with GPL or similar license) alternative to Mopac for PM3 calculations, that might be the best solution, since it gives the option to pack it with RMG.

@nickvandewiele
Copy link
Contributor Author

the interesting thing is also that significant differences in this example are limited to the entropy, and not to the heat capacity. any idea if this is the case for other examples too?

@jwallen
Copy link
Contributor

jwallen commented Nov 13, 2012

I'm trying to merge the polycyclics work into master for the 4.0 release, but when I test it I found a problem: monocyclics are not having any ring corrections applied (during a regular RMG job without QMTP). The log file is full of Could not find a non trivial ring correction! messages for the cyclic species. This cyclic species is one example; its enthalpy is off by ~6 kcal/mol and entropy by ~30 cal/mol*K from the value given on the master branch. Any suggestions, @nickvandewiele?

@nickvandewiele
Copy link
Contributor Author

I assume that the strategy you used to deal with cyclics is the one in which QMTP is explicitly not used. In case of a monocyclic species, RMG searches through Ring_Library/Dictionary/Tree.txt to find an apt ring correction. If there was not a single ring correction that fitted the description of the SSSR atoms, then it warns the user that thermo for that species did not contain any ring correction.

Previously, RMG would silently use the most generic ring correction, e.g. five-membered-ring (or which one did it use in this case). Currently, it does not add any correction, but instead warns the user about this. Next, the user would need to supply a more accurate thermo for that particular ring correction/species, or use the strategy that automatically switches to QMTP to provide thermo estimates.

So, the printed message in the log warns the user that he/she should revise the ring correction libraries. That was the behaviour i aimed at.

In conclusion, the discrepancies of 6 kcal/mol are presumably the correction for a generic_five_member_ring that is pointing to something like cyclopentane.

Does that make sense to you?

@jwallen
Copy link
Contributor

jwallen commented Nov 13, 2012

That makes sense. However, I think we should still use the generic ring corrections as before, even if they aren't a perfect match, while keeping the error message. (Maybe only print the error message for polycyclics, not monocyclics?) Otherwise people would need to run QMTP all the time, which I think is overkill for the model I am trying to build. The only significant cyclics in my model are monocyclic ethers formed from the low-temperature peroxy chemistry, and the generic ring corrections were good enough to get the major cyclic ethers correct without QMTP. Basically, I think that some ring corrections + warning is better than no ring corrections + warning.

At some point we may want to gather warnings such as these and print them at the end of the log file so they are easier to find (similar to the way LaTeX does), but that's for another issue.

@nickvandewiele
Copy link
Contributor Author

ok, fair enough. I understand your point. I think it's pretty easy to change to "corrections + warning". I'll look into where exactly this needs to be changed.

Of course, this strategy was based on my experiences with my systems of interest where crazy species such as C1=C=CCC1 kept on accumulating because they were as stable as "regular" cyclopentanes.

@connie
Copy link
Member

connie commented Nov 14, 2012

We were able to get the old functionality by removing lines 356-357 in ThermoGAGroupLibrary.java and replacing with an error warning message instead of returning null:
https://github.com/connie/RMG-Java/blob/master/source/RMG/jing/chem/ThermoGAGroupLibrary.java#L356

This will probably screw up the hybrid and qmtp schemes completely though. So we need to figure out how to maintain the functionality for hybrid and qmtp but still have generic ring strain corrections incorporated when it's BensonOnly.

@nickvandewiele
Copy link
Contributor Author

Without having tested anything:

why don't you change the value 1 to 0?
line 373:

if(deepest <= 1)

what this should do is take the L1 hit (e.g. L1: FiveMember) in Ring_Tree.txt as a value for the ThermoGAValue. Only if the ring hits L0 ("Ring"), then it will return null.

This seems to be less invasive than your idea.

@jwallen
Copy link
Contributor

jwallen commented Nov 21, 2012

I think that would have the same effect as my idea (fix the BensonTDGenerator but break the HybridTDGenerator) because I don't think we ever stop at the L0 node since there are L1 nodes for three-membered to ten-membered rings.

The challenge here seems to be that ThermoGAGroupLibrary.findRingCorrections() must return null in order to trigger the hybrid method, but must return non-null in order for the ring corrections to be applied in the Benson-only method. What we need is for this method to return both (1) the best available ring corrections for the Benson method and (2) a flag indicating the quality of the corrections so we can use the hybrid method. Naturally, Java doesn't allow for easily returning two things from a method. I'm toying with creating a simple new class (something like RingCorrections) that stores both pieces of information, but am more than happy to defer to anyone with a better idea.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants