This repository has been archived by the owner on Aug 13, 2018. It is now read-only.

Apply dTRAM to REMD data #48

Open
daviddaileo opened this issue Aug 21, 2015 · 3 comments

Comments

@daviddaileo

I tested out dtram.py for long MD data at different thermodynamic states and it works very well. May I ask how to apply dtram.py to REMD data since the input simulation files have no way to tell the code that the samples consist of many short trajectories? Your help will be greatly appreciated.

@franknoe
Contributor

I think Christoph should answer how to do that technically. I just have
three general comments:

  1. For REMD or generally multi-temperature simulations the problem is
    that the bias energies or reweighting factors depend on the
    instantaneous potential energy. Doing this with discretization-based
    schemes such as dTRAM or WHAM requires you to discretize the potential
    energy scale of the system finer than kT, which is often practically
    infeasible because combined with the discretization used in the
    configuration space this would create a ridiculous number of states. If
    you are working under the global equilibrium assumption, the solution is
    to replace WHAM by bin-less WHAM or MBAR and thus weigh each single
    configuration sampled instead of working with histograms. Currently the
    only TRAM solution in the published code is xTRAM, but I have recently
    run into a problem with the initialization, which I'm going to look at
    hopefully soon, so use this with care. We have a general and
    statistically TRAM method coming up, and I hope that manuscript + code
    will be available within one or two months.
  2. If you are using dTRAM through the command-line interface you can
    indicate the thermodynamic state and the configuration state of each
    sampled configuration. If you read the data by temperature you have to
    split them into short contiguous pieces, each getting the thermodynamic
    state index of the corresponding temperature index (e.g. 0000 0000
    0000 for pieces from the lowest temperature), in order to avoid counting
    unphysical transitions between different replicas. If you read the data
    by replica, this will be taken care of automatically even if you give
    the input in long trajectories because switches in the thermodynamic
    state index will mark the unphysical transitions (e.g. 0000 1111 0000).
  3. There is a case that you cannot solve by TRAM, namely if you save
    data less frequently than you exchange. In this case you don't know what
    is a physical transition at the same thermodynamic state and what isn't,
    because this information is usually not stored (you could be in
    thermodynamic state 0 in two subsequently stored time-points, but you
    might have actually transited through thermodynamic state 1 in between).
    We have developed a hybrid TRAM/MBAR method for dealing with this
    situation, but again code and manuscript are still coming up.
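The bin-less route from point 1 can be illustrated with a minimal MBAR sketch. This is plain MBAR, not TRAM, and not code from this repository. It assumes plain temperature replicas, so the reduced energy of pooled sample n evaluated in state k is u_kn = beta_k * U_n; `mbar_free_energies` is a hypothetical helper name. Each sample gets its own weight, so the potential energy axis never has to be binned.

```python
import numpy as np
from scipy.special import logsumexp


def mbar_free_energies(u_kn, N_k, tol=1e-10, max_iter=10000):
    """Self-consistent MBAR iteration (sketch).

    u_kn : (K, N) reduced energy of every pooled sample n evaluated in every state k
    N_k  : (K,) number of samples drawn from each state
    Returns reduced free energies f_k, shifted so that f_0 = 0.
    """
    u_kn = np.asarray(u_kn, dtype=float)
    log_N_k = np.log(np.asarray(N_k, dtype=float))
    f_k = np.zeros(u_kn.shape[0])
    for _ in range(max_iter):
        # log of the mixture denominator sum_l N_l exp(f_l - u_l(x_n)), one value per sample
        log_denom_n = logsumexp(log_N_k[:, None] + f_k[:, None] - u_kn, axis=0)
        # f_k = -ln sum_n exp(-u_k(x_n)) / denominator_n
        f_new = -logsumexp(-u_kn - log_denom_n[None, :], axis=1)
        f_new -= f_new[0]  # free energies are only defined up to a constant
        if np.max(np.abs(f_new - f_k)) < tol:
            return f_new
        f_k = f_new
    return f_k
```

For REMD data one would build `u_kn = np.outer(beta_k, U_n)` from the pooled potential energies `U_n` and pass the number of samples per temperature as `N_k`. Working in log space with `logsumexp` is a design choice to avoid overflow when reduced energies are large.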
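Point 2 can be made concrete with a small counting sketch. It assumes one discrete configuration trajectory `dtraj` and a matching thermodynamic-state trajectory `ttraj` (hypothetical names; the input format of this repository may differ). A transition is counted only if the thermodynamic state index stays the same over the whole lag window, so a replica-ordered sequence such as 0000 1111 0000 automatically drops the unphysical transitions at the exchanges. This is not the repository's own counting code.

```python
import numpy as np


def count_matrices(dtraj, ttraj, n_therm, n_conf, lag=1):
    """Per-thermodynamic-state transition count matrices C[k, i, j] (sketch).

    dtraj : configuration-state index of each saved frame
    ttraj : thermodynamic-state index (e.g. temperature index) of each saved frame
    A transition i -> j at lag `lag` is counted in C[k] only if every frame in
    the window t..t+lag belongs to thermodynamic state k, i.e. no exchange
    happened in between.
    """
    dtraj = np.asarray(dtraj, dtype=int)
    ttraj = np.asarray(ttraj, dtype=int)
    C = np.zeros((n_therm, n_conf, n_conf), dtype=int)
    for t in range(len(dtraj) - lag):
        k = ttraj[t]
        if np.all(ttraj[t:t + lag + 1] == k):
            C[k, dtraj[t], dtraj[t + lag]] += 1
    return C
```

As point 3 notes, this only works if frames are saved at least as often as exchanges are attempted; otherwise an unchanged state index does not guarantee that no exchange happened in between.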

On 22/08/15 at 01:00, Wei Dai wrote:

I tested out dtram.py for long MD data at different thermodynamic
states and it works very well. May I ask how to apply dtram.py to REMD
data since the input simulation files have no way to tell the code
that the samples consist of many short trajectories? Your help will be
greatly appreciated.




Prof. Dr. Frank Noe
Head of Computational Molecular Biology group
Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354
Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

@daviddaileo
Author

Thank you very much, Frank! Your comments are very helpful. I managed to make the dTRAM Python code work for the REMD data, and then compared the results with a dTRAM C++ code that I wrote myself. They are fairly close, but the difference is noticeable. I guess it could be due to implementation details, such as the choice of prior count (pseudocount), or how NaN or inf values are handled during the iteration.
As you pointed out in point 1 of your comment, I did use UWHAM for the REMD data before. But UWHAM does not use the kinetic information at all. So I hope that dTRAM can tell me something more about the kinetics of the system, even though I pay the price of less accurate thermodynamic information, since the bias is treated the same for all samples in the same bin.
I do look forward to your general TRAM method! Thank you again.

@fabian-paul
Member

Hi @daviddaileo, thanks for testing our code!
I guess we didn't clarify in our publication how to deal with zero counts and zero Lagrange multipliers?
@daviddaileo: If you are getting infinities or NaNs, something is wrong. dTRAM is similar to estimating an MSM with a fixed stationary vector. We have discussed some numerical issues that have to be taken into account in our paper http://arxiv.org/pdf/1507.05990v1.pdf, Section III.C.3.
Is this covered by your C++ implementation?
I also think the set of MSM states must be restricted to the largest strongly connected component of the projected count matrix \sum_k C^{k}_{ij}, where k runs over the thermodynamic states. Is this ensured in your application?
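The connectivity restriction can be sketched with SciPy's graph routines. The sketch assumes the counts are stored as a dense array `C` of shape (K, n, n), one count matrix per thermodynamic state; `largest_connected_set` and `restrict_to_set` are hypothetical helper names, not functions of this repository.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def largest_connected_set(C):
    """Indices of the largest strongly connected component of sum_k C[k] (sketch)."""
    C_sum = np.asarray(C).sum(axis=0)  # projected count matrix sum_k C^k_ij
    n_comp, labels = connected_components(
        csr_matrix(C_sum), directed=True, connection="strong"
    )
    sizes = np.bincount(labels, minlength=n_comp)
    return np.flatnonzero(labels == np.argmax(sizes))


def restrict_to_set(C, states):
    """Keep only the rows and columns of `states` in every C[k]."""
    C = np.asarray(C)
    return C[:, states][:, :, states]
```

A state that can be entered but never left (or the reverse) ends up in its own component and is dropped, which keeps every remaining state with both incoming and outgoing counts.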
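To illustrate the analogy with an MSM that has a fixed stationary vector, here is a sketch of the reversible maximum-likelihood estimator for a single count matrix c_ij with a given stationary distribution mu. It is only this single-ensemble building block, not the full dTRAM iteration and not the code of this repository; `reversible_msm_fixed_pi` is a hypothetical name. At the optimum p_ij = (c_ij + c_ji) mu_j / (lambda_i mu_j + lambda_j mu_i), with Lagrange multipliers lambda_i found by fixed-point iteration. In this sketch a state without any counts drives lambda_i to zero and the diagonal ratio becomes 0/0, which is one way NaNs can appear; restricting to the connected set first avoids this.

```python
import numpy as np


def reversible_msm_fixed_pi(C, mu, tol=1e-12, max_iter=100000):
    """Reversible ML transition matrix for counts C with fixed stationary vector mu (sketch).

    Assumes every state has counts (e.g. after restricting to the largest
    connected set); otherwise lambda_i -> 0 and the ratios below become 0/0.
    """
    C = np.asarray(C, dtype=float)
    mu = np.asarray(mu, dtype=float)
    X = C + C.T                  # symmetrized counts c_ij + c_ji
    lam = X.sum(axis=1) / 2.0    # initial guess: mean of row and column counts
    for _ in range(max_iter):
        # denom[i, j] = lambda_i mu_j + lambda_j mu_i
        denom = np.outer(lam, mu) + np.outer(mu, lam)
        lam_new = lam * (X * mu[None, :] / denom).sum(axis=1)
        converged = np.max(np.abs(lam_new - lam)) < tol * np.max(lam)
        lam = lam_new
        if converged:
            break
    denom = np.outer(lam, mu) + np.outer(mu, lam)
    return X * mu[None, :] / denom
```

Detailed balance mu_i p_ij = mu_j p_ji holds by construction at every iteration; the fixed-point iteration only enforces that the rows sum to one.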
