Existing traj integration #45

Open
wants to merge 8 commits into master
Changes from 3 commits
18 changes: 17 additions & 1 deletion adaptivemd/analysis/pyemma/emma.py
@@ -172,10 +172,26 @@ def execute(

ty = trajs[0].types[outtype]


engines = []
for traj in trajectories:
if traj.engine not in engines:
engines.append(traj.engine)

if len(engines) > 1:
trajs = []
for traj in trajectories:
trajs.append(os.path.join(traj.location, traj.types[outtype].filename))
trajectory_file_name = ''
else:
trajs = list(trajectories)
trajectory_file_name = ty.filename


t.call(
Contributor:

Hmm, I see why this is necessary: you want to use both loaded and generated trajectories (so multiple engines), but you assume that the generated trajectories have different names.

Your way is one way to do it without changing _remote.py. Still, I usually try to avoid hacks that use parameters in ways they are not meant to be used (even if it works). It works now, but when someone changes _remote.py without knowing about your hack, it might break, e.g. by adding a (reasonable) check that the outtype is not empty.

I am also not sure whether we need more restrictions on the output types in a project. Right now you assume that if I use protein in the analysis, the engines of all trajectories have the same idea of protein. I think that makes sense, but we could also check that the stride and the selection are the same. The filename can be different, of course.

Btw, the easiest way to get the list of unique Engine objects is a set:

engines = set(traj.engine for traj in trajectories)

What about just changing _remote to accept the full path directly, like in your multi-engine approach, so that the additional traj_name parameter does not need to be passed? I think I originally did it this way exactly to avoid creating a second list, but why not.
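
For illustration, a rough sketch of what the multi-engine branch above could collapse to if _remote.py accepted the full path in every case (a sketch only, reusing os.path.join, traj.location and traj.types[outtype].filename exactly as in the diff; not the PR's actual code):

engines = set(traj.engine for traj in trajectories)
trajs = [os.path.join(traj.location, traj.types[outtype].filename)
         for traj in trajectories]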

remote_analysis,
trajectories=trajs,
traj_name=ty.filename, # we need the filename in the traj folder
traj_name=trajectory_file_name, # we need the filename in the traj folder
selection=ty.selection, # tell pyemma the subsets of atoms
features=features,
topfile=input_pdb,
2 changes: 1 addition & 1 deletion adaptivemd/scheduler.py
@@ -362,7 +362,7 @@ def replace_prefix(self, path):
path = path.replace('sandbox://', '../..')

# the main remote shared FS
path = path.replace('shared://', '../../..')
path = path.replace('shared://', '')
Contributor:


This is indeed a very bad hack, if not outright a bug. I actually doubt that this will work. What it does is that everywhere you expect a path from the working directory to point to NO_BACKUP, you will end up in the working directory instead. It is possible that it still works only because this case is not used yet.

You can use worker:// instead, because that is actually what worker:// does: it does not alter the path at all, so if you use an absolute path this works.

worker:///this/is/an/absolute/path/traj.dcd

Note the three slashes at the beginning; relative paths in the working directory start with only two.
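
In other words (the file names here are just placeholders):

worker:///this/is/an/absolute/path/traj.dcd    <- absolute path (three leading slashes), used as-is
worker://path/inside/the/working/dir/traj.dcd  <- relative to the working directory (two leading slashes)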

Member Author:


Sorry, I got the concept of shared:// wrong. The change is undone and the docs are updated.

Just for the record: The File prefixes such as shared:// are explained in the File docs.

path = path.replace('worker://', '')
path = path.replace('file://', '')
# the specific project folder://
1 change: 1 addition & 0 deletions docs/examples.rst
@@ -10,3 +10,4 @@ Examples Notebooks
examples/example4
examples/example5
examples/example6
examples/example7
6 changes: 6 additions & 0 deletions docs/examples/example7.rst
@@ -0,0 +1,6 @@
.. _example7:

Example 7 - Miscellaneous
=========================

.. notebook:: examples/tutorial/7_example_misc.ipynb
270 changes: 270 additions & 0 deletions examples/tutorial/7_example_misc.ipynb
@@ -0,0 +1,270 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## Importing existing trajectory data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In many cases, some trajectory data already exists before running an adaptive simulation. It is thus most efficiently to import this data into the framework. This works in principle by creating `Trajectory` objects and adding them to the `Project`. Since all of the trajectory-related data however is stored in the `Engine` object that generated it, this needs to be created as well."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"### Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import sys, os"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from adaptivemd import Project"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Let's open our `test` project by its name. If you completed the previous example this should all work out of the box."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"project = Project('tutorial')"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Open all connections to the `MongoDB` and `Session` so we can get started."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an import `Engine`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from adaptivemd import Trajectory\n",
"import time"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"pdb_file = File('file://init.pdb').named('initial_pdb').load()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since it is not desired to expand the trajectories at this point, system and integrator files are not given. In principle, if compatible restart files are available, one could create a complete engine and expand existing trajectories."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import_engine = OpenMMEngine(pdb_file=pdb_file,\n",
" system_file=None,\n",
" integrator_file=None,\n",
" args=None\n",
" ).named('openmm-import')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, to use the same `Modeller` as for the trajectories generated with `AdaptiveMD`, we build compatible output types. This means, they should contain the original file names with the respective strides and be named accordly. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import_engine.add_output_type('master', 'old-file-name-full.dcd', \n",
" stride=stride_full)\n",
"import_engine.add_output_type('protein', 'old-file-name-protein.dcd', \n",
" stride=stride_prot, \n",
" selection='protein')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize `Trajectory` objects\n",
"To add the actual file paths, `Trajectory` objects have to be initialized. \n",
"- `Trajectory` locations are folders, not files, and end with '/'.\n",
"- `frame` can be None if the initial frame is not known.\n",
"- `length` as defined by the engine time step, not by the output/save rate of an output type.\n",
"- `engine`: import engine defined above.\n",
"\n",
"The example below uses a list of trajectory folders to import, `existing_trajectory_paths`. The trajectory lengths are known and stored in `existing_trajectory_lengths`.\n",
"\n",
"The `created` variable has to be set a creation time in order to let the database know the trajectory already exists. In the example below, the (arbitrary) import time is used."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"trajs = []\n",
"for traj_path, traj_length in zip(existing_trajectory_paths, \n",
" existing_trajectory_lengths):\n",
" traj = Trajectory('shared://' + traj_path,\n",
" frame=None,\n",
" length=traj_length,\n",
" engine=import_engine)\n",
" traj.created = time.time()\n",
" trajs.append(traj)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Add the trajectories to the project"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"map(project.files.add, trajs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check if the trajectories have been added:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"len(project.trajectories)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"project.close()"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "py27_mar17",
"language": "python",
"name": "py27_mar17"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 1
}