GitHub - humeniuka/sGDML_dataset_generation: Parallelize generation of training data for sGDML on a SLURM queue

Description

To train machine-learned potentials such as sGDML, one has to generate large amounts of training data. Given a sequence of geometries, this set of scripts distributes QChem jobs over a SLURM queue and stores the resulting forces and non-adiabatic coupling (NAC) vectors in the format expected by sGDML.

Requirements

QChem

SLURM queue

Installation

$ pip install -e .

Getting Started

In 'examples/formaldehyde', run

$ sbatch --job-name='create dataset' << EOF
#!/bin/bash
forces_trajectory.py geometries.xyz qchem.in --parallel_images=10
EOF
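
Before submitting, it can help to check how many geometries the input contains and how they would group into batches. The following is a minimal sketch, not the code used by forces_trajectory.py: it assumes geometries.xyz is a plain multi-frame XYZ file (atom count, comment line, then one line per atom), and the helper names split_frames and batched are hypothetical. Grouping frames in batches of 10 only illustrates what --parallel_images=10 might correspond to.

```python
def split_frames(text):
    """Split the text of a multi-frame XYZ file into a list of frames.

    Each frame is returned as a list of lines: the atom count,
    the comment line, and one line per atom.
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between frames
            continue
        natoms = int(lines[i].split()[0])
        frame = lines[i:i + natoms + 2]
        if len(frame) != natoms + 2:
            raise ValueError("truncated frame starting at line %d" % (i + 1))
        frames.append(frame)
        i += natoms + 2
    return frames


def batched(items, size):
    """Group items into consecutive batches of at most `size` elements."""
    return [items[k:k + size] for k in range(0, len(items), size)]


# Made-up two-frame input, only to show the calls (assumption: plain XYZ).
example = "2\nframe 0\nH 0 0 0\nH 0 0 0.74\n2\nframe 1\nH 0 0 0\nH 0 0 0.80\n"
example_frames = split_frames(example)
print(len(example_frames), "geometries in",
      len(batched(example_frames, 10)), "batch(es) of up to 10")
```

To apply this to a real input, read the file contents with open("geometries.xyz").read() and pass the text to split_frames.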

After some time, this should produce the following files in extended XYZ format (using atomic units):

forces_0.xyz - forces in S0

forces_1.xyz - forces in S1

nacvec_0-1.xyz - NAC vectors between S0 and S1
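
To inspect one of these files, a small reader is enough. The sketch below assumes a layout that is not confirmed by this README: each atom line holds the element symbol, three Cartesian coordinates, and three vector components (force or NAC) as extra columns. The function name read_frame and the sample numbers are made up for illustration; check an actual output file before relying on this layout.

```python
def read_frame(lines):
    """Parse one frame given as a list of lines.

    Assumed layout (hypothetical): atom count, comment line, then
    'symbol x y z vx vy vz' for every atom.
    Returns (symbols, coordinates, vectors).
    """
    natoms = int(lines[0].split()[0])
    symbols, coords, vectors = [], [], []
    for line in lines[2:2 + natoms]:
        fields = line.split()
        if len(fields) < 7:
            raise ValueError("expected 7 columns, got: %r" % line)
        symbols.append(fields[0])
        coords.append(tuple(float(x) for x in fields[1:4]))
        vectors.append(tuple(float(x) for x in fields[4:7]))
    return symbols, coords, vectors


# Made-up two-atom frame, only to show the call.
sample = """2
illustrative comment line
H 0.0 0.0 0.0  0.0 0.0 -0.01
H 0.0 0.0 1.4  0.0 0.0  0.01
""".splitlines()

symbols, coords, vectors = read_frame(sample)
# Forces on an isolated molecule should sum to (nearly) zero.
net = tuple(sum(v[k] for v in vectors) for k in range(3))
print(symbols, net)
```

A net force far from zero is a quick hint that the columns were misread or that a frame is incomplete.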

The geometries at which the forces are calculated can be sampled from the Maxwell-Boltzmann distribution by running molecular dynamics at high temperature with the ANI-2x force field:

$ ani2x_dynamics.py  initial.xyz --temperature=500.0  -o geometries.xyz
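
The role of the temperature can be illustrated with the velocity distribution typically used to start such a simulation. This is a sketch of Maxwell-Boltzmann velocity sampling in atomic units, not the code in ani2x_dynamics.py; the function names and the choice of formaldehyde masses are assumptions for illustration. Each Cartesian velocity component is drawn from a normal distribution with standard deviation sqrt(k_B T / m), so a higher temperature gives the atoms more kinetic energy and lets the trajectory reach more strongly distorted geometries.

```python
import math
import random

K_BOLTZMANN = 3.166811563e-6   # Hartree per Kelvin
AMU_TO_ME = 1822.888486        # electron masses per atomic mass unit


def sample_velocities(masses_amu, temperature, rng):
    """Draw one velocity vector (atomic units) per atom.

    Each component is normally distributed with zero mean and
    standard deviation sqrt(k_B * T / m).
    """
    velocities = []
    for m_amu in masses_amu:
        sigma = math.sqrt(K_BOLTZMANN * temperature / (m_amu * AMU_TO_ME))
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities


def kinetic_energy(masses_amu, velocities):
    """Total kinetic energy in Hartree."""
    return sum(
        0.5 * m_amu * AMU_TO_ME * sum(c * c for c in v)
        for m_amu, v in zip(masses_amu, velocities)
    )


# Approximate masses of C, O, H, H in formaldehyde (amu).
masses = [12.011, 15.999, 1.008, 1.008]
rng = random.Random(0)
n_samples = 5000
mean_ekin = sum(
    kinetic_energy(masses, sample_velocities(masses, 500.0, rng))
    for _ in range(n_samples)
) / n_samples
# Equipartition predicts 3/2 k_B T per atom on average.
expected = 1.5 * len(masses) * K_BOLTZMANN * 500.0
print(mean_ekin / expected)
```

The printed ratio should be close to 1, which confirms that the sampled velocities carry the kinetic energy expected at 500 K.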
