From 563a7bdb18b5d766bbbf904208db4a2e74f88305 Mon Sep 17 00:00:00 2001
From: HydrogenSulfate <490868991@qq.com>
Date: Sat, 21 Dec 2024 20:21:29 +0800
Subject: [PATCH 1/8] update paddle icon and document related to paddle
---
README.md | 2 +-
doc/_static/paddle.svg | 10 ++++
doc/backend.md | 13 +++-
doc/conf.py | 1 +
doc/env.md | 15 +++++
doc/freeze/freeze.md | 23 +++++++
doc/install/easy-install.md | 34 +++++++++++
doc/install/install-from-source.md | 77 +++++++++++++++++++++++-
doc/model/dpa2.md | 4 +-
doc/model/sel.md | 8 +++
doc/model/train-energy.md | 4 +-
doc/model/train-se-atten.md | 4 +-
doc/model/train-se-e2-a.md | 4 +-
doc/train/finetuning.md | 70 +++++++++++++++++++++-
doc/train/parallel-training.md | 96 +++++++++++++++++++++++++++++-
doc/train/tensorboard.md | 4 +-
doc/train/training.md | 8 +++
17 files changed, 358 insertions(+), 19 deletions(-)
create mode 100644 doc/_static/paddle.svg
diff --git a/README.md b/README.md
index 18bdfd6560..e374039144 100644
--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@ For more information, check the [documentation](https://deepmd.readthedocs.io/).
### Highlighted features
-- **interfaced with multiple backends**, including TensorFlow, PyTorch, and JAX, the most popular deep learning frameworks, making the training process highly automatic and efficient.
+- **interfaced with multiple backends**, including TensorFlow, PyTorch, JAX and Paddle, the most popular deep learning frameworks, making the training process highly automatic and efficient.
- **interfaced with high-performance classical MD and quantum (path-integral) MD packages**, including LAMMPS, i-PI, AMBER, CP2K, GROMACS, OpenMM, and ABACUS.
- **implements the Deep Potential series models**, which have been successfully applied to finite and extended systems, including organic molecules, metals, semiconductors, insulators, etc.
- **implements MPI and GPU supports**, making it highly efficient for high-performance parallel and distributed computing.
diff --git a/doc/_static/paddle.svg b/doc/_static/paddle.svg
new file mode 100644
index 0000000000..5fdd09df04
--- /dev/null
+++ b/doc/_static/paddle.svg
@@ -0,0 +1,10 @@
+
+
diff --git a/doc/backend.md b/doc/backend.md
index 2be7ab7460..8062b623ca 100644
--- a/doc/backend.md
+++ b/doc/backend.md
@@ -5,7 +5,7 @@
DeePMD-kit supports multiple backends: TensorFlow and PyTorch.
To use DeePMD-kit, you must install at least one backend.
Each backend does not support all features.
-In the documentation, TensorFlow {{ tensorflow_icon }} and PyTorch {{ pytorch_icon }} icons are used to mark whether a backend supports a feature.
+In the documentation, TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, and Paddle {{ paddle_icon }} icons are used to mark whether a backend supports a feature.
### TensorFlow {{ tensorflow_icon }}
@@ -35,6 +35,15 @@ Only the `.savedmodel` format supports C++ inference, which needs the TensorFlow
The model is device-specific, so that the model generated on the GPU device cannot be run on the CPUs.
Currently, this backend is developed actively, and has no support for training.
+### Paddle {{ paddle_icon }}
+
+- Model filename extensions: `.json` and `.pdiparams`
+- Checkpoint filename extension: `.pd`
+
+[Paddle](https://www.paddlepaddle.org.cn/) version 3.0 or above is required.
+
+The `.pd` extension is used for model checkpoints during training and testing in Python. The `.json` file stores the model's computational graph in [PIR representation](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/paddle_v3_features/paddle_ir_cn.html), while the `.pdiparams` file stores the model parameters. The `.json` and `.pdiparams` files are exported together during model freezing and are used for C++ inference.
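+
+For example, training with the Paddle backend writes `.pd` checkpoints, while freezing exports the paired `.json` and `.pdiparams` files (the commands below are documented in the training and freezing sections):
+
+```bash
+# train with the Paddle backend; checkpoints are saved with the .pd extension
+dp --pd train input.json
+# freeze the trained model: writes model.json and model.pdiparams
+dp --pd freeze -o model
+```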
+
### DP {{ dpmodel_icon }}
:::{note}
@@ -57,7 +66,7 @@ NumPy 1.21 or above is required.
### Training
-When training and freezing a model, you can use `dp --tf` or `dp --pt` in the command line to switch the backend.
+When training and freezing a model, you can use `dp --tf`, `dp --pt`, or `dp --pd` in the command line to switch the backend.
### Inference
diff --git a/doc/conf.py b/doc/conf.py
index b266126c58..52c647a20d 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -167,6 +167,7 @@
"tensorflow_icon": """![TensorFlow](/_static/tensorflow.svg){class=platform-icon}""",
"pytorch_icon": """![PyTorch](/_static/pytorch.svg){class=platform-icon}""",
"jax_icon": """![JAX](/_static/jax.svg){class=platform-icon}""",
+ "paddle_icon": """![Paddle](/_static/paddle.svg){class=platform-icon}""",
"dpmodel_icon": """![DP](/_static/logo_icon.svg){class=platform-icon}""",
}
diff --git a/doc/env.md b/doc/env.md
index 3cf42b724a..5ff4cc695c 100644
--- a/doc/env.md
+++ b/doc/env.md
@@ -56,6 +56,21 @@ Control high (double) or low (float) precision of training.
{{ tensorflow_icon }} Enable JIT. Note that this option may either improve or decrease the performance. Requires TensorFlow to support JIT.
:::
+:::{envvar} PD_JIT
+
+**Choices**: `0`, `1`; **Default**: `0`
+
+{{ paddle_icon }} Enable Paddle JIT. Note that this option may either improve or decrease the performance.
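+
+A minimal usage sketch:
+
+```bash
+PD_JIT=1 dp --pd train input.json
+```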
+:::
+
+:::{envvar} PD_CINN
+
+**Choices**: `0`, `1`; **Default**: `0`
+
+{{ paddle_icon }} Enable the Paddle CINN compiler when `PD_JIT` is enabled. Note that this option may either improve or decrease the performance. Requires a Paddle build with CINN support (`paddle.device.is_compiled_with_cinn()` returns `True`).
+
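+For example (a usage sketch), check whether your Paddle build supports CINN and enable it together with `PD_JIT`:
+
+```bash
+# check whether this Paddle build was compiled with CINN
+python -c "import paddle; print(paddle.device.is_compiled_with_cinn())"
+# enable JIT and the CINN compiler for a training run
+PD_JIT=1 PD_CINN=1 dp --pd train input.json
+```
+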
+:::
+
:::{envvar} DP_INFER_BATCH_SIZE
**Default**: `1024` on CPUs and as maximum as possible until out-of-memory on GPUs
diff --git a/doc/freeze/freeze.md b/doc/freeze/freeze.md
index f394b64283..20f02177c6 100644
--- a/doc/freeze/freeze.md
+++ b/doc/freeze/freeze.md
@@ -32,3 +32,26 @@ $ dp --pt freeze -o model_branch1.pth --head CHOSEN_BRANCH
```
The output model is called `model_branch1.pth`, which is the specifically frozen model with the `CHOSEN_BRANCH` head.
+
+:::
+
+:::{tab-item} Paddle {{ paddle_icon }}
+
+```bash
+$ dp --pd freeze -o model
+```
+
+in the folder where the model is trained. The output model consists of `model.json` and `model.pdiparams`.
+
+In [multi-task mode](../train/multi-task-training-pd.md), you need to choose one of the available heads (e.g. `CHOSEN_BRANCH`) via `--head`
+to specify which model branch you want to freeze:
+
+```bash
+$ dp --pd freeze -o model_branch1 --head CHOSEN_BRANCH
+```
+
+The output model files are `model_branch1.json` and `model_branch1.pdiparams`, which form the specifically frozen model with the `CHOSEN_BRANCH` head.
+
+:::
+
+::::
diff --git a/doc/install/easy-install.md b/doc/install/easy-install.md
index b892463caf..0bf8f98967 100644
--- a/doc/install/easy-install.md
+++ b/doc/install/easy-install.md
@@ -186,6 +186,40 @@ Switch to the TensorFlow {{ tensorflow_icon }} tab for more information.
::::::
+::::::{tab-item} Paddle {{ paddle_icon }}
+
+:::::{tab-set}
+
+::::{tab-item} CUDA 12.3
+
+```bash
+pip install deepmd-kit[paddle]
+```
+
+::::
+
+::::{tab-item} CUDA 11.8
+
+```bash
+pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
+pip install deepmd-kit-cu11
+```
+
+::::
+
+::::{tab-item} CPU
+
+```bash
+pip install paddlepaddle==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
+pip install deepmd-kit
+```
+
+::::
+
+:::::
+
+::::::
+
:::::::
The supported platform includes Linux x86-64 and aarch64 with GNU C Library 2.28 or above, macOS x86-64 and arm64, and Windows x86-64.
diff --git a/doc/install/install-from-source.md b/doc/install/install-from-source.md
index 63060f692a..5643f6270a 100644
--- a/doc/install/install-from-source.md
+++ b/doc/install/install-from-source.md
@@ -93,6 +93,21 @@ One can also [use conda](https://docs.deepmodeling.org/faq/conda.html) to instal
:::
+:::{tab-item} Paddle {{ paddle_icon }}
+
+To install Paddle, run
+
+```sh
+# cu123
+pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/
+# cu118
+pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu118/
+# cpu
+pip install --pre paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/
+```
+
+:::
+
::::
It is important that every time a new shell is started and one wants to use `DeePMD-kit`, the virtual environment should be activated by
@@ -119,7 +134,7 @@ One should remember to activate the virtual environment every time he/she uses D
Check the compiler version on your machine
-```
+```bash
gcc --version
```
@@ -141,6 +156,12 @@ Note that PyTorch may have specific requirements for the compiler version to sup
:::
+:::{tab-item} Paddle {{ paddle_icon }}
+
+You can set the environment variable `DP_ENABLE_PADDLE=1` to enable customized C++ OPs in the Paddle backend.
+
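+A minimal sketch of a source build with Paddle C++ OPs enabled (assuming the Paddle inference library has been downloaded to the given path; the final `pip install .` step follows the generic install instructions below):
+
+```bash
+# enable customized C++ OPs for the Paddle backend
+export DP_ENABLE_PADDLE=1
+# required when DP_ENABLE_PADDLE is enabled; see PADDLE_INFERENCE_DIR below
+export PADDLE_INFERENCE_DIR=/path/to/paddle_inference_install_dir
+pip install .
+```
+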
+:::
+
::::
Execute
@@ -188,6 +209,13 @@ The path to the ROCM toolkit directory. If `ROCM_ROOT` is not set, it will look
{{ pytorch_icon }} Enable customized C++ OPs for the PyTorch backend. PyTorch can still run without customized C++ OPs, but features will be limited.
:::
+:::{envvar} DP_ENABLE_PADDLE
+
+**Choices**: `0`, `1`; **Default**: `0`
+
+{{ paddle_icon }} Enable customized C++ OPs for the Paddle backend. Paddle can still run without customized C++ OPs, but features will be limited.
+:::
+
:::{envvar} TENSORFLOW_ROOT
**Type**: Path; **Default**: Detected automatically
@@ -202,6 +230,13 @@ The path to the ROCM toolkit directory. If `ROCM_ROOT` is not set, it will look
{{ pytorch_icon }} The path to PyTorch Python library. If not given, by default, the installer only finds PyTorch under the user site-package directory (`site.getusersitepackages()`) or the system site-package directory (`sysconfig.get_path("purelib")`) due to the limitation of [PEP-517](https://peps.python.org/pep-0517/). If not found, the latest PyTorch (or the environment variable `PYTORCH_VERSION` if given) from PyPI will be built against.
:::
+:::{envvar} PADDLE_INFERENCE_DIR
+
+**Type**: Path; **Default**: None
+
+{{ paddle_icon }} The path to the Paddle inference library, e.g. `/path/to/paddle_inference_install_dir`. When `DP_ENABLE_PADDLE` is enabled, this variable must be set manually; otherwise, installation will fail.
+:::
+
:::{envvar} DP_ENABLE_NATIVE_OPTIMIZATION
**Choices**: `0`, `1`; **Default**: `0`
@@ -229,7 +264,7 @@ Other [CMake environment variables](https://cmake.org/cmake/help/latest/manual/c
To test the installation, one should first jump out of the source directory
-```
+```bash
cd /some/other/workspace
```
@@ -325,6 +360,18 @@ download the TensorFlow C library from [this page](https://www.tensorflow.org/in
:::
+:::{tab-item} Paddle {{ paddle_icon }}
+
+If you want to use the C++ interface of Paddle, you need to compile the Paddle inference library (C++ interface) manually following the [linux-compile-by-make](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/install/compile/linux-compile-by-make.html) guide, and then use the `.so` and `.a` files in `Paddle/build/paddle_inference_install_dir/`.
+
+We also provide nightly pre-compiled Paddle C++ libraries for Linux x86_64 with CUDA 11.8 and 12.3:
+
+[Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz](https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda118_cudnn860_Trt8531_D1/latest/paddle_inference.tgz)
+
+[Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz](https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz)
+
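+For example (a sketch; the name of the directory extracted from the archive is an assumption and should be adjusted to the actual layout):
+
+```bash
+# download and unpack the pre-compiled CUDA 12.3 inference library
+wget https://paddle-qa.bj.bcebos.com/paddle-pipeline/GITHUB_Docker_Compile_Test_Cuda123_cudnn900_Trt8616_D1/latest/paddle_inference.tgz
+tar -xzf paddle_inference.tgz
+# point the build to the extracted directory (name assumed)
+export PADDLE_INFERENCE_DIR=$PWD/paddle_inference_install_dir
+```
+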
+:::
+
::::
### Install DeePMD-kit's C++ interface
@@ -389,6 +436,16 @@ cmake -DENABLE_JAX=ON -D CMAKE_PREFIX_PATH=${tensorflow_c_root} ..
:::
+:::{tab-item} Paddle {{ paddle_icon }}
+
+Assuming you have downloaded the Paddle inference library (C++ interface) to `$PADDLE_INFERENCE_DIR`, execute CMake:
+
+```bash
+cmake -DENABLE_PADDLE=ON -DPADDLE_INFERENCE_DIR=$PADDLE_INFERENCE_DIR -DCMAKE_INSTALL_PREFIX=$deepmd_root ..
+```
+
+:::
+
::::
One may add the following CMake variables to `cmake` using the [`-D =` option](https://cmake.org/cmake/help/latest/manual/cmake.1.html#cmdoption-cmake-D):
@@ -420,6 +477,14 @@ If {cmake:variable}`ENABLE_TENSORFLOW` is `OFF`, the TensorFlow C library is use
:::
+:::{cmake:variable} ENABLE_PADDLE
+
+**Type**: `BOOL` (`ON`/`OFF`), Default: `OFF`
+
+{{ paddle_icon }} Whether building the Paddle backend.
+
+:::
+
:::{cmake:variable} TENSORFLOW_ROOT
**Type**: `PATH`
@@ -428,6 +493,14 @@ If {cmake:variable}`ENABLE_TENSORFLOW` is `OFF`, the TensorFlow C library is use
:::
+:::{cmake:variable} PADDLE_INFERENCE_DIR
+
+**Type**: `PATH`
+
+{{ paddle_icon }} The path to Paddle's C++ inference directory, such as `/path/to/paddle_inference_install_dir` or `/path/to/paddle_inference`.
+
+:::
+
:::{cmake:variable} CMAKE_INSTALL_PREFIX
**Type**: `PATH`
diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md
index eb641d6b01..70c9fee9d5 100644
--- a/doc/model/dpa2.md
+++ b/doc/model/dpa2.md
@@ -1,7 +1,7 @@
-# Descriptor DPA-2 {{ pytorch_icon }} {{ jax_icon }} {{ dpmodel_icon }}
+# Descriptor DPA-2 {{ pytorch_icon }} {{ jax_icon }} {{ paddle_icon }} {{ dpmodel_icon }}
:::{note}
-**Supported backends**: PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, DP {{ dpmodel_icon }}
+**Supported backends**: PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, Paddle {{ paddle_icon }}, DP {{ dpmodel_icon }}
:::
The DPA-2 model implementation. See https://arxiv.org/abs/2312.15492 for more details.
diff --git a/doc/model/sel.md b/doc/model/sel.md
index babea1d463..5b85318dd9 100644
--- a/doc/model/sel.md
+++ b/doc/model/sel.md
@@ -32,6 +32,14 @@ dp --jax neighbor-stat -s data -r 6.0 -t O H
:::
+:::{tab-item} Paddle {{ paddle_icon }}
+
+```sh
+dp --pd neighbor-stat -s data -r 6.0 -t O H
+```
+
+:::
+
::::
where `data` is the directory of data, `6.0` is the cutoff radius, and `O` and `H` is the type map. The program will give the `max_nbor_size`. For example, `max_nbor_size` of the water example is `[38, 72]`, meaning an atom may have 38 O neighbors and 72 H neighbors in the training data.
diff --git a/doc/model/train-energy.md b/doc/model/train-energy.md
index 484564b14f..128779ee16 100644
--- a/doc/model/train-energy.md
+++ b/doc/model/train-energy.md
@@ -1,7 +1,7 @@
-# Fit energy {{ tensorflow_icon }} {{ pytorch_icon }} {{ jax_icon }} {{ dpmodel_icon }}
+# Fit energy {{ tensorflow_icon }} {{ pytorch_icon }} {{ jax_icon }} {{ paddle_icon }} {{ dpmodel_icon }}
:::{note}
-**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, DP {{ dpmodel_icon }}
+**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, Paddle {{ paddle_icon }}, DP {{ dpmodel_icon }}
:::
In this section, we will take `$deepmd_source_dir/examples/water/se_e2_a/input.json` as an example of the input file.
diff --git a/doc/model/train-se-atten.md b/doc/model/train-se-atten.md
index 92a56395f6..5b9e4d7e4a 100644
--- a/doc/model/train-se-atten.md
+++ b/doc/model/train-se-atten.md
@@ -1,7 +1,7 @@
-# Descriptor `"se_atten"` {{ tensorflow_icon }} {{ pytorch_icon }} {{ jax_icon }} {{ dpmodel_icon }}
+# Descriptor `"se_atten"` {{ tensorflow_icon }} {{ pytorch_icon }} {{ jax_icon }} {{ paddle_icon }} {{ dpmodel_icon }}
:::{note}
-**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, DP {{ dpmodel_icon }}
+**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, Paddle {{ paddle_icon }}, DP {{ dpmodel_icon }}
:::
![ALT](../images/model_se_atten.png "model_se_atten")
diff --git a/doc/model/train-se-e2-a.md b/doc/model/train-se-e2-a.md
index 5143d5b5fb..9382c78d8e 100644
--- a/doc/model/train-se-e2-a.md
+++ b/doc/model/train-se-e2-a.md
@@ -1,7 +1,7 @@
-# Descriptor `"se_e2_a"` {{ tensorflow_icon }} {{ pytorch_icon }} {{ jax_icon }} {{ dpmodel_icon }}
+# Descriptor `"se_e2_a"` {{ tensorflow_icon }} {{ pytorch_icon }} {{ jax_icon }} {{ paddle_icon }} {{ dpmodel_icon }}
:::{note}
-**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, DP {{ dpmodel_icon }}
+**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, Paddle {{ paddle_icon }}, DP {{ dpmodel_icon }}
:::
The notation of `se_e2_a` is short for the Deep Potential Smooth Edition (DeepPot-SE) constructed from all information (both angular and radial) of atomic configurations. The `e2` stands for the embedding with two-atoms information. This descriptor was described in detail in [the DeepPot-SE paper](https://arxiv.org/abs/1805.09003).
diff --git a/doc/train/finetuning.md b/doc/train/finetuning.md
index cf2f5fde4f..d7175121d6 100644
--- a/doc/train/finetuning.md
+++ b/doc/train/finetuning.md
@@ -1,7 +1,7 @@
-# Finetune the pre-trained model {{ tensorflow_icon }} {{ pytorch_icon }}
+# Finetune the pre-trained model {{ tensorflow_icon }} {{ pytorch_icon }} {{ paddle_icon }}
:::{note}
-**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}
+**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, Paddle {{ paddle_icon }}
:::
Pretraining-and-finetuning is a widely used approach in other fields such as Computer Vision (CV) or Natural Language Processing (NLP)
@@ -196,3 +196,69 @@ This will initiate multitask fine-tuning, where for branches `PRE_DATA1` and `PR
it is akin to continuing training in `init-model` mode, whereas for `DOWNSTREAM_DATA`,
fine-tuning will be based on the fitting net from `PRE_DATA1`.
You can set `model_prob` for each dataset just the same as that in normal multitask training.
+
+## Paddle Implementation {{ paddle_icon }}
+
+In the Paddle version, we have introduced an updated, more adaptable approach to fine-tuning. This methodology encompasses two primary variations:
+
+### Single-task fine-tuning
+
+#### Fine-tuning from a single-task pre-trained model
+
+By saying "single-task pre-trained", we refer to a model pre-trained on a single dataset.
+This fine-tuning method is similar to the fine-tune approach supported by TensorFlow.
+It utilizes a single-task pre-trained model (`pretrained.pd`) and modifies the energy bias within its fitting net before continuing with training.
+The command for this operation is:
+
+```bash
+$ dp --pd train input.json --finetune pretrained.pd
+```
+
+In this case, it is important to note that the fitting net weights, except the energy bias, will be automatically set to those in the pre-trained model. This default setting is consistent with the TensorFlow implementation.
+If you wish to conduct fine-tuning using a randomly initialized fitting net in this scenario, you can set the `--model-branch` parameter to `RANDOM`:
+
+```bash
+$ dp --pd train input.json --finetune pretrained.pd --model-branch RANDOM
+```
+
+The model section in `input.json` **must be the same as that in the pre-trained model**.
+If you do not know the model parameters in the pre-trained model, you can add `--use-pretrain-script` to the fine-tuning command:
+
+```bash
+$ dp --pd train input.json --finetune pretrained.pd --use-pretrain-script
+```
+
+The model section will be overwritten (except the `type_map` subsection) by that in the pre-trained model, and then `input.json` can be simplified as follows:
+
+```json
+ "model": {
+ "type_map": ["O", "H"],
+ "descriptor" : {},
+ "fitting_net" : {}
+ }
+```
+
+#### Fine-tuning from a multi-task pre-trained model
+
+Additionally, within the Paddle implementation, leveraging the flexibility offered by the framework and the multi-task training process proposed in the DPA-2 [paper](https://arxiv.org/abs/2312.15492),
+we also support more general multi-task pre-trained models, which include multiple datasets for pre-training. These pre-training datasets share a common descriptor while maintaining their individual fitting nets,
+as detailed in the paper above.
+
+For fine-tuning using this multi-task pre-trained model (`multitask_pretrained.pd`),
+one can select a specific branch (e.g., `CHOSEN_BRANCH`) included in `multitask_pretrained.pd` for fine-tuning with the following command:
+
+```bash
+$ dp --pd train input.json --finetune multitask_pretrained.pd --model-branch CHOSEN_BRANCH
+```
+
+:::{note}
+One can check the available model branches in a multi-task pre-trained model by referring to the documentation of the pre-trained model or by using the following command:
+
+```bash
+$ dp --pd show multitask_pretrained.pd model-branch
+```
+
+:::
+
+This command will start fine-tuning based on the pre-trained model's descriptor and the selected branch's fitting net.
+If `--model-branch` is not set or is set to `RANDOM`, a randomly initialized fitting net will be used.
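+
+For example (a usage sketch), fine-tuning with a randomly initialized fitting net:
+
+```bash
+$ dp --pd train input.json --finetune multitask_pretrained.pd --model-branch RANDOM
+```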
diff --git a/doc/train/parallel-training.md b/doc/train/parallel-training.md
index 9ea92b4751..4a94ee5e26 100644
--- a/doc/train/parallel-training.md
+++ b/doc/train/parallel-training.md
@@ -1,7 +1,7 @@
-# Parallel training {{ tensorflow_icon }} {{ pytorch_icon }}
+# Parallel training {{ tensorflow_icon }} {{ pytorch_icon }} {{ paddle_icon }}
:::{note}
-**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}
+**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, Paddle {{ paddle_icon }}
:::
## TensorFlow Implementation {{ tensorflow_icon }}
@@ -187,3 +187,95 @@ torchrun --rdzv_endpoint=node0:12321 --nnodes=2 --nproc_per_node=4 --node_rank=1
> **Note** for developers: `torchrun` by default passes settings as environment variables [(list here)](https://pytorch.org/docs/stable/elastic/run.html#environment-variables).
> To check forward, backward, and communication time, please set env var `TORCH_CPP_LOG_LEVEL=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL`. More details can be found [here](https://pytorch.org/docs/stable/distributed.html#logging).
+
+## Paddle Implementation {{ paddle_icon }}
+
+Currently, parallel training in the Paddle version is implemented in the form of Paddle's Distributed Data Parallel ([DDP](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/06_distributed_training/cluster_quick_start_collective_cn.html)).
+DeePMD-kit will decide whether to launch the training in parallel (distributed) mode or in serial mode depending on your execution command.
+
+### Dataloader and Dataset
+
+First, we establish a DeepmdData class for each system, which is consistent with the TensorFlow version at this level. Then, we create a dataloader for each system, resulting in the same number of dataloaders as the number of systems. Next, we create a dataset for the dataloaders obtained in the previous step. This allows us to query the data for each system through this dataset, while the iteration pointers for each system are maintained by their respective dataloaders. Finally, a dataloader is created for the outermost dataset.
+
+We achieve custom sampling methods using a weighted sampler. The length of the sampler is set to `total_batch_num * num_workers`. The parameter `num_workers` defines the number of threads involved in multi-threaded loading, which can be modified by setting the environment variable `NUM_WORKERS` (default: `min(8, ncpus)`).
+
+> **Note** The underlying dataloader uses a distributed sampler to ensure that each GPU receives batches with different content in parallel mode, and a sequential sampler in serial mode. In the TensorFlow version, Horovod shuffles the dataset using different random seeds for the same purpose.
+
+```mermaid
+flowchart LR
+ subgraph systems
+ subgraph system1
+ direction LR
+ frame1[frame 1]
+ frame2[frame 2]
+ end
+ subgraph system2
+ direction LR
+ frame3[frame 3]
+ frame4[frame 4]
+ frame5[frame 5]
+ end
+ end
+ subgraph dataset
+ dataset1[dataset 1]
+ dataset2[dataset 2]
+ end
+ system1 -- frames --> dataset1
+ system2 --> dataset2
+ subgraph distributed sampler
+ ds1[distributed sampler 1]
+ ds2[distributed sampler 2]
+ end
+ dataset1 --> ds1
+ dataset2 --> ds2
+ subgraph dataloader
+ dataloader1[dataloader 1]
+ dataloader2[dataloader 2]
+ end
+ ds1 -- mini batch --> dataloader1
+ ds2 --> dataloader2
+ subgraph index[index on Rank 0]
+ dl11[dataloader 1, entry 1]
+ dl21[dataloader 2, entry 1]
+ dl22[dataloader 2, entry 2]
+ end
+ dataloader1 --> dl11
+ dataloader2 --> dl21
+ dataloader2 --> dl22
+ index -- for each step, choose 1 system --> WeightedSampler
+ --> dploaderset --> bufferedq[buffered queue] --> model
+```
+
+### How to use
+
+We use [`paddle.distributed.fleet`](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/06_distributed_training/cluster_quick_start_collective_cn.html) for DDP training, launched via `python -m paddle.distributed.launch`.
+
+To start training with multiple GPUs in one node, set environment variable `CUDA_VISIBLE_DEVICES` as the list of GPUs you want to use:
+
+```bash
+# example for training with 4 gpus in one node
+NUM_WORKERS=0 HDF5_USE_FILE_LOCKING=0 CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --gpus="0,1,2,3" dp --pd train input.json
+```
+
+Suppose you have 2 nodes, each with 4 GPUs, whose IP addresses are `192.168.1.2` and `192.168.1.3`; then you can use `paddle.distributed.launch` to launch a DDP training session:
+
+```bash
+# run in node 192.168.1.2
+NUM_WORKERS=0 HDF5_USE_FILE_LOCKING=0 python -m paddle.distributed.launch \
+ --gpus=0,1,2,3 \
+ --ips=192.168.1.2,192.168.1.3 \
+ dp --pd train input.json
+
+# then run in the other node 192.168.1.3
+NUM_WORKERS=0 HDF5_USE_FILE_LOCKING=0 python -m paddle.distributed.launch \
+ --gpus=0,1,2,3 \
+ --ips=192.168.1.2,192.168.1.3 \
+ dp --pd train input.json
+```
+
+:::{note}
+
+If `NUM_WORKERS` is too large, it may cause the program to be terminated by the system;
+if it is too small, it may slow down data reading. You can try adjusting it to an appropriate size.
+
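+For example (a usage sketch), to use four dataloader workers:
+
+```bash
+NUM_WORKERS=4 dp --pd train input.json
+```
+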
+:::
diff --git a/doc/train/tensorboard.md b/doc/train/tensorboard.md
index 32ecdd0ab2..3925ab3d3d 100644
--- a/doc/train/tensorboard.md
+++ b/doc/train/tensorboard.md
@@ -1,7 +1,7 @@
-# TensorBoard Usage {{ tensorflow_icon }} {{ pytorch_icon }}
+# TensorBoard Usage {{ tensorflow_icon }} {{ pytorch_icon }} {{ paddle_icon }}
:::{note}
-**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}
+**Supported backends**: TensorFlow {{ tensorflow_icon }}, PyTorch {{ pytorch_icon }}, Paddle {{ paddle_icon }}
:::
TensorBoard provides the visualization and tooling needed for machine learning
diff --git a/doc/train/training.md b/doc/train/training.md
index 5e8f8db498..8f491cc7a8 100644
--- a/doc/train/training.md
+++ b/doc/train/training.md
@@ -26,6 +26,14 @@ $ dp --pt train input.json
:::
+:::{tab-item} Paddle {{ paddle_icon }}
+
+```bash
+$ dp --pd train input.json
+```
+
+:::
+
::::
where `input.json` is the name of the input script.
From c7b21f8d62cd430ccda4ffda2f90589d14e7fbf9 Mon Sep 17 00:00:00 2001
From: HydrogenSulfate <490868991@qq.com>
Date: Thu, 26 Dec 2024 13:47:30 +0800
Subject: [PATCH 2/8] Update README.md
Co-authored-by: Jinzhe Zeng
Signed-off-by: HydrogenSulfate <490868991@qq.com>
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index e374039144..2af98f3057 100644
--- a/README.md
+++ b/README.md
@@ -19,7 +19,7 @@ For more information, check the [documentation](https://deepmd.readthedocs.io/).
### Highlighted features
-- **interfaced with multiple backends**, including TensorFlow, PyTorch, JAX and Paddle, the most popular deep learning frameworks, making the training process highly automatic and efficient.
+- **interfaced with multiple backends**, including TensorFlow, PyTorch, JAX, and Paddle, the most popular deep learning frameworks, making the training process highly automatic and efficient.
- **interfaced with high-performance classical MD and quantum (path-integral) MD packages**, including LAMMPS, i-PI, AMBER, CP2K, GROMACS, OpenMM, and ABACUS.
- **implements the Deep Potential series models**, which have been successfully applied to finite and extended systems, including organic molecules, metals, semiconductors, insulators, etc.
- **implements MPI and GPU supports**, making it highly efficient for high-performance parallel and distributed computing.
From ea1bfe18fcc4731b8ea1d9b07dbe9af490232e2c Mon Sep 17 00:00:00 2001
From: HydrogenSulfate <490868991@qq.com>
Date: Thu, 26 Dec 2024 13:47:39 +0800
Subject: [PATCH 3/8] Update doc/_static/paddle.svg
Co-authored-by: Jinzhe Zeng
Signed-off-by: HydrogenSulfate <490868991@qq.com>
---
doc/_static/paddle.svg | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/doc/_static/paddle.svg b/doc/_static/paddle.svg
index 5fdd09df04..1b7be12e3e 100644
--- a/doc/_static/paddle.svg
+++ b/doc/_static/paddle.svg
@@ -1,6 +1,6 @@