remove duplicated content in parallel-training.md
HydrogenSulfate committed Dec 26, 2024
1 parent 8c794e5 commit 7c77e7f
Showing 1 changed file with 0 additions and 56 deletions.
doc/train/parallel-training.md
@@ -190,62 +190,6 @@ torchrun --rdzv_endpoint=node0:12321 --nnodes=2 --nproc_per_node=4 --node_rank=1
## Paddle Implementation {{ paddle_icon }}

Currently, parallel training in the Paddle version is implemented in the form of Paddle Distributed Data Parallel ([DDP](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/06_distributed_training/cluster_quick_start_collective_cn.html)).
DeePMD-kit decides whether to launch the training in parallel (distributed) mode or in serial mode depending on your execution command.

### Dataloader and Dataset

First, we establish a `DeepmdData` class for each system, which is consistent with the TensorFlow version at this level. Then, we create a dataloader for each system, resulting in as many dataloaders as there are systems. Next, we create a dataset over the dataloaders obtained in the previous step, which allows us to query the data of each system through this dataset while the iteration pointer of each system is maintained by its own dataloader. Finally, a dataloader is created for the outermost dataset.
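
The nesting above can be outlined as follows. This is a minimal sketch assuming Paddle's `paddle.io.Dataset`; the class name `SystemsDataset` and the surrounding wiring are hypothetical and do not correspond to the actual DeePMD-kit classes.

```python
from paddle.io import Dataset


class SystemsDataset(Dataset):
    """Outer dataset: one item = one mini-batch drawn from one chosen system."""

    def __init__(self, system_dataloaders):
        # one dataloader per system; each keeps its own iteration pointer
        self.system_dataloaders = list(system_dataloaders)
        self.iters = [iter(dl) for dl in self.system_dataloaders]

    def __len__(self):
        return sum(len(dl) for dl in self.system_dataloaders)

    def __getitem__(self, sys_idx):
        # sys_idx selects a system; restart that system's dataloader when exhausted
        try:
            return next(self.iters[sys_idx])
        except StopIteration:
            self.iters[sys_idx] = iter(self.system_dataloaders[sys_idx])
            return next(self.iters[sys_idx])
```

In this scheme the outermost dataloader is built over `SystemsDataset`, so every outer index corresponds to one mini-batch taken from one system.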

We achieve custom sampling methods using a weighted sampler. The length of the sampler is set to `total_batch_num * num_workers`. The parameter `num_workers` defines the number of threads involved in multi-threaded loading, which can be modified by setting the environment variable `NUM_WORKERS` (default: `min(8, ncpus)`).

> **Note** In parallel mode, the underlying dataloader uses a distributed sampler to ensure that each GPU receives batches with different content; in serial mode, a sequential sampler is used instead. In the TensorFlow version, Horovod shuffles the dataset with different random seeds for the same purpose.

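The weighted sampling described above can be illustrated with Paddle's built-in `paddle.io.WeightedRandomSampler`. This is a minimal sketch under assumed values: the `weights`, `total_batch_num`, and the fallback for `NUM_WORKERS` are schematic placeholders, not the exact DeePMD-kit code.

```python
import os

from paddle.io import WeightedRandomSampler

# number of loading threads, overridable via the NUM_WORKERS environment variable
num_workers = int(os.environ.get("NUM_WORKERS", min(8, os.cpu_count())))

# schematic per-system weights (two systems here), e.g. proportional to the
# number of batches contributed by each system
weights = [2.0, 3.0]
total_batch_num = 100  # illustrative number of batches per epoch

# each drawn index picks the system to pull the next mini-batch from;
# the sampler length is total_batch_num * num_workers
sampler = WeightedRandomSampler(
    weights,
    num_samples=total_batch_num * num_workers,
    replacement=True,
)
```
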
```mermaid
flowchart LR
subgraph systems
subgraph system1
direction LR
frame1[frame 1]
frame2[frame 2]
end
subgraph system2
direction LR
frame3[frame 3]
frame4[frame 4]
frame5[frame 5]
end
end
subgraph dataset
dataset1[dataset 1]
dataset2[dataset 2]
end
system1 -- frames --> dataset1
system2 --> dataset2
subgraph distributed sampler
ds1[distributed sampler 1]
ds2[distributed sampler 2]
end
dataset1 --> ds1
dataset2 --> ds2
subgraph dataloader
dataloader1[dataloader 1]
dataloader2[dataloader 2]
end
ds1 -- mini batch --> dataloader1
ds2 --> dataloader2
subgraph index[index on Rank 0]
dl11[dataloader 1, entry 1]
dl21[dataloader 2, entry 1]
dl22[dataloader 2, entry 2]
end
dataloader1 --> dl11
dataloader2 --> dl21
dataloader2 --> dl22
index -- for each step, choose 1 system --> WeightedSampler
--> dploaderset --> bufferedq[buffered queue] --> model
```

### How to use

We use [`paddle.distributed.fleet`](https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/06_distributed_training/cluster_quick_start_collective_cn.html) to launch a DDP training session.
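
For orientation, a bare-bones sketch of a `paddle.distributed.fleet` collective (DDP) setup is shown below. It is illustrative only: the tiny `paddle.nn.Linear` network is a stand-in, and the actual DeePMD-kit entry point performs this wiring when launched through Paddle's distributed launcher.

```python
import paddle
from paddle.distributed import fleet

# initialize collective (NCCL-based) communication for data-parallel training
fleet.init(is_collective=True)

# a tiny stand-in network; DeePMD-kit builds its own model at this point
model = paddle.nn.Linear(10, 1)
optimizer = paddle.optimizer.Adam(parameters=model.parameters())

# wrap the optimizer and the model so gradients are synchronized across ranks
optimizer = fleet.distributed_optimizer(optimizer)
model = fleet.distributed_model(model)
```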