Add details about CUDA extensions
tridao committed Oct 7, 2022
1 parent f6f82e9 commit 747f905
Showing 1 changed file (README.md) with 12 additions and 3 deletions.
@@ -1,12 +1,14 @@
We use the template from `https://github.com/ashleve/lightning-hydra-template`.
Please read the instructions there to understand the repo structure.

## GPT2 training
To train GPT2 on OpenWebText with 8 GPUs:
```sh
python run.py experiment=owt/gpt2s-flash trainer.devices=8
python run.py experiment=owt/gpt2m-flash trainer.devices=8
python run.py experiment=owt/gpt2l-flash trainer.devices=8
```
To train with bf16 instead of fp16, add `trainer.precision=bf16`.
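
For example, the device-count and precision overrides compose with any of the experiment configs above; a bf16 run of the small model on 8 GPUs should look like:
```sh
# Same run.py entry point as above, with the bf16 override added
python run.py experiment=owt/gpt2s-flash trainer.devices=8 trainer.precision=bf16
```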

## Requirements

@@ -15,10 +17,17 @@ We recommend CUDA 11.8 (e.g., using Nvidia's PyTorch Docker image from https

We provide a Dockerfile that lists all the required packages.
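
As a rough sketch (the image tag and mount path below are illustrative, not defined by the repo), building and entering that image might look like:
```sh
# Build an image from the provided Dockerfile (tag name is illustrative)
docker build -t gpt2-training-env .
# Start a GPU-enabled container with the repo mounted (paths are illustrative)
docker run --gpus all -it -v "$PWD":/workspace gpt2-training-env
```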

This repo includes the following CUDA extensions:
1. Fused dropout + residual + LayerNorm, adapted from Apex's [FastLayerNorm](https://github.com/NVIDIA/apex/tree/master/apex/contrib/layer_norm).
```sh
cd csrc/layer_norm && pip install .
```
2. Fused matmul + bias (forward and backward), and fused matmul + bias + gelu
(forward and backward), adapted from Apex's [FusedDense](https://github.com/NVIDIA/apex/tree/master/apex/fused_dense).
```sh
cd csrc/fused_dense_lib && pip install .
```
3. Optimized cross-entropy loss, adapted from Apex's [Xentropy](https://github.com/NVIDIA/apex/tree/master/apex/contrib/xentropy).
```sh
cd csrc/xentropy && pip install .
```
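
To install all three extensions in one go, a small loop over the same `csrc/` paths should work (a convenience sketch, run from the repo root; not a script shipped with the repo):
```sh
# Install each CUDA extension from its subdirectory under csrc/
for ext in layer_norm fused_dense_lib xentropy; do
  (cd "csrc/$ext" && pip install .)
done
```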
