From 747f905c432d9183a0f1b9951e7e94c0d4079f8b Mon Sep 17 00:00:00 2001
From: Tri Dao
Date: Fri, 7 Oct 2022 13:07:10 -0700
Subject: [PATCH] Add details about CUDA extensions

---
 README.md | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index c13c184..f7167b0 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,14 @@
 We use the template from `https://github.com/ashleve/lightning-hydra-template`.
 Please read the instructions there to understand the repo structure.
 
+## GPT2 training
 To train GPT2 on Openwebtext with 8 GPUs:
 ```sh
 python run.py experiment=owt/gpt2s-flash trainer.devices=8
 python run.py experiment=owt/gpt2m-flash trainer.devices=8
 python run.py experiment=owt/gpt2l-flash trainer.devices=8
 ```
+To train with bf16 instead of fp16, add `trainer.precision=bf16`.
 
 ## Requirements
 
@@ -15,10 +17,17 @@ We recommend CUDA 11.8 (e.g., using the Nvidia's Pytorch Docker image from https
 
 We provide a Dockerfile that lists all the required packages.
 
-To install the CUDA extensions:
+This repo includes the following CUDA extensions:
+1. Fused dropout + residual + LayerNorm, adapted from Apex's [FastLayerNorm](https://github.com/NVIDIA/apex/tree/master/apex/contrib/layer_norm).
 ```sh
-cd csrc/xentropy && pip install .
 cd csrc/layer_norm && pip install .
+```
+2. Fused matmul + bias (forward and backward), and fused matmul + bias + gelu
+(forward and backward), adapted from Apex's [FusedDense](https://github.com/NVIDIA/apex/tree/master/apex/fused_dense).
+```sh
 cd csrc/fused_dense_lib && pip install .
-cd csrc/cauchy && pip install .
+```
+3. Optimized cross-entropy loss, adapted from Apex's [Xentropy](https://github.com/NVIDIA/apex/tree/master/apex/contrib/xentropy).
+```sh
+cd csrc/xentropy && pip install .
 ```
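
Note on extension 1: the patch describes it as fusing dropout + residual + LayerNorm into one kernel. For reference, a minimal unfused PyTorch sketch of that pattern is below; the function name, shapes, and dropout probability are illustrative and not the extension's actual API.
```python
import torch
import torch.nn.functional as F

def dropout_add_layer_norm_ref(x, residual, weight, bias, p=0.1, training=True):
    """Unfused reference for the pattern the extension fuses:
    LayerNorm(dropout(x) + residual), run here as three separate ops."""
    out = F.dropout(x, p=p, training=training) + residual
    return F.layer_norm(out, (x.shape[-1],), weight, bias)

# Illustrative shapes (batch, seqlen, hidden)
x = torch.randn(8, 512, 768)
residual = torch.randn(8, 512, 768)
w, b = torch.ones(768), torch.zeros(768)
y = dropout_add_layer_norm_ref(x, residual, w, b)
```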
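
Note on extension 2: the fused matmul + bias (+ gelu) kernels cover the standard MLP projection. A sketch of the unfused baseline, again with illustrative names and shapes:
```python
import torch
import torch.nn.functional as F

def fused_dense_gelu_ref(x, weight, bias):
    """Unfused reference for fused matmul + bias + gelu:
    a GEMM followed by separate bias-add and GELU kernels."""
    return F.gelu(F.linear(x, weight, bias))

# Illustrative GPT2-style MLP shapes: hidden 768, intermediate 3072
x = torch.randn(8, 512, 768)
w, b = torch.randn(3072, 768), torch.randn(3072)
y = fused_dense_gelu_ref(x, w, b)  # shape (8, 512, 3072)
```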
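
Note on extension 3: the optimized cross-entropy loss targets the softmax + negative-log-likelihood computation over a large vocabulary. The plain PyTorch baseline it speeds up is simply:
```python
import torch
import torch.nn.functional as F

# Unfused baseline: cross-entropy over a GPT2-sized vocabulary (50257).
# Shapes are illustrative: 8 sequences of 512 tokens, flattened.
logits = torch.randn(8 * 512, 50257)
labels = torch.randint(0, 50257, (8 * 512,))
loss = F.cross_entropy(logits, labels)
```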