From e98c351bfdc95cbc300211fb0074a1aca1d82f34 Mon Sep 17 00:00:00 2001
From: David Hall
Date: Wed, 6 Sep 2023 00:01:14 -0700
Subject: [PATCH] set up read the docs for levanter (#301)

---
 .readthedocs.yaml                | 17 ++++++
 docs/Getting-Started-TPU-VM.md   |  7 +--
 docs/Getting-Started-Training.md | 16 +++---
 docs/Installation.md             |  4 +-
 docs/Levanter-1.0-Release.md     |  2 +-
 docs/css/material.css            | 25 +++++++++
 docs/css/mkdocstrings.css        | 26 ++++++++++
 docs/requirements.txt            |  9 ++++
 mkdocs.yml                       | 88 +++++++++++++++++++++-----------
 9 files changed, 150 insertions(+), 44 deletions(-)
 create mode 100644 .readthedocs.yaml
 create mode 100644 docs/css/material.css
 create mode 100644 docs/css/mkdocstrings.css
 create mode 100644 docs/requirements.txt

diff --git a/.readthedocs.yaml b/.readthedocs.yaml
new file mode 100644
index 000000000..54dd1859f
--- /dev/null
+++ b/.readthedocs.yaml
@@ -0,0 +1,17 @@
+# Read the Docs configuration file for MkDocs projects
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+mkdocs:
+  configuration: mkdocs.yml
+# Optionally declare the Python requirements required to build your docs
+python:
+  install:
+    - requirements: docs/requirements.txt
diff --git a/docs/Getting-Started-TPU-VM.md b/docs/Getting-Started-TPU-VM.md
index 3e28dde13..33b84d5c0 100644
--- a/docs/Getting-Started-TPU-VM.md
+++ b/docs/Getting-Started-TPU-VM.md
@@ -56,10 +56,11 @@ In addition to creating the instance, it will also mount the `/files/` nfs share
 venv and a copy of the repo.
 
 **Notes**:
-- This uploads setup scripts via scp. If the ssh-key that you used for Google Cloud requires passphrase or your ssh key
+
+* This uploads setup scripts via scp. If the ssh-key that you used for Google Cloud requires passphrase or your ssh key
 path is not `~/.ssh/google_compute_engine`, you will need to modify the script.
-- The command will spam you with a lot of output, sorry.
-- If you use a preemptible instance, you probably want to use the "babysitting" script that automatically re-creates
+* The command will spam you with a lot of output, sorry.
+* If you use a preemptible instance, you probably want to use the "babysitting" script that automatically re-creates
 the VM. That's explained down below in the "Running Levanter GPT-2" section.
 
 
diff --git a/docs/Getting-Started-Training.md b/docs/Getting-Started-Training.md
index bcf2162d4..b2739e294 100644
--- a/docs/Getting-Started-Training.md
+++ b/docs/Getting-Started-Training.md
@@ -17,8 +17,8 @@ To launch the training of a GPT2 model, run the following command:
 python src/levanter/main/train_lm.py --config_path config/gpt2_small.yaml
 ```
 
-This will execute the training pipeline pre-defined in the [train_lm.py](../src/levanter/main/train_lm.py) and set model and training configuration
-set in [gpt2_small.yaml](../config/gpt2_small.yaml). You can find more template configurations in the [config](../config/) directory.
+This will execute the training pipeline pre-defined in the [train_lm.py](https://github.com/stanford-crfm/levanter/tree/main/src/levanter/main/train_lm.py) and set model and training configuration
+set in [gpt2_small.yaml](https://github.com/stanford-crfm/levanter/tree/main/config/gpt2_small.yaml). You can find more template configurations in the [config](https://github.com/stanford-crfm/levanter/tree/main/config/) directory.
 
 Configuration files are processed using [Pyrallis](https://github.com/dlwh/draccus).
 Pyrallis is yet-another yaml-to-dataclass library.
@@ -45,8 +45,8 @@ This will overwrite the default model and training configurations and set the fo
 that the hidden dimension must be divisible by the number of heads.
 - `trainer.num_train_steps`: The number of training steps to run.
 
-You can find a complete list of parameters to change from the `TrainerConfig` in [trainer.py](src/levanter/trainer.py) and `Gpt2Config` in
-[gpt2.py](src/levanter/models/gpt2.py).
+You can find a complete list of parameters to change from the `TrainerConfig` in [trainer.py](https://github.com/stanford-crfm/levanter/tree/main/src/levanter/trainer.py) and `Gpt2Config` in
+[gpt2.py](https://github.com/stanford-crfm/levanter/tree/main/src/levanter/models/gpt2.py).
 
 ### Change Checkpoint Settings
 To change the frequency of saving checkpoints, you can use the following command:
@@ -59,7 +59,7 @@ python src/levanter/main/train_lm.py \
     --trainer.checkpointer.save_interval 20m
 ```
 
-This will overwrite the default checkpoint settings from the `TrainerConfig` and `CheckpointerConfig` in [checkpoint.py](src/levanter/checkpoint.py) to
+This will overwrite the default checkpoint settings from the `TrainerConfig` and `CheckpointerConfig` in [checkpoint.py](https://github.com/stanford-crfm/levanter/tree/main/src/levanter/checkpoint.py) to
 save checkpoints every 20 minutes. The checkpoint will be saved to the directory `checkpoints/gpt2/${wandb_id}`
 
 Note that:
@@ -79,7 +79,7 @@ python src/levanter/main/train_lm.py \
     --trainer.steps_per_eval 500
 ```
 
-This will overwrite the default eval frequency (every 1,000) from the `TrainerConfig` in [config.py](src/levanter/config.py) to every 500 steps.
+This will overwrite the default eval frequency (every 1,000) from the `TrainerConfig` in [config.py](https://github.com/stanford-crfm/levanter/tree/main/src/levanter/config.py) to every 500 steps.
 
 ### Change Parallelism Settings
 By default, Levanter will split the number of examples in `train_batch_size` equally across all available GPUs.
@@ -114,7 +114,7 @@ python src/levanter/main/train_lm.py \
     --trainer.wandb,group my_new_exp_group
 ```
 
-This will overwrite the default WandB configuration from the `TrainerConfig` in [config.py](src/levanter/config.py).
+This will overwrite the default WandB configuration from the `TrainerConfig` in [config.py](https://github.com/stanford-crfm/levanter/tree/main/src/levanter/config.py).
 We pass all these arguments to the `wandb.init()` function at the same verbatim.
 For more information on the WandB configuration, please refer to the [WandB documentation](https://docs.wandb.ai/ref/python/init).
 
@@ -126,7 +126,7 @@ To do so, you can use the following command:
 python src/levanter/main/train_lm.py \
     --config_path config/gpt2_small.yaml \
     --trainer.load_checkpoint_path checkpoints/gpt2/wandb_id \
-    --trainer.wandb.resume True \
+    --trainer.wandb.resume true \
     --trainer.wandb.id asdf1234
 ```
 
diff --git a/docs/Installation.md b/docs/Installation.md
index 76d799ee4..4bd37b138 100644
--- a/docs/Installation.md
+++ b/docs/Installation.md
@@ -8,8 +8,8 @@
 end=""
 %}
 
-If you're using a TPU, more complete documentation for setting that up is available [here](docs/Getting-Started-TPU-VM.md).
-If you're using CUDA, more complete documentation for setting that up is available [here](docs/Getting-Started-CUDA.md).
+If you're using a TPU, more complete documentation for setting that up is available [here](Getting-Started-TPU-VM.md).
+If you're using CUDA, more complete documentation for setting that up is available [here](Getting-Started-CUDA.md).
 
 ## Setting up a development environment
 
diff --git a/docs/Levanter-1.0-Release.md b/docs/Levanter-1.0-Release.md
index d5a027bfb..41fdce0fc 100644
--- a/docs/Levanter-1.0-Release.md
+++ b/docs/Levanter-1.0-Release.md
@@ -549,7 +549,7 @@ learn differently from Transformers.
 
 To get started, first install the appropriate version of JAX for your system. See [JAX's installation instructions](https://github.com/google/jax/blob/main/README.md#installation) as it varies from platform to platform.
 
-If you're using a TPU, more complete documentation for setting that up is available [here](docs/Getting-Started-TPU-VM.md). GPU support is still in-progress; documentation is available [here](docs/Getting-Started-CUDA.md).
+If you're using a TPU, more complete documentation for setting that up is available [here](Getting-Started-TPU-VM.md). GPU support is still in-progress; documentation is available [here](Getting-Started-CUDA.md).
 
 Next, clone the repository and install it with pip:
 
diff --git a/docs/css/material.css b/docs/css/material.css
new file mode 100644
index 000000000..b73faaf14
--- /dev/null
+++ b/docs/css/material.css
@@ -0,0 +1,25 @@
+.md-main__inner {
+  margin-bottom: 1.5rem;
+}
+
+/* Custom admonition: preview */
+:root {
+  --md-admonition-icon--preview: url('data:image/svg+xml;charset=utf-8,');
+}
+
+.md-typeset .admonition.preview,
+.md-typeset details.preview {
+  border-color: rgb(220, 139, 240);
+}
+
+.md-typeset .preview>.admonition-title,
+.md-typeset .preview>summary {
+  background-color: rgba(142, 43, 155, 0.1);
+}
+
+.md-typeset .preview>.admonition-title::before,
+.md-typeset .preview>summary::before {
+  background-color: rgb(220, 139, 240);
+  -webkit-mask-image: var(--md-admonition-icon--preview);
+  mask-image: var(--md-admonition-icon--preview);
+}
diff --git a/docs/css/mkdocstrings.css b/docs/css/mkdocstrings.css
new file mode 100644
index 000000000..3c0b11eb8
--- /dev/null
+++ b/docs/css/mkdocstrings.css
@@ -0,0 +1,26 @@
+/* Indentation. */
+div.doc-contents:not(.first) {
+  padding-left: 25px;
+  border-left: .05rem solid var(--md-typeset-table-color);
+}
+
+/* Mark external links as such. */
+a.external::after,
+a.autorefs-external::after {
+  /* https://primer.style/octicons/arrow-up-right-24 */
+  mask-image: url('data:image/svg+xml,');
+  content: ' ';
+
+  display: inline-block;
+  vertical-align: middle;
+  position: relative;
+
+  height: 1em;
+  width: 1em;
+  background-color: var(--md-typeset-a-color);
+}
+
+a.external:hover::after,
+a.autorefs-external:hover::after {
+  background-color: var(--md-accent-fg-color);
+}
diff --git a/docs/requirements.txt b/docs/requirements.txt
new file mode 100644
index 000000000..e8b195dd1
--- /dev/null
+++ b/docs/requirements.txt
@@ -0,0 +1,9 @@
+mkdocs
+mkdocstrings
+mkdocstrings-python
+mkdocs-material
+mkdocs-material-extensions
+mkdocs-autorefs
+mkdocs-include-markdown-plugin
+mkdocs-literate-nav
+mkdocs-macros-plugin
diff --git a/mkdocs.yml b/mkdocs.yml
index 272abdb6c..c368a10d4 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,46 +1,75 @@
-site_name: CRFM Levanter
+site_name: Levanter
 repo_url: https://github.com/stanford-crfm/levanter/
 edit_uri: blob/main/docs/
 theme:
-  name: readthedocs
+  name: material
   highlightjs: false
+  features:
+    - content.code.copy
+markdown_extensions:
+- attr_list
+- admonition
+#- callouts
+- footnotes
+- codehilite
+- pymdownx.details # Allowing hidden expandable regions denoted by ???
+- pymdownx.magiclink
+- pymdownx.superfences
+- pymdownx.arithmatex: # Render LaTeX via MathJax
+    generic: true
+- pymdownx.superfences # Seems to enable syntax highlighting when used with the Material theme.
+- pymdownx.snippets: # Include one Markdown file into another
+    base_path: docs
+- pymdownx.inlinehilite
+- pymdownx.snippets:
+    check_paths: true
+- pymdownx.superfences
+- toc:
+    permalink: "¤"
+    toc_depth: "2-3"
+
 plugins:
   - search
+  - autorefs
   - mkdocstrings:
       handlers:
         python:
+          setup_commands:
+            - import pytkdocs_tweaks
+            - pytkdocs_tweaks.main()
+          paths: [src]
+          import:
+            - https://docs.python.org/3/objects.inv
+            - https://jax.readthedocs.io/en/latest/objects.inv
+            - https://docs.kidger.site/equinox/objects.inv
           options:
-            show_root_heading: true
-            show_signature_annotations: true
-            show_bases: false
-            show_source: false
-            show_root_full_path: false
-            show_if_no_docstring: true
-            members_order: source
-            merge_init_into_class: true
             docstring_options:
               ignore_init_summary: true
-
+            docstring_style: google
+            show_source: false
+            docstring_section_style: list
+            heading_level: 5
+            inherited_members: true
+            merge_init_into_class: true
+            load_external_modules: true
+            preload_modules: [haliax, haliax.core]
+# separate_signature: true
+            show_root_heading: true
+            show_root_full_path: false
+# show_signature_annotations: true
+            show_symbol_type_heading: false
+            show_symbol_type_toc: false
+            signature_crossrefs: true
+            line_length: 100
   - include-markdown
 extra_css:
-  - docstrings.css
-markdown_extensions:
-  - pymdownx.magiclink
-  - pymdownx.arithmatex: # Render LaTeX via MathJax
-      generic: true
-  - pymdownx.superfences # Seems to enable syntax highlighting when used with the Material theme.
-  - pymdownx.details # Allowing hidden expandable regions denoted by ???
-  - pymdownx.snippets: # Include one Markdown file into another
-      base_path: docs
-  - admonition
-  - toc:
-      permalink: "¤" # Adds a clickable permalink to each section heading
-      toc_depth: 4
+  - css/material.css
+  - css/mkdocstrings.css
+
+
 watch:
   - src
-  - scripts
-  - infra
-  - config
+  - docs
 nav:
   - 'Home': 'index.md'
   - 'User Guide': 'Getting-Started-Training.md'
@@ -49,6 +78,5 @@ nav:
       - 'Installation.md'
       - 'Getting-Started-TPU-VM.md'
       - 'Getting-Started-CUDA.md'
-  - Technical Documentation:
-    - 'Overview.md'
-    - 'design/Data-Loader-Design.md'
+  - Other:
+    - 'Levanter-1.0-Release.md'