Implementation of Microsoft's Samba in PyTorch. This aims to be a simpler implementation than the original repo.
Tip: The pip install command installs the package and its core dependencies, but some CUDA-heavy dependencies are better installed separately. See below for more details.
git clone https://github.com/pszemraj/samba-pytorch.git
cd samba-pytorch
pip install -e .
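Once installed, a quick import smoke test (illustrative; Config and GPT are the public names used later in this README):

```python
# verify the editable install is importable
import samba_pytorch
from samba_pytorch import Config, GPT

print("samba_pytorch imported OK")
```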
After installing torch, xformers, and flash-attn, you may want to install mamba-ssm, causal-conv1d, and fla from source:
pip install --upgrade pip ninja
pip install git+https://github.com/state-spaces/mamba.git --no-build-isolation
pip install git+https://github.com/Dao-AILab/causal-conv1d.git --no-build-isolation
pip install git+https://github.com/sustcsonglin/flash-linear-attention@98c176e --no-build-isolation
Then, clone this repo and run the install commands above.
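To confirm the source-built CUDA dependencies are usable, a small check along these lines can help (a sketch only; it assumes the standard import names mamba_ssm, causal_conv1d, and fla):

```python
# sanity-check optional CUDA-backed dependencies after building from source
import torch

print("CUDA available:", torch.cuda.is_available())

for pkg in ("mamba_ssm", "causal_conv1d", "fla", "flash_attn", "xformers"):
    try:
        mod = __import__(pkg)
        print(f"{pkg}: OK (version {getattr(mod, '__version__', 'unknown')})")
    except Exception as exc:  # ImportError, or a CUDA error raised at import time
        print(f"{pkg}: not usable ({exc})")
```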
A basic example of creating a randomly initialized model from a named config:
from samba_pytorch import Config, GPT
cfg = Config.from_name('Samba_421M_1k_window')
print(cfg)
model = GPT(cfg)
print(model)
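Building on that, a minimal forward-pass sketch; note that cfg.vocab_size and the (batch, seq) -> (batch, seq, vocab) calling convention are assumptions based on typical litgpt-style GPT implementations, not guarantees of this repo:

```python
import torch

from samba_pytorch import Config, GPT

cfg = Config.from_name("Samba_421M_1k_window")
model = GPT(cfg).eval()

# hypothetical smoke test: cfg.vocab_size and the call signature are assumptions
tokens = torch.randint(0, cfg.vocab_size, (1, 128))
with torch.no_grad():
    logits = model(tokens)
print(logits.shape)  # expected roughly (1, 128, vocab size)
```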
A minimalist training script for a character-level language model on enwik8:
python train.py
Credit to nGPT-pytorch for the enwik8 dataset setup and the training script, both adapted for this repo.
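For orientation, a byte-level enwik8 setup typically looks like the sketch below; train.py may differ in paths and details, and the data path here is hypothetical:

```python
# illustrative byte-level tokenization for enwik8; not the exact code in train.py
import numpy as np

with open("data/enwik8", "rb") as f:  # hypothetical path to the raw dump
    data = np.frombuffer(f.read(), dtype=np.uint8)

n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]
# each byte is its own token id in [0, 256), so the model needs vocab_size >= 256
print(f"train: {len(train_data):,} bytes, val: {len(val_data):,} bytes")
```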
samba-pytorch/
├── pyproject.toml
├── README.md
└── samba_pytorch/
    ├── __init__.py
    ├── config.py
    ├── modules/
    │   ├── __init__.py
    │   ├── fused_rotary_embedding.py
    │   ├── gla.py
    │   ├── mamba_simple.py
    │   ├── multiscale_retention.py
    │   ├── rmsnorm.py
    │   └── rotary.py
    ├── samba.py
    ├── tokenizer.py
    └── utils.py
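The layout maps directly onto import paths; for instance (only module-level imports are shown, to avoid guessing at class names inside each file):

```python
# top-level re-exports, as used in the quickstart example above
from samba_pytorch import Config, GPT

# submodules follow the file layout shown above
import samba_pytorch.samba            # core model definition
import samba_pytorch.modules.rmsnorm  # RMSNorm building block
```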
@article{ren2024samba,
title={Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling},
author={Liliang Ren and Yang Liu and Yadong Lu and Yelong Shen and Chen Liang and Weizhu Chen},
journal={arXiv preprint},
year={2024},
url={https://arxiv.org/abs/2406.07522}
}