building various transformer model architectures and their modules from scratch.

shreydan/scratchformers

ScratchFormers

implementing transformers from scratch.

Attention is all you need.

Modules

Models

  • LLaMA

  • simple Vision Transformer

  • GPT2

  • OpenAI CLIP

    • implemented ViT-B/32 variant
    • for process, check building_clip.ipynb
    • inference requirement: install OpenAI's clip package for tokenization and preprocessing: pip install git+https://github.com/openai/CLIP.git
    • model implementation
    • zero-shot inference code
    • built so that it can load the pretrained OpenAI weights and IT WORKS!!!
    • My lighter implementation of this, using existing image and language models and trained on the Flickr8k dataset, is available here: liteCLIP
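
The zero-shot step can be sketched roughly as follows. This is a minimal illustration, not the repo's inference code: the random tensors stand in for the outputs of the image and text encoders, `zero_shot_probs` is a hypothetical helper name, and the `logit_scale` of 100 is an assumed value for the temperature.

```python
import torch
import torch.nn.functional as F


def zero_shot_probs(image_emb, text_emb, logit_scale=100.0):
    """Score each image against each class prompt.

    image_emb: (n_images, dim), text_emb: (n_classes, dim)
    returns:   (n_images, n_classes) probabilities
    """
    # Cosine similarity = dot product of L2-normalised embeddings.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = logit_scale * image_emb @ text_emb.T
    return logits.softmax(dim=-1)


torch.manual_seed(0)
# Stand-ins for encoder outputs (assumption: 512-dim joint embedding space).
text_emb = torch.randn(3, 512)                          # one row per class prompt
image_emb = text_emb[1:2] + 0.1 * torch.randn(1, 512)   # an image "close to" prompt 1

probs = zero_shot_probs(image_emb, text_emb)
pred = probs.argmax(dim=-1)  # index of the best-matching prompt per image
```

With real weights, `image_emb` and `text_emb` would come from the model's image and text towers after the clip package's preprocessing and tokenization.
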
  • Encoder Decoder Transformer

    • for process, check building_encoder-decoder.ipynb
    • model implementation
    • src_mask for the encoder is optional but nice to have: it masks out the pad tokens so that attention ignores them.
    • used learned position embeddings instead of the sin/cos encodings of the original paper.
    • I trained a model for multilingual machine translation.
      • Translates English to Hindi and Telugu.
      • change: a single embedding layer shared by the encoder and decoder, since I used a single tokenizer.
      • for the code and results check: shreydan/multilingual-translation
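
The two design notes above (a pad-token src_mask and learned position embeddings) can be sketched roughly like this. It is an illustration under assumptions, not the repo's implementation: `PAD_ID`, `TokenAndPositionEmbedding`, `make_src_mask`, and `masked_attention` are hypothetical names, and a single attention head without projections is used to keep it short.

```python
import torch
import torch.nn as nn

PAD_ID = 0  # assumption: the tokenizer's pad token id


class TokenAndPositionEmbedding(nn.Module):
    """Token embedding plus a learned (not sin/cos) position embedding."""

    def __init__(self, vocab_size, max_len, dim):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim, padding_idx=PAD_ID)
        self.pos = nn.Embedding(max_len, dim)  # one trainable vector per position

    def forward(self, ids):
        positions = torch.arange(ids.shape[1], device=ids.device)
        # (batch, seq, dim) + (seq, dim) broadcasts over the batch.
        return self.tok(ids) + self.pos(positions)


def make_src_mask(ids):
    # True for real tokens, False for padding; shape (batch, 1, 1, seq)
    # so it broadcasts over heads and query positions.
    return (ids != PAD_ID)[:, None, None, :]


def masked_attention(q, k, v, src_mask=None):
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    if src_mask is not None:
        # Padded keys get -inf, so softmax gives them zero weight.
        scores = scores.masked_fill(~src_mask, float("-inf"))
    weights = scores.softmax(dim=-1)
    return weights @ v, weights


torch.manual_seed(0)
ids = torch.tensor([[5, 7, 9, 0, 0],
                    [3, 4, 0, 0, 0]])            # two padded source sentences
x = TokenAndPositionEmbedding(vocab_size=16, max_len=8, dim=8)(ids)
x = x[:, None]                                   # add a head axis: (batch, 1, seq, dim)
out, weights = masked_attention(x, x, x, make_src_mask(ids))
```

Here `weights[0, 0, :, 3:]` and `weights[1, 0, :, 2:]` are all zeros, which is the effect the src_mask is meant to have.
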
  • BERT - MLM

    • for process of masked language modeling, check masked-language-modeling.ipynb
    • model implementation
    • simplification: no [CLS] & [SEP] tokens are used for pre-training, since I only built the model for masked language modeling and not for next sentence prediction.
    • I trained an entire model on the Wikipedia dataset; more info in the shreydan/masked-language-modeling repo.
    • once pretrained, the MLM head can be replaced with any other downstream task head.
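
A rough sketch of the masking step and the head swap, under stated assumptions: `MASK_ID`, `mask_tokens`, and `TinyEncoder` are hypothetical, the 15% masking rate follows the usual BERT convention rather than anything stated here, selected tokens are always replaced by the mask token (no random or unchanged replacements), and `nn.TransformerEncoderLayer` is only a stand-in for the from-scratch encoder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

MASK_ID = 1          # assumption: id of the [MASK] token
IGNORE_INDEX = -100  # positions with this label are skipped by cross_entropy


def mask_tokens(ids, mask_prob=0.15, generator=None):
    """Replace a random subset of tokens with MASK_ID; label only those positions."""
    selected = torch.rand(ids.shape, generator=generator) < mask_prob
    labels = torch.where(selected, ids, torch.full_like(ids, IGNORE_INDEX))
    masked_ids = torch.where(selected, torch.full_like(ids, MASK_ID), ids)
    return masked_ids, labels


class TinyEncoder(nn.Module):
    """Stand-in model: embedding -> one encoder layer -> task head."""

    def __init__(self, vocab_size, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.TransformerEncoderLayer(
            dim, nhead=2, dim_feedforward=32, batch_first=True
        )
        self.head = nn.Linear(dim, vocab_size)  # MLM head: predicts token ids

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))


g = torch.Generator().manual_seed(0)
ids = torch.randint(2, 50, (4, 16), generator=g)   # fake batch, ids >= 2
masked_ids, labels = mask_tokens(ids, generator=g)

model = TinyEncoder(vocab_size=50, dim=16)
logits = model(masked_ids)                         # (batch, seq, vocab)
# Loss is computed only where labels != IGNORE_INDEX, i.e. on masked positions.
loss = F.cross_entropy(
    logits.reshape(-1, 50), labels.reshape(-1), ignore_index=IGNORE_INDEX
)

# After pretraining, swap the MLM head for a downstream head,
# e.g. per-token classification over 3 hypothetical classes.
model.head = nn.Linear(16, 3)
token_logits = model(ids)                          # (batch, seq, 3)
```

Only the head changes in the last step; the embedding and encoder weights carry over from pretraining.
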
  • ViT MAE

  • UNETR

    • 3D segmentation model for medical domain
    • Transformer-based architecture, more info
    • process: building_unetr

Requirements

einops
torch
torchvision
numpy
matplotlib
pandas

Here's my puppy's picture: sumo


God is our refuge and strength, a very present help in trouble.
Psalm 46:1
