readme
lucidrains authored May 16, 2023
1 parent bce8b32 commit 6a684da
Showing 1 changed file with 4 additions and 4 deletions.
README.md
@@ -2,9 +2,9 @@

## MEGABYTE - Pytorch

- Implementation of <a href="https://arxiv.org/abs/2305.07185">MEGABYTE</a>, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
+ Implementation of <a href="https://arxiv.org/abs/2305.07185">MEGABYTE</a>, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch. Took the liberty to generalize it even further so one can have multiple local models.

- <a href="https://github.com/lucidrains/simple-hierarchical-transformer">Similar independent research</a>
+ <a href="https://github.com/lucidrains/simple-hierarchical-transformer">Similar independent research that is a further generalization</a>

## Appreciation

@@ -25,8 +25,8 @@ from MEGABYTE_pytorch import MEGABYTE
model = MEGABYTE(
num_tokens = 16000, # number of tokens
dim = 512, # transformer model dimension
- max_seq_len = (1024, 4), # sequence length for global and then local
- depth = (6, 4), # number of layers for global and then local
+ max_seq_len = (1024, 4), # sequence length for global and then local. this can be more than 2
+ depth = (6, 4), # number of layers for global and then local. this can be more than 2, but length must match the max_seq_len's
dim_head = 64, # dimension per head
heads = 8, # number of attention heads
flash_attn = True # use flash attention
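The updated comments state that `max_seq_len` and `depth` may have more than two entries, provided their tuple lengths match. A minimal sketch of what that generalization might look like, using only the constructor arguments shown above; the three-stage values, the flat input shape, and the `return_loss` training call are assumptions for illustration, not taken from this commit:

```python
import torch
from MEGABYTE_pytorch import MEGABYTE

# hypothetical three-stage configuration: one global and two local transformers.
# per the updated comment, depth must have the same number of entries as max_seq_len.
model = MEGABYTE(
    num_tokens = 16000,          # number of tokens
    dim = 512,                   # transformer model dimension
    max_seq_len = (1024, 4, 4),  # sequence length per stage, global first
    depth = (6, 4, 2),           # layers per stage; tuple length matches max_seq_len
    dim_head = 64,               # dimension per head
    heads = 8,                   # number of attention heads
    flash_attn = True            # use flash attention
)

# assumption: the model consumes a flat token sequence whose length is the
# product of the max_seq_len entries (1024 * 4 * 4 = 16384)
tokens = torch.randint(0, 16000, (1, 1024 * 4 * 4))

loss = model(tokens, return_loss = True)  # assumed training signature
loss.backward()
```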
