
Add note that SDPA is disabled for non-None mask or softcapping
Andrei-Aksionov committed Dec 26, 2024
1 parent 311c2c5 commit 5520aef
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions litgpt/model.py
@@ -349,6 +349,7 @@ def forward(
mask += sliding_window_bias

# Efficient attention using Flash Attention CUDA kernels.
+ # NOTE: efficient implementation is disabled if `mask` is not None or softcapping is enabled.
# ↓ (B, nh, T, hs) @ (B, nh, T, hs).mT --> (B, nh, T, T) @ (B, nh, T, hs) --> (B, nh, T, hs)
y = self.scaled_dot_product_attention(q, k, v, mask)

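To illustrate what the added note is getting at, below is a minimal, self-contained sketch of the dispatch it describes. The `attention` helper and its `softcapping` argument are hypothetical names for illustration, not litgpt's actual `scaled_dot_product_attention` method: when no explicit mask and no softcapping are in play, the call can go through PyTorch's fused SDPA kernels (e.g. Flash Attention on CUDA); otherwise, per the note, the efficient implementation is disabled and the scores are computed manually.

import math

import torch
import torch.nn.functional as F


def attention(q, k, v, mask=None, softcapping=None):
    """Hypothetical sketch of the dispatch the NOTE describes (not litgpt's code).

    q, k, v: (B, nh, T, hs). `mask` is an optional additive bias broadcastable
    to (B, nh, T, T); `softcapping` is an optional float.
    """
    if mask is None and softcapping is None:
        # Fast path: PyTorch may select a fused kernel (e.g. Flash Attention on CUDA).
        return F.scaled_dot_product_attention(q, k, v, is_causal=True)

    # Fallback path: per the commit's note, a non-None mask or softcapping disables
    # the efficient implementation, so the (B, nh, T, T) score matrix is built explicitly.
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = (q @ k.mT) * scale
    if softcapping is not None:
        # Squash logits into (-softcapping, softcapping).
        scores = softcapping * torch.tanh(scores / softcapping)
    if mask is not None:
        scores = scores + mask  # e.g. causal mask plus a sliding-window bias
    else:
        causal = torch.ones(q.size(-2), k.size(-2), dtype=torch.bool, device=q.device).tril()
        scores = scores.masked_fill(~causal, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v


# Example shapes: batch 1, 4 heads, 8 tokens, head size 16.
q = k = v = torch.randn(1, 4, 8, 16)
y_fast = attention(q, k, v)                    # eligible for fused SDPA kernels
y_slow = attention(q, k, v, softcapping=50.0)  # forced onto the manual path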
