Activity
add dynamic LIMe from Gerasimov et al., making sure it is compatible …
Force push
add dynamic LIMe from Gerasimov et al., making sure it is compatible …
fix for dynamic pos bias during inference
demonstrate hybridization with a gru that only acts every 4 tokens ca…
the rotatable subhead keys from MLA need to be cached
in multi latent attention, cache the lightweight latent kv
remove resiDual, as hyperconnections is the culmination for that line…
Force push
remove resiDual, as hyperconnections is the culmination for that line…
remove some unpopular features / research, and prepare to incorporate…
allow for queries, keys, values to be derived from different combinat…
Force push
allow for queries, keys, values to be derived from different combinat…
allow each token to decide how much of input to reinject
if the hybrid module is an RNN, allow for folding it across the seque…
flexibly handle hybrid module outputs
Force push