I'm trying to run the FasterTransformer code on our own T5 model. When building the
Thanks for the help in advance!
Replies: 1 comment
It is size_per_head in the encoder, sometimes called hidden_size.
The intermediate size of the feed-forward network.