The nn_pruning tool removes entire heads in attention and entire rows/columns in the feed-forward networks. The remaining heads end up mostly dense, and the feed-forward networks are completely dense once the rows/columns have been removed.
As a result, the network is only mildly sparse: there are not enough zeros left for pytorch_block_sparse to beat highly optimized standard dense linear algebra kernels, so just running the pruned model with standard PyTorch dense operations is faster.
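To make this concrete, here is a minimal sketch (not from nn_pruning itself; it assumes a BERT-like 768/3072 FFN and an arbitrary 40% keep ratio) showing that after whole rows are removed, what remains is just a smaller dense `nn.Linear`, so no sparse kernel is needed at all:

```python
import torch
import torch.nn as nn

hidden, intermediate = 768, 3072   # assumed BERT-base FFN sizes
original = nn.Linear(hidden, intermediate)

# Pretend structured pruning kept 40% of the intermediate rows
# (indices chosen at random here, purely for illustration).
kept = torch.randperm(intermediate)[: int(0.4 * intermediate)].sort().values

pruned = nn.Linear(hidden, len(kept))
with torch.no_grad():
    pruned.weight.copy_(original.weight[kept])  # surviving rows are fully dense
    pruned.bias.copy_(original.bias[kept])

x = torch.randn(32, hidden)
y = pruned(x)   # an ordinary dense matmul; standard kernels handle this best
print(y.shape)  # torch.Size([32, 1228])
```

The point of the sketch is that the zeros disappear from the weight matrices entirely, rather than being scattered inside them, which is the case block-sparse kernels are designed for.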
In the README.md, why did you say that "it's not needed to run the models pruned by the nn_pruning tools"?