This method doesn't work very well on LLaMA. LLaMA uses the SiLU activation function, and its inherent activation sparsity is not very high.
One paper mentions that it is possible to replace SiLU with ReLU and retrain LLaMA to improve sparsity.
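To get a feel for how sparse the activations actually are, here is a minimal sketch (not from any of the papers discussed) that counts near-zero outputs of the MLP activation function in a Hugging Face transformers LLaMA checkpoint. The model name, prompt, and zero threshold are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; any LLaMA variant works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

stats = {"zero": 0, "total": 0}

def count_sparsity(_module, _inputs, output):
    # Fraction of activation entries whose magnitude is near zero.
    stats["zero"] += (output.abs() < 1e-3).sum().item()
    stats["total"] += output.numel()

# Hook the activation function inside every MLP block (SiLU by default in LLaMA).
hooks = [layer.mlp.act_fn.register_forward_hook(count_sparsity)
         for layer in model.model.layers]

with torch.no_grad():
    inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
    model(**inputs)

for h in hooks:
    h.remove()

print(f"near-zero activation fraction: {stats['zero'] / stats['total']:.2%}")
```

With SiLU this fraction is typically much lower than what ReLU-based models show, which is the motivation for the retraining idea above.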
The work you mentioned may be “ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models”.
Does anyone know if this work has been implemented on LLaMA? Or is there any similar dynamic pruning work for LLaMA?
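I'm not aware of an official LLaMA implementation, but as a starting point here is a minimal sketch of the activation swap itself on a Hugging Face LLaMA model. This is an assumption about how one might experiment, not the paper's released code, and as noted above the resulting model needs continued pretraining / fine-tuning to recover quality.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint

model = AutoModelForCausalLM.from_pretrained(model_name)

# Replace the SiLU activation in every MLP block with ReLU ("relufication").
for layer in model.model.layers:
    layer.mlp.act_fn = nn.ReLU()

# Keep the config consistent so the change survives save / reload.
model.config.hidden_act = "relu"
```

After this swap, the sparsity measurement sketch above could be rerun to check how much the near-zero fraction increases once the model is fine-tuned.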