Using an automatically updated alpha with entmax: alpha is updated to a value less than 1 or a negative value #39

Open
1207382225 opened this issue Dec 18, 2024 · 2 comments

Comments

@1207382225

self.alpha = torch.nn.Parameter(torch.tensor(1.33))
attention_probs = entmax_bisect(attention_scores, alpha=self.alpha, dim=-1)

I directly used the AdamW optimizer for backpropagation and found that the value of alpha kept decreasing until it dropped below 1.
Am I using the entmax method incorrectly?
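For context, a minimal self-contained sketch of the setup described above (the module and variable names are illustrative, not the exact original code):

import torch
from entmax import entmax_bisect

class EntmaxAttention(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # alpha is learned directly, with no constraint on its range
        self.alpha = torch.nn.Parameter(torch.tensor(1.33))

    def forward(self, attention_scores):
        return entmax_bisect(attention_scores, alpha=self.alpha, dim=-1)

module = EntmaxAttention()
# alpha is included in module.parameters() and is updated by AdamW like any other weight
optimizer = torch.optim.AdamW(module.parameters(), lr=1e-3)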

@bpopeters
Collaborator

Hello,

This usage is not incorrect per se, but it might not get you exactly what you're looking for. In general, it's useful to apply some constraint on what values alpha can take. In Correia et al., 2019, they parameterized it essentially like this:

self.z = torch.nn.Parameter(torch.tensor(0.0))
# sigmoid maps z into (0, 1), so alpha is constrained to (1, 2)
alpha = torch.sigmoid(self.z) + 1
attention_probs = entmax_bisect(attention_scores, alpha=alpha, dim=-1)

This guarantees that the alpha value will always be on the interval (1, 2) (in other words, somewhere between softmax and sparsemax). In principle you could constrain it in other ways as well -- to my knowledge, no one has explored this in much depth.
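For example, one way to allow a different range (just an illustrative sketch; alpha_min and alpha_max are hyperparameters you would choose yourself, not part of the entmax API) is to rescale the sigmoid:

import torch
from entmax import entmax_bisect

# Rescaled sigmoid keeps alpha strictly inside (alpha_min, alpha_max)
alpha_min, alpha_max = 1.0, 2.0
z = torch.nn.Parameter(torch.tensor(0.0))
alpha = alpha_min + (alpha_max - alpha_min) * torch.sigmoid(z)

attention_scores = torch.randn(2, 5)  # dummy scores, just to make the snippet runnable
attention_probs = entmax_bisect(attention_scores, alpha=alpha, dim=-1)

With alpha_min = 1.0 and alpha_max = 2.0 this reduces to the parameterization above; other choices simply widen or narrow the range alpha can reach.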

Hope that helps,
Ben

@1207382225
Author

Thank you! Your answer is very helpful for my interdisciplinary research.
Does alpha have a significant impact on sparsity? I let alpha update with the same optimizer and the same learning rate as all the other parameters, and found that the changes in alpha only range from about 0.002 to 0.1. I am considering whether to set a separate learning rate for alpha so that it can take bigger steps.
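For reference, a sketch of how a separate (larger) learning rate for alpha could be set up with AdamW parameter groups (the module and parameter names are illustrative):

import torch

# Give the parameter behind alpha its own learning rate; everything else keeps the base rate.
# Adjust the name test if your module stores z (for the sigmoid parameterization) instead of alpha.
alpha_params = [p for n, p in module.named_parameters() if n == "alpha"]
other_params = [p for n, p in module.named_parameters() if n != "alpha"]

optimizer = torch.optim.AdamW([
    {"params": other_params, "lr": 1e-4},  # base learning rate
    {"params": alpha_params, "lr": 1e-2},  # larger steps for alpha
])

Here module refers to an attention module like the one sketched earlier. This uses only standard torch.optim parameter groups, so it needs no changes to entmax itself.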
