I directly used the AdamW optimizer for backpropagation and found that the value of the learned alpha kept decreasing until it dropped below 1.
May I ask if I am using the entmax method incorrectly?
This usage is not incorrect per se, but it might not get you exactly what you're looking for. In general, it's useful to apply some constraint on what values alpha can take. In Correia et al., 2019, they parameterized it essentially like this:
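Roughly, a minimal sketch of that parameterization (the names alpha_param and attention_scores here are illustrative, not the paper's exact code):

import torch
from entmax import entmax_bisect

attention_scores = torch.randn(2, 4, 16, 16)         # dummy (batch, heads, query, key) logits
alpha_param = torch.nn.Parameter(torch.tensor(0.0))  # unconstrained raw parameter, updated by the optimizer
alpha = 1 + torch.sigmoid(alpha_param)                # sigmoid squashes onto (0, 1), so alpha stays in (1, 2)
attention_probs = entmax_bisect(attention_scores, alpha=alpha, dim=-1)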
This guarantees that the alpha value will always be on the interval (1, 2) (in other words, somewhere between softmax and sparsemax). In principle you could constrain it in other ways as well -- to my knowledge, no one has explored this in much depth.
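For instance (just a sketch of one alternative, not something from the paper), a softplus mapping would let alpha grow past 2, which tends to make the output even sparser than sparsemax:

alpha = 1 + torch.nn.functional.softplus(alpha_param)  # maps any real value onto (1, inf)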
Thank you! Your answer is very helpful for my interdisciplinary research.
Does alpha have a significant impact on sparsity? I let alpha update with the same optimizer and the same learning rate as all the other parameters, and found that the magnitude of alpha's changes stays between roughly 0.002 and 0.1. I am considering whether to set a separate learning rate for alpha so that it can take bigger steps. Here is my current setup:
self.alpha = torch.nn.Parameter(torch.tensor(1.33))  # learnable alpha, initialized between softmax (alpha=1) and sparsemax (alpha=2)
attention_probs = entmax_bisect(attention_scores, alpha=self.alpha, dim=-1)  # entmax_bisect from the entmax package
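Concretely, for the separate learning rate I was thinking of something along these lines (a sketch; SparseAttn is just a stand-in for my module and the learning rates are illustrative):

import torch

class SparseAttn(torch.nn.Module):  # hypothetical minimal module standing in for the real model
    def __init__(self):
        super().__init__()
        self.alpha = torch.nn.Parameter(torch.tensor(1.33))
        self.proj = torch.nn.Linear(16, 16)

model = SparseAttn()
# give alpha its own AdamW parameter group with a larger step size
alpha_params = [p for n, p in model.named_parameters() if n.endswith("alpha")]
other_params = [p for n, p in model.named_parameters() if not n.endswith("alpha")]
optimizer = torch.optim.AdamW([
    {"params": other_params, "lr": 1e-4},  # base learning rate
    {"params": alpha_params, "lr": 1e-2},  # larger learning rate just for alpha
])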