Adding New Action(s) to a Bandit Policy #670

Open
davidcereal opened this issue Oct 22, 2021 · 3 comments

Comments

@davidcereal

I understand from the Per-Arm Features tutorial that it may be "cumbersome to add" a new action to a policy, but what is the procedure for doing so?

For example, if I have a LinUCB agent that is trained with 5 candidate actions, but over time I'd like to add a 6th action candidate, how would I do so?

@bartokg
Member

bartokg commented Oct 25, 2021

Hi David,
The best option you have is to add "blank" actions and enable them later. To do so:

  1. Estimate how many actions you will want to add later. For the sake of this example, let's say it's 5. Then, you define your agent with 5+5=10 actions.
  2. When initializing the agent, pass the parameter observation_and_action_constraint_splitter. This should be a function that, given an observation, returns the actual context and a binary action mask. For now, the easiest option would be:
def splitter(obs):
  # Return the unchanged context plus a [1, 10] mask: 1 = eligible, 0 = disabled.
  return (obs, [[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])

This way, only the first 5 actions will be eligible at any time. Note that the double bracket is for the batch dimension: the mask has shape [batch_size, num_actions], so if you have batch_size>1 you have to return one mask row per batch element.
  3. To enable a new action, save the model variables, then initialize a new agent with those variables and a new splitter that allows the 6th action. If you want to enable actions in a timely manner, you can also add a loop parameter to the action mask.
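
To make the mask handling in the steps above more concrete, here is a minimal sketch of a splitter factory that works for any batch size and lets you choose how many actions are enabled. The names make_splitter and NUM_ACTIONS are hypothetical and not part of TF-Agents, and the sketch uses NumPy arrays purely for illustration. With a real agent, the splitter would most likely need to build the mask with TensorFlow ops so that it returns tensors.

```python
import numpy as np

# Assumption: 5 real actions plus 5 reserved "blank" slots, as in the example above.
NUM_ACTIONS = 10


def make_splitter(num_enabled, num_actions=NUM_ACTIONS):
    """Return a splitter that marks only the first `num_enabled` actions as eligible."""
    if not 0 < num_enabled <= num_actions:
        raise ValueError("num_enabled must be between 1 and num_actions")

    # One mask row: 1 = eligible, 0 = disabled.
    row = np.zeros(num_actions, dtype=np.int32)
    row[:num_enabled] = 1

    def splitter(obs):
        obs = np.asarray(obs)
        batch_size = obs.shape[0]  # assumes a leading batch dimension
        # Repeat the row once per batch element: shape [batch_size, num_actions].
        mask = np.tile(row, (batch_size, 1))
        return obs, mask

    return splitter


splitter_5 = make_splitter(5)  # initial agent: only actions 0..4 are eligible
splitter_6 = make_splitter(6)  # later agent: the 6th action is enabled as well

# Batch of 3 observations with 4 context features each.
context, mask = splitter_5(np.zeros((3, 4)))
```

Under this sketch, step 3 amounts to creating the new agent with splitter_6 in place of splitter_5 while restoring the saved model variables, so the estimates learned for the first 5 actions carry over.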
Let me know if that helps!
Gabor

@davidcereal
Author

@bartokg, this makes sense! Thanks a lot.

@sj31867

sj31867 commented Jul 6, 2023

@bartokg From looking at the code, my understanding is that the per-arm implementation of LinUCB maintains just a single \theta, whereas the LinUCB paper has a separate \theta for every arm. Can you explain the reasoning behind this, or point to a relevant paper that supports it?
