Adding New Action(s) to a Bandit Policy #670

davidcereal · 2021-10-22T19:19:50Z

I understand from the Per-Arm Features tutorial that it may be "cumbersome to add" a new action to a policy, but what is the procedure for doing so?

For example, if I have a LinUCB agent that is trained with 5 candidate actions, but over time I'd like to add a 6th action candidate, how would I do so?

bartokg · 2021-10-25T08:13:37Z

Hi David,
The best option you have is to add "blank" actions and enable them later. To do so:

1. Estimate how many actions you will want to add later. For the sake of this example, let's say it's 5. Then define your agent with 5+5=10 actions.
2. When initializing the agent, add the parameter observation_and_action_constraint_splitter. This should be a function that, when presented with an observation, returns the actual context and a binary action mask. For now the easiest thing would be:

def splitter(obs):
  return (obs, [[1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])

This way only the first 5 actions will be eligible at any time. Note that the double bracket is for the batch dimension, and if you have batch_size>1 then you have to modify this output accordingly.
3. To enable a new action, save the model variables and initialize a new agent with those variables and a new splitter that allows the 6th action. If you want to enable actions over time, you can also add a loop parameter to the action mask.
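As a rough illustration of steps 2 and 3, here is a minimal sketch of a splitter factory that handles any batch size and makes it easy to enable more actions later. The helper name make_splitter and its arguments are hypothetical, not part of TF-Agents, and the sketch uses plain Python lists so it runs on its own. In a real agent the observation is a tensor, so the mask would have to be built with tensor ops (for example tf.tile) instead.

```python
def make_splitter(num_total_actions, num_enabled):
    """Build a splitter that enables only the first `num_enabled` actions.

    Hypothetical helper for illustration; plain lists stand in for tensors.
    """
    if not 0 <= num_enabled <= num_total_actions:
        raise ValueError("num_enabled must be between 0 and num_total_actions")

    # One mask row: 1 for eligible actions, 0 for the "blank" ones.
    row = [1] * num_enabled + [0] * (num_total_actions - num_enabled)

    def splitter(obs):
        # Assumes obs is a batch: one entry per example.
        batch_size = len(obs)
        # Repeat the mask row once per example in the batch.
        mask = [list(row) for _ in range(batch_size)]
        return obs, mask

    return splitter


# Initially only the first 5 of 10 actions are eligible.
splitter_5 = make_splitter(num_total_actions=10, num_enabled=5)
batch = [[0.1, 0.2], [0.3, 0.4]]  # batch_size = 2
context, mask_5 = splitter_5(batch)

# Later, a new splitter enables the 6th action as well.
splitter_6 = make_splitter(num_total_actions=10, num_enabled=6)
_, mask_6 = splitter_6([[0.5, 0.6]])
print(mask_6)  # [[1, 1, 1, 1, 1, 1, 0, 0, 0, 0]]
```

The new agent would then be created with the saved variables and splitter_6 in place of splitter_5.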
Let me know if that helps!
Gabor

davidcereal · 2021-11-02T18:18:24Z

@bartokg, this makes sense! Thanks a lot.

sj31867 · 2023-07-06T05:42:22Z

@bartokg As I understand from looking into the code, the per-arm implementation of LinUCB maintains just a single \theta, whereas the LinUCB paper has a \theta for every arm. Can you explain the reasoning behind this, or point to a relevant paper that supports it?
