Adding New Action(s) to a Bandit Policy #670
Comments
Hi David,
This way only the first 5 actions will be eligible at any time. Note that the double bracket is for the batch dimension; if you have batch_size > 1, you have to modify this output accordingly.
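As a rough illustration of the masking idea described in this comment (this is not the original snippet), the sketch below assumes the agent is built with room for 6 actions and that a 0/1 mask marks which ones are currently eligible. The names `MAX_NUM_ACTIONS`, `NUM_ACTIVE`, `make_action_mask`, and `masked_argmax` are hypothetical and use plain NumPy only. In TF-Agents the mask would typically travel alongside the observation and be separated out by the agent's `observation_and_action_constraint_splitter`.

```python
import numpy as np

MAX_NUM_ACTIONS = 6  # hypothetical capacity the agent is created with
NUM_ACTIVE = 5       # hypothetical number of actions eligible right now


def make_action_mask(batch_size, num_active=NUM_ACTIVE,
                     max_num_actions=MAX_NUM_ACTIONS):
    """Return an int32 mask of shape [batch_size, max_num_actions]; 1 = eligible."""
    mask = np.zeros((batch_size, max_num_actions), dtype=np.int32)
    mask[:, :num_active] = 1
    return mask


def masked_argmax(scores, mask):
    """Pick the best-scoring eligible action in each batch row."""
    masked_scores = np.where(mask.astype(bool), scores, -np.inf)
    return masked_scores.argmax(axis=1)


# batch_size = 1: the outer bracket is the batch dimension.
mask = make_action_mask(batch_size=1)
scores = np.array([[0.1, 0.4, 0.2, 0.3, 0.0, 0.9]])
choice = masked_argmax(scores, mask)  # the 6th action is never picked
```

Enabling the 6th action later would then only require raising `num_active` to 6, and a larger batch only changes the first dimension of the mask.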
@bartokg, this makes sense! Thanks a lot.
@bartokg As I understand from looking at the code, the per-arm implementation of LinUCB maintains just a single \theta, whereas the LinUCB paper has a separate \theta for every arm. Can you explain the reasoning behind this, or point to a relevant paper that supports it?
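For readers following the thread, here is a minimal sketch of what a single shared \theta looks like when every arm comes with its own feature vector. It is a generic linear UCB rule written in NumPy under that assumption, not the TF-Agents implementation; the class `SharedThetaLinUCB` and its methods are hypothetical names. Since the arm's identity lives entirely in its features, one parameter vector scores all arms, and the number of arms can change between calls.

```python
import numpy as np


class SharedThetaLinUCB:
    """Linear UCB with one parameter vector shared by all arms (sketch)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha        # exploration weight
        self.A = np.eye(dim)      # regularized sum of x x^T over observed arms
        self.b = np.zeros(dim)    # sum of reward * x over observed arms

    def theta(self):
        """Ridge-regression estimate of the shared parameter vector."""
        return np.linalg.solve(self.A, self.b)

    def choose(self, arm_features):
        """arm_features has shape [num_arms, dim]; num_arms may vary per call."""
        A_inv = np.linalg.inv(self.A)
        means = arm_features @ (A_inv @ self.b)
        # Confidence width sqrt(x^T A^-1 x), computed for every arm at once.
        widths = np.sqrt(
            np.einsum("ij,jk,ik->i", arm_features, A_inv, arm_features))
        return int(np.argmax(means + self.alpha * widths))

    def update(self, x, reward):
        """Update the shared statistics with the chosen arm's features."""
        self.A += np.outer(x, x)
        self.b += reward * x


policy = SharedThetaLinUCB(dim=2)
for _ in range(3):
    policy.update(np.array([1.0, 0.0]), reward=1.0)

two_arms = np.array([[0.0, 1.0], [1.0, 0.0]])
three_arms = np.vstack([two_arms, [2.0, 0.0]])  # a new arm, no new parameters
first = policy.choose(two_arms)
second = policy.choose(three_arms)
```

In this sketch, adding an arm only means appending one more row of features; the per-arm \theta formulation would instead need a new set of statistics for each added arm.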
I understand from the Per-Arm Features tutorial that it may be "cumbersome to add" a new action to a policy, but what is the procedure for doing so?
For example, if I have a LinUCB agent that is trained with 5 candidate actions, but over time I'd like to add a 6th action candidate, how would I do so?