Contextual Bandit Off-Policy Evaluation #791

vitorkrasniqi · 2022-10-27T13:27:11Z

Hi,

I am currently working with "agents/tf_agents/bandits/". I am wondering whether, and where, the classic contextual bandit off-policy evaluation procedures are available in TensorFlow. Specifically, I mean the following off-policy evaluation procedures:

Direct Method
Inverse Probability Weighting (IPW)
Doubly Robust (DR), also known as Augmented IPW
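
For reference, here is a minimal NumPy sketch of the three estimators listed above, to make clear what is being asked for. It is not TF-Agents, Vowpal Wabbit, or OBP code. The function names and the array layout are assumptions made for illustration: `pi_e` holds the target policy's action probabilities with shape `(n, k)`, `q_hat` holds a reward model's predictions with shape `(n, k)`, and `pscore` holds the logging policy's probability of each logged action.

```python
import numpy as np


def direct_method(q_hat, pi_e):
    """DM: average the modelled reward under the target policy."""
    return float(np.mean(np.sum(pi_e * q_hat, axis=1)))


def inverse_probability_weighting(rewards, actions, pscore, pi_e):
    """IPW: reweight logged rewards by pi_e(a|x) / pi_b(a|x)."""
    idx = np.arange(len(rewards))
    weights = pi_e[idx, actions] / pscore
    return float(np.mean(weights * rewards))


def doubly_robust(rewards, actions, pscore, pi_e, q_hat):
    """DR: DM baseline plus an IPW correction on the model's residual."""
    idx = np.arange(len(rewards))
    weights = pi_e[idx, actions] / pscore
    baseline = np.sum(pi_e * q_hat, axis=1)
    correction = weights * (rewards - q_hat[idx, actions])
    return float(np.mean(baseline + correction))


# Toy logged data (hypothetical): 4 rounds, 2 actions, uniform logging policy.
actions = np.array([0, 1, 0, 1])
rewards = np.array([1.0, 0.0, 1.0, 1.0])
pscore = np.full(4, 0.5)
# Target policy that always picks action 0.
pi_e = np.tile([1.0, 0.0], (4, 1))
# Hypothetical reward-model predictions, identical for every context.
q_hat = np.tile([0.8, 0.4], (4, 1))

dm_value = direct_method(q_hat, pi_e)
ipw_value = inverse_probability_weighting(rewards, actions, pscore, pi_e)
dr_value = doubly_robust(rewards, actions, pscore, pi_e, q_hat)
```

In practice `q_hat` would come from a regression model fitted on the logged data, and the importance weights are often clipped to control variance; both are left out here to keep the sketch short.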

These are the evaluation procedures that vowpal_wabbit already uses. They are described here:
https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/python_Contextual_bandits_and_Vowpal_Wabbit.html

Or, even better, the methods available in the Open Bandit Pipeline package:
https://github.com/st-tech/zr-obp

Before I start thinking about how to integrate the methods from obp into the TensorFlow environment, I would like to know whether and where these methods can be found in TF-Agents.

vitorkrasniqi · 2022-11-02T10:24:39Z

It is currently not available.

SamanthaSHan · 2022-12-14T23:47:39Z

Did you end up implementing it yourself? Curious whether you found any solutions to this.
