Marginal
The marginal reduction implements a technique similar to how multi-armed bandits operate. For each id, a feature is maintained whose value is the ratio of a numerator and a denominator, which are updated on every example as follows:
numerator = numerator * (1.0 - decay) + (label * weight)
denominator = denominator * (1.0 - decay) + weight
This allows you to track the value of a given id or arm based on the rewards it has received. This works without any contextual information.
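The update rule can be sketched in a few lines of Python. This is only an illustration of the two formulas above, not VW's implementation: the Marginal class and its method names are hypothetical, the defaults mirror the documented initial numerator (0.5), initial denominator (1) and decay (0), and the estimate is assumed to be the ratio of the two accumulators.

```python
from dataclasses import dataclass


@dataclass
class Marginal:
    """Running estimate for one id (arm). Hypothetical helper, not a VW API."""

    numerator: float = 0.5    # mirrors the --initial_numerator default
    denominator: float = 1.0  # mirrors the --initial_denominator default

    def update(self, label: float, weight: float = 1.0, decay: float = 0.0) -> None:
        # Both accumulators are shrunk by (1 - decay) before the new event is added.
        self.numerator = self.numerator * (1.0 - decay) + label * weight
        self.denominator = self.denominator * (1.0 - decay) + weight

    def estimate(self) -> float:
        # Assumption: the value tracked for the id is numerator / denominator.
        return self.numerator / self.denominator


arm = Marginal()
for reward in (1.0, 1.0, 0.0):
    arm.update(reward, decay=0.1)
print(arm.estimate())
```

With a non-zero decay, older rewards contribute less to the estimate than recent ones; with decay 0 the estimate is a plain weighted average that includes the initial values.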
Marginal Options:
--marginal arg                    Substitute marginal label estimates for ids
--initial_denominator arg (=1)    Initial denominator
--initial_numerator arg (=0.5)    Initial numerator
--compete                         Enable competition with marginal features
--update_before_learn             Update marginal values before learning
--unweighted_marginals            Ignore importance weights when computing marginals
--decay arg (=0)                  Decay multiplier per event (1e-3 for example)
The reduction is enabled with --marginal <namespace>. The given namespace is the first character of the namespace where the marginal features are located. More than one marginal namespace is supported.
The marginal namespace needs to be carefully constructed, as it is interpreted in a specific way. It should contain one or more pairs of features. The first feature in each pair determines the feature index of the marginal feature; the second is the id whose marginal is tracked. Below is an example VW data file with 4 lines and 3 different ids. Notice that every line in the data file uses the same value for the first feature.
0.5 |m constant id1
1.0 |m constant id2
0.25 |m constant id3
0.4 |m constant id1
We can train on this file with:
vw --marginal m -d <data> --noconstant --readable_model readable.txt
We can inspect the readable model to see the calculated marginals. The marginal triplets appear after the line marginals size = x and before the line :0. Each triplet has the form hash:numerator:denominator.
Readable model output:
Version 8.11.0
Id
Min label:0
Max label:1
bits:18
lda:0
0 ngram:
0 skip:
options: --marginal m
Checksum: 1964076403
marginals size = 3
262109:0.75:2
134578:1.5:2
251020:1.4:3
:0
m^constant:6788:0.877014
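The triplet section can be pulled out of the readable model with a short script. The sketch below assumes the layout shown above (a "marginals size = x" line, one hash:numerator:denominator triplet per line, then a ":0" line); parse_marginals is a hypothetical helper, not part of VW, and other VW versions may lay the file out differently.

```python
def parse_marginals(readable_model: str) -> dict[int, tuple[float, float]]:
    """Return {hash: (numerator, denominator)} from readable-model text."""
    marginals = {}
    in_block = False
    for raw in readable_model.splitlines():
        line = raw.strip()
        if line.startswith("marginals size"):
            in_block = True  # triplets start on the next line
            continue
        if not in_block:
            continue
        if line == ":0":
            break  # end of the marginal section
        hash_text, numerator, denominator = line.split(":")
        marginals[int(hash_text)] = (float(numerator), float(denominator))
    return marginals


sample = """marginals size = 3
262109:0.75:2
134578:1.5:2
251020:1.4:3
:0
m^constant:6788:0.877014
"""

# Print each hash with its marginal estimate (numerator / denominator).
for feature_hash, (num, den) in parse_marginals(sample).items():
    print(feature_hash, num / den)
```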
Since we used the default initial numerator of 0.5 and denominator of 1, we can see the counts add up. For example, id1 appears on two lines (labels 0.5 and 0.4), which gives the 1.4:3 triplet:
numerator = 0.5 + 0.5 + 0.4 = 1.4
denominator = 1 + 1 + 1 = 3
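The same bookkeeping can be replayed for all three ids. This sketch assumes the defaults used in this example (initial numerator 0.5, initial denominator 1, no decay, importance weight 1 on every line), and its parsing only handles lines shaped exactly like the four above; tally is a hypothetical helper, not a VW function.

```python
from collections import defaultdict

DATA = """\
0.5 |m constant id1
1.0 |m constant id2
0.25 |m constant id3
0.4 |m constant id1
"""

INITIAL_NUMERATOR = 0.5
INITIAL_DENOMINATOR = 1.0


def tally(data: str) -> dict:
    """Accumulate [numerator, denominator] per id for the simple format above."""
    totals = defaultdict(lambda: [INITIAL_NUMERATOR, INITIAL_DENOMINATOR])
    for line in data.splitlines():
        label_text, features = line.split("|", 1)
        # features looks like "m constant id1": namespace, index feature, id
        _namespace, _index_feature, arm_id = features.split()
        totals[arm_id][0] += float(label_text)  # label * weight, with weight 1
        totals[arm_id][1] += 1.0                # weight
    return dict(totals)


for arm_id, (numerator, denominator) in sorted(tally(DATA).items()):
    print(arm_id, numerator, denominator)
```

The three numerator and denominator pairs it produces correspond to the three triplets in the readable model, up to floating-point rounding.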
Since we used constant as the first feature on every line, only a single model weight was learned (m^constant, with value 0.877014).
Let's say we were to make a prediction with this model on this example:
|m constant id2
We would expect the prediction to be the marginal estimate for id2 (1.5/2) multiplied by the learned weight:
prediction = 1.5/2 * 0.877014 = 0.658
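The arithmetic can be checked with a couple of lines of Python. The predict function is a hypothetical helper that simply mirrors the calculation above for a model with one learned weight; it is not how VW computes predictions in general.

```python
def predict(numerator: float, denominator: float, weight: float) -> float:
    """Marginal estimate for the id, scaled by the single learned weight."""
    return (numerator / denominator) * weight


# id2 has the triplet 1.5:2, and the learned weight for m^constant is 0.877014.
print(round(predict(1.5, 2.0, 0.877014), 3))  # 0.658
```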
Of course, this is a simple example so that it is easy to calculate. The marginal feature can be applied to a larger system with non-marginal features in order to learn a specific feature which directly corresponds to the value of a given id or arm.