Variational inference #27
What are your thoughts on what should be returned by variational inference? I imagine we want to return the actual variational program (rather than summarise it with samples, for example), but it's not obvious to me how to do that. It seems like we'd need to reach into the thunk we're passed and set its ERP parameters to those found during inference. This doesn't seem straightforward. In Stochy, I return a function which runs the original program with a special co-routine which switches in the variational parameters at run-time. This is pretty ugly, I'm not convinced it's fully general, and it doesn't support the ERP interface. Do you have any better ideas?
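To make the "switch in the variational parameters at run-time" idea concrete, here is a rough JavaScript sketch, not the actual Stochy or webppl machinery. It assumes the program thunk accepts a sampling handler, and the names (`makeVariationalERP`, `paramsByAddress`, `guidedSample`) are purely illustrative.

```js
// Hypothetical sketch: wrap the original program in an ERP-like object whose
// sampler re-runs the program with learned variational parameters switched in.
function makeVariationalERP(thunk, paramsByAddress) {
  // Replacement sampling handler: at each random choice, use the learned
  // variational parameters for that address if available, otherwise fall back
  // to the parameters the program supplied.
  function guidedSample(address, erp, programParams) {
    var params = paramsByAddress[address] || programParams;
    return erp.sample(params);
  }
  return {
    sample: function() {
      // Re-run the original program under the guided sampler.
      return thunk(guidedSample);
    },
    score: function(value) {
      // The awkward part noted above: scoring the return value of an
      // arbitrary program is not straightforward.
      throw new Error('score is not implemented for this representation');
    }
  };
}
```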
I've started to clean up and test what we have so far. See my variational branch. Here's a simple test case I'm working with. We already get close to the optimal parameters as found by the hand-derived variational inference algorithm.
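As a guess at the shape of such a hand-checkable test (not the actual test in the variational branch): a conjugate Gaussian-Gaussian model, where the optimal mean-field Gaussian guide equals the exact posterior, so the parameters found by optimization can be compared against an analytic answer like the one computed below.

```js
// Analytic posterior for a Gaussian prior over the mean with known
// observation noise; the hand-derived optimal variational parameters for a
// Gaussian guide coincide with this posterior in the conjugate case.
function analyticPosterior(priorMu, priorSigma, obsSigma, observations) {
  var priorPrec = 1 / (priorSigma * priorSigma);
  var obsPrec = 1 / (obsSigma * obsSigma);
  var n = observations.length;
  var sum = observations.reduce(function(a, b) { return a + b; }, 0);
  var postPrec = priorPrec + n * obsPrec;
  var postMu = (priorMu * priorPrec + sum * obsPrec) / postPrec;
  return { mu: postMu, sigma: Math.sqrt(1 / postPrec) };
}

// e.g. prior N(0, 1), observation noise sigma = 1, data [2.5]:
// analyticPosterior(0, 1, 1, [2.5]) -> { mu: 1.25, sigma: ~0.707 }
```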
Awesome! The question of what kind of ERP representation to return is a good and tricky one. I think as a simple first pass, returning an empirical distribution built from samples is ok. The ERP object should have extra fields for the best variational parameters and the corresponding (estimated) variational lower bound on the marginal likelihood. As you say, a better representation would be one whose sampler re-runs the program with the variational parameters switched in. But even with the variational sampler, there's still the question of how to implement the ERP's score function. Anyhow, it requires more thought! But a lot of the things we'd want to do with variational can already be done with the simple solutions.
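A minimal sketch of that simple first-pass return value, with illustrative field names rather than the actual webppl API: an ERP-like object backed by an empirical distribution over samples, carrying the optimized variational parameters and the estimated lower bound as extra fields.

```js
// Hypothetical sketch of the "empirical distribution plus extra fields" ERP.
function makeEmpiricalERP(samples, variationalParams, elboEstimate) {
  var counts = {};
  samples.forEach(function(s) {
    var k = JSON.stringify(s);
    counts[k] = (counts[k] || 0) + 1;
  });
  return {
    sample: function() {
      return samples[Math.floor(Math.random() * samples.length)];
    },
    score: function(value) {
      var c = counts[JSON.stringify(value)] || 0;
      return c === 0 ? -Infinity : Math.log(c / samples.length);
    },
    variationalParams: variationalParams,  // best parameters found
    estimatedLowerBound: elboEstimate      // estimated variational lower bound
  };
}
```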
Great, thanks! I'm now returning the estimated lower-bound and an ERP built from samples. See the updated test case. I've also implemented the control variate idea from "Black Box Variational Inference". I think it's working but I need to test it more thoroughly.
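For reference, here is a minimal sketch of the scalar control-variate version of the score-function gradient estimator from "Black Box Variational Inference", for a single parameter; it operates on precomputed per-sample quantities and is not the code in the variational branch.

```js
// Inputs, one entry per guide sample z_s:
//   hs[s] = d/d(lambda) log q(z_s | lambda)
//   ds[s] = log p(x, z_s) - log q(z_s | lambda)
// The naive estimator averages hs[s] * ds[s]; the control variate subtracts
// a * hs[s], where a = Cov(h*d, h) / Var(h). Since E[h] = 0, the estimator
// stays unbiased while its variance drops.
function controlVariateGradient(hs, ds) {
  var n = hs.length;
  var mean = function(xs) {
    return xs.reduce(function(a, b) { return a + b; }, 0) / xs.length;
  };
  var fs = hs.map(function(h, i) { return h * ds[i]; });
  var fBar = mean(fs), hBar = mean(hs);
  var cov = 0, varH = 0;
  for (var i = 0; i < n; i++) {
    cov += (fs[i] - fBar) * (hs[i] - hBar);
    varH += (hs[i] - hBar) * (hs[i] - hBar);
  }
  var a = varH === 0 ? 0 : cov / varH;
  return mean(hs.map(function(h, i) { return h * (ds[i] - a); }));
}
```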
The returned ERP now has a field for the variational parameters. Also, is this to-do note still relevant? (The code looks ok to me.) I've also added inference tests and made a few other tweaks. This is all in the variational branch.
We've made a lot of progress in the daipp branch. I think the basic variational infrastructure (the ability to specify variational guide distributions, optimization of the ELBO via PW+LR estimators, etc.) will be ready to merge into dev soon. I am moving this to milestone 0.8 so that we'll have time to test and document before the summer school...
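As background on the two estimator families mentioned above (not webppl-specific notation): PW is the pathwise/reparameterization estimator and LR the likelihood-ratio/score-function estimator, for a guide q_lambda and ELBO L(lambda).

```latex
% LR (likelihood-ratio / score-function):
\nabla_\lambda \mathcal{L}
  = \mathbb{E}_{q_\lambda}\!\big[\nabla_\lambda \log q_\lambda(z)\,
    \big(\log p(x, z) - \log q_\lambda(z)\big)\big]
% PW (pathwise / reparameterization), with z = g(\epsilon, \lambda), \epsilon \sim p(\epsilon):
\nabla_\lambda \mathcal{L}
  = \mathbb{E}_{p(\epsilon)}\!\big[\nabla_\lambda\big(\log p(x, g(\epsilon, \lambda))
    - \log q_\lambda(g(\epsilon, \lambda))\big)\big]
```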
btw, it would be great if, around when this makes it to dev, there is a default for when no guide is given at a random choice. (Maybe this was already part of the plan...)
Yes, I was planning to do this. My intention is to factor the information about parameters and their constraints out of daipp, so that mean-field can also do appropriate parameter squishing.
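An illustrative sketch (not the daipp or webppl code) of what a default mean-field guide with parameter squishing could look like: the optimizer works with unconstrained values, and a transform maps them into the domain the ERP expects, e.g. a positive standard deviation.

```js
// Squishing transforms between unconstrained and constrained parameters.
function softplus(x) { return Math.log(1 + Math.exp(x)); }         // R -> (0, inf)
function softplusInverse(y) { return Math.log(Math.exp(y) - 1); }  // (0, inf) -> R

// Hypothetical default mean-field guide parameters for a gaussian choice:
// the mean is unconstrained, the standard deviation is squished to be positive.
function meanFieldGaussianParams(unconstrained) {
  return {
    mu: unconstrained.mu,                    // no constraint needed
    sigma: softplus(unconstrained.sigmaRaw)  // must be > 0
  };
}
```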
Here are the remaining changes I intend to make before opening a PR for this:
Ideally, for simplicity, I'd like to just merge the daipp branch once this is done. The only reason we might prefer not to do this is that it will include a few unfinished and untested bits. These are:
I take this to mean we'll add docs for this later.
sounds good! i think merging daipp into dev is ok -- the extra bits will just remain undocumented until they are done and tested. (perhaps put notes to this effect at the top of the relevant source files...) yes, we can document later (though if you have time to add a stub to the inference section of the docs, that'll get us started).
Basic black box variational inference (see http://arxiv.org/pdf/1301.1299v1.pdf and http://arxiv.org/pdf/1401.0118v1.pdf) is in the codebase.
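A minimal sketch of the basic algorithm from those papers, specialized to a single Gaussian guide parameter (the mean) for readability; this is illustrative, not the code in the repository. The model is assumed to provide `logJoint(z)`.

```js
// One step of noisy gradient ascent on the ELBO using the score-function
// estimator, with guide q(z | lambda) = N(lambda, 1).
function bbviStep(logJoint, lambda, numSamples, stepSize) {
  var grad = 0;
  for (var s = 0; s < numSamples; s++) {
    // Sample z ~ N(lambda, 1) via Box-Muller.
    var u1 = 1 - Math.random(), u2 = Math.random();
    var z = lambda + Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
    var logQ = -0.5 * Math.log(2 * Math.PI) - 0.5 * (z - lambda) * (z - lambda);
    var scoreGrad = z - lambda;  // d/d(lambda) log N(z; lambda, 1)
    grad += scoreGrad * (logJoint(z) - logQ) / numSamples;
  }
  return lambda + stepSize * grad;  // updated variational parameter
}
```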
It needs to be tested and benchmarked.
There are several major performance improvements described in the papers that need to be implemented. Most important is Rao-Blackwellization of the gradient estimates. This may require a flow analysis to determine the Markov blanket of random choices.
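For context, this is (to the best of my reading) the Rao-Blackwellized estimator from the BBVI paper: the gradient for the parameters of the i-th mean-field factor only needs the terms of the joint that involve z_i, which is why a Markov-blanket analysis is required.

```latex
\nabla_{\lambda_i} \mathcal{L}
  = \mathbb{E}_{q_{(i)}}\!\big[\nabla_{\lambda_i} \log q_i(z_i \mid \lambda_i)\,
    \big(\log p_i(x, z_{(i)}) - \log q_i(z_i \mid \lambda_i)\big)\big]
% p_i collects the factors of the joint that depend on z_i (its Markov blanket),
% and q_{(i)} is the guide marginal over the variables appearing in those factors.
```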
Once everything is working, try variationally-guided PF: in the particle filter, sample new choices from the variational distribution instead of the prior. Or possibly mix / interpolate the prior and the variational distribution. The idea is that the variational distribution gets you an importance sampler closer to the posterior modes, while the PF helps capture the joint structure ignored by the variational approximation.
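A sketch of that proposal idea (not implemented code): inside the particle filter, each new choice is drawn from a mixture of the prior and the variational distribution, and the particle's importance weight is corrected accordingly. ERPs are assumed to expose `sample(params)` and `score(params, value)`; `mix` in [0, 1] interpolates between prior (0) and variational (1) proposals.

```js
function guidedChoice(erp, priorParams, variationalParams, mix) {
  var useGuide = Math.random() < mix;
  var value = erp.sample(useGuide ? variationalParams : priorParams);
  var logPrior = erp.score(priorParams, value);
  var logProposal = Math.log(
      mix * Math.exp(erp.score(variationalParams, value)) +
      (1 - mix) * Math.exp(logPrior));
  // The particle's log-weight is incremented by log p(value) - log q(value);
  // likelihood terms are handled by the usual factor statements.
  return { value: value, logWeightIncrement: logPrior - logProposal };
}
```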