Variational inference #27

Closed
ngoodman opened this issue Jan 8, 2015 · 10 comments

@ngoodman
Contributor

ngoodman commented Jan 8, 2015

Basic black box variational inference (see http://arxiv.org/pdf/1301.1299v1.pdf and http://arxiv.org/pdf/1401.0118v1.pdf) is in the codebase.

It needs to be tested and benchmarked.

There are several major performance improvements described in the papers that still need to be implemented, most importantly Rao-Blackwellization of the gradient estimates. This may require a flow analysis to determine the Markov blanket of each random choice.

Once everything is working, try variationally guided PF: in the particle filter, sample new choices from the variational distribution instead of the prior, or possibly mix/interpolate the prior and the variational distribution. The idea is that the variational distribution gets you an importance sampler closer to the posterior modes, while the PF helps capture the joint structure that the variational approximation ignores.
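
For reference, the core estimator in those papers is the score-function (likelihood-ratio) form of the ELBO gradient. A minimal sketch in plain JavaScript, where sampleQ, gradLogQ, logQ, and logJoint are hypothetical stand-ins for the per-trace quantities the coroutine would record:

```js
// Monte Carlo estimate of grad_lambda ELBO:
//   (1/S) * sum_s gradLogQ(z_s, lambda) * (logJoint(z_s) - logQ(z_s, lambda))
// where z_s ~ q(.; lambda) and gradLogQ is the score function.
function elboGradEstimate(lambda, sampleQ, gradLogQ, logQ, logJoint, numSamples) {
  var grad = lambda.map(function() { return 0; });
  for (var s = 0; s < numSamples; s++) {
    var z = sampleQ(lambda);
    var signal = logJoint(z) - logQ(z, lambda); // per-sample learning signal
    var score = gradLogQ(z, lambda);            // same shape as lambda
    for (var i = 0; i < lambda.length; i++) {
      grad[i] += score[i] * signal / numSamples;
    }
  }
  return grad;
}
```

Rao-Blackwellization replaces the global learning signal with only the terms in each choice's Markov blanket, which is where the flow analysis mentioned above comes in.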

@null-a
Member

null-a commented Apr 20, 2015

What are your thoughts on what should be returned by variational inference? I imagine we want to return the actual variational program (rather than summarize it with samples, for example), but it's not obvious to me how to do that. It seems like we'd need to reach into the thunk we're passed and set its ERP parameters to those found during inference, which doesn't seem straightforward.

In Stochy, I return a function which runs the original program with a special coroutine that switches in the variational parameters at run-time. This is pretty ugly: I've not convinced myself it's fully general, and it doesn't support the ERP interface. Do you have any better ideas?
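
For concreteness, a schematic of that coroutine trick (not webppl's actual CPS calling convention), where learnedParams is a hypothetical map from choice addresses to variational parameters:

```js
// Wrap the original thunk so that each sample statement draws from the
// variational family with the learned parameters, falling back to the prior
// parameters for any address that was never optimized.
function makeVariationalSampler(thunk, learnedParams) {
  var handler = {
    sample: function(address, erp, priorParams) {
      var params = learnedParams[address] || priorParams;
      return erp.sample(params);
    },
    factor: function(address, score) {
      // observations/factors are ignored when forward-sampling the guide
    }
  };
  return function() { return thunk(handler); };
}
```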

@null-a
Member

null-a commented Apr 20, 2015

I've started to clean-up/test what we have so far. See my variational branch.

Here's a simple test case I'm working with. We already get close to the optimal parameters as found by the hand-derived variational inference algorithm.
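
(The actual test case lives in the branch; for flavor, a hypothetical conjugate check of this kind, where the optimal parameters are known in closed form. With a Gaussian prior on a Gaussian mean, the optimal Gaussian guide is the exact posterior:)

```js
// Prior mu ~ N(mu0, s0^2), one observation y ~ N(mu, s^2). The exact
// posterior (and hence the optimal Gaussian guide) has these parameters,
// which VI should recover:
function optimalGuideParams(mu0, s0, y, s) {
  var postVar = 1 / (1 / (s0 * s0) + 1 / (s * s));
  var postMean = postVar * (mu0 / (s0 * s0) + y / (s * s));
  return {mu: postMean, sigma: Math.sqrt(postVar)};
}
```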

@ngoodman
Contributor Author

Awesome!

The question of what kind of ERP representation to return is a good and tricky one. I think, as a simple first pass, returning an empirical distribution built from samples is OK. The ERP object should have extra fields for the best variational parameters and the corresponding (estimated) variational lower bound on the marginal likelihood.

As you say, a better representation of the sample() method for this ERP would be the variational program with its params fixed. The coroutine trick is a clever and not-terrible way to do this (the only drawbacks I see are speed and slight kludginess). The other ways I can think of all involve reflecting into the source code of the thunk and building a new (minimal) sampler program.

But even with the variational sampler, there's still the question of how to implement the score() method. This could be tricky if there is a lot of deterministic computation between the random choices and the return value of the marginalized thunk.

Anyhow, it requires more thought! But a lot of the things we'd want to do with variational can already be done with the simple solutions.

@null-a
Member

null-a commented Apr 23, 2015

Great, thanks!

I'm now returning the estimated lower bound and an ERP built from samples. See the updated test case.

I've also implemented the control variate idea from "Black Box Variational Inference". I think it's working but I need to test it more thoroughly.
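
That control variate subtracts a scaled copy of the score function h (which has expectation zero under q) from each gradient component f, with the scale chosen to minimize variance: a_i = Cov(f_i, h_i) / Var(h_i). A sketch of estimating that scale from the same set of samples:

```js
// Estimate a* = Cov(f, h) / Var(h) for one gradient component from paired
// samples; the variance-reduced estimate is then mean over s of
// (fSamples[s] - aStar * hSamples[s]).
function controlVariateScale(fSamples, hSamples) {
  var n = fSamples.length;
  var mean = function(xs) {
    return xs.reduce(function(a, b) { return a + b; }, 0) / xs.length;
  };
  var fBar = mean(fSamples);
  var hBar = mean(hSamples);
  var cov = 0, varH = 0;
  for (var s = 0; s < n; s++) {
    cov += (fSamples[s] - fBar) * (hSamples[s] - hBar);
    varH += (hSamples[s] - hBar) * (hSamples[s] - hBar);
  }
  return varH === 0 ? 0 : cov / varH; // shared normalizers cancel in the ratio
}
```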

@null-a
Member

null-a commented May 12, 2015

The returned ERP now has a variationalParams field which is an object mapping addresses to vectors of ERP parameters. Is that what you had in mind?
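
(For anyone following along, the shape would be something like the following, with made-up addresses and a made-up parameter layout:)

```js
erp.variationalParams = {
  '_k5': [0.93, -1.2], // e.g. [mean, log-stddev] for a Gaussian choice
  '_k9': [2.17]        // e.g. [log-odds] for a Bernoulli choice
};
```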

Also, is this TODO note still relevant? (The code looks OK to me.)

I've also added inference tests and made a few other tweaks. This is all in the variational branch.

@null-a null-a removed their assignment Jul 1, 2015
@null-a null-a modified the milestone: v0.9 Feb 12, 2016
@ngoodman ngoodman modified the milestones: v0.8, v0.9 May 7, 2016
@ngoodman
Contributor Author

ngoodman commented May 7, 2016

We've made a lot of progress in the daipp branch. I think the basic variational infrastructure (the ability to specify variational guide distributions, optimization of the ELBO via PW (pathwise) and LR (likelihood-ratio) estimators, etc.) will be ready to merge into dev soon. I am moving this to milestone 0.8 so that we'll have time to test and document before the summer school.
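
LR is the score-function estimator sketched earlier in this thread; PW is the pathwise (reparameterization) estimator. A sketch of the latter for grad of E_q[f(z)] under a scalar Gaussian q, for an integrand f that does not itself depend on the variational parameters (the ELBO's log q term needs extra care); gradF and sampleStdNormal are hypothetical stand-ins:

```js
// Single-sample pathwise estimator of grad_{mu,sigma} E[f(z)] for
// q = N(mu, sigma^2), via the reparameterization z = mu + sigma * eps
// with eps ~ N(0, 1).
function pathwiseGradSample(mu, sigma, gradF, sampleStdNormal) {
  var eps = sampleStdNormal();
  var z = mu + sigma * eps;
  var g = gradF(z);               // df/dz at the sampled z
  return {mu: g, sigma: g * eps}; // chain rule: dz/dmu = 1, dz/dsigma = eps
}
```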

@null-a null-a self-assigned this May 28, 2016
@ngoodman
Contributor Author

ngoodman commented Jun 7, 2016

By the way, it would be great if, around when this makes it to dev, there were a default for when no guide is given at a sample within Optimize. E.g. do mean-field by taking the guide at each sample to be the same Dist with the dist params upgraded to guide params. (I guess it would be good to print a "warning: defaulting to mean-field" to the console in this case.)

(Maybe this was already part of the plan...)

@null-a
Member

null-a commented Jun 8, 2016

do mean-field by taking the guide at each sample to be the same Dist with the dist params upgraded to guide params

Yes, I was planning to do this. My intention is to factor the information about parameters and their constraints out of daipp, so that mean-field can also do appropriate parameter squishing.
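
A sketch of what that squishing could look like: each prior parameter becomes a fresh unconstrained guide parameter mapped back into the parameter's constrained domain. The transforms shown are illustrative choices, not necessarily the ones daipp uses:

```js
// Map unconstrained optimizer values into each parameter's domain.
var squish = {
  real: function(x) { return x; },                // unconstrained, e.g. a mean
  positive: function(x) { return Math.exp(x); },  // e.g. a standard deviation
  unitInterval: function(x) { return 1 / (1 + Math.exp(-x)); } // e.g. a probability
};
// A prior sample from Gaussian({mu: 0, sigma: 1}) would then default to the
// mean-field guide Gaussian({mu: squish.real(t1), sigma: squish.positive(t2)})
// with t1, t2 fresh optimizable parameters.
```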

@null-a
Member

null-a commented Jun 8, 2016

Here are the remaining changes I intend to make before opening a PR for this:

  • add daipp license (MIT?): see "Add license" (webppl-daipp#1)
  • doc stubs
  • mean-field when no guide is given in the program
  • add more inference tests for vi
  • add basic (sanity check) tests for new distributions
  • move daipp to a package
  • check guides/params are ignored by coroutines that don't use them

Ideally, for simplicity, I'd like to just merge the daipp branch once this is done. The only reason we might prefer not to do this is that it will include a few unfinished and untested bits. These are:

  • EUBO gradient estimator
  • The ability to get traces out of SMC
  • The mapData construct (see the sketch after this list)
  • evaluateGuide (computes ESS for a guide)
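
For context on the mapData item: it maps an observation function over a dataset, making the per-datum independence explicit (which is what enables data subsampling). A sketch of its intended use, with the interface as it stands in the daipp branch (it may change before merge) and observedData a hypothetical dataset:

```js
var model = function() {
  var mu = sample(Gaussian({mu: 0, sigma: 10}));
  mapData({data: observedData}, function(y) {
    observe(Gaussian({mu: mu, sigma: 1}), y); // condition on each datum
  });
  return mu;
};
```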

I am moving this to milestone 0.8 so that we'll have time to test and document before the summer school....

I take this to mean we'll add docs for this later.

@ngoodman
Contributor Author

ngoodman commented Jun 8, 2016

Sounds good! I think merging daipp into dev is OK; the extra bits will just remain undocumented until they are done and tested. (Perhaps put notes to this effect at the top of the relevant source files.)

Yes, we can document later (though if you have time to add a stub to the inference section of the docs, that'll get us started).
