
Variational inference #494

Merged · 253 commits · Jun 17, 2016

Conversation

@null-a (Member) commented Jun 13, 2016:

I think we're good to merge the daipp branch. (Closes #27.) Here's a summary of what's included in this PR:

  • Optimize coroutine for optimizing guide parameters using either the ELBO or EUBO estimator.
  • Guides are specified with sample(targetDist, {guide: guideDist}); see the sketch after this list.
  • Mean-field guides are used by default.
  • Helpers for creating/registering parameters: scalarParam, tensorParam. There's still param, which dispatches to the scalar or tensor version based on its first argument.
  • Guides now used to specify importance distributions for SMC. withImportanceDist has been removed. (Partially addressing Unify treatment of importance distributions and custom MH proposers #119.)
  • ForwardSample coroutine to sample from guide or target program. (Closes Have an "inference" algorithm that just returns forward samples #368.)
  • Infer({method: 'optimize'}, model) as a convenient way of doing the common thing: running some optimization and then sampling from the guide to build a marginal.
  • New saveTraces option on SMC as a way to generate traces for eubo estimator.
  • New distributions: TensorGaussian, DiagCovGaussian, LogisticNormal, MultivariateBernoulli.
  • Existing distributions updated to take tensors: Dirichlet, MultivariateGaussian, Discrete. (Discrete can still also take arrays, the others take only tensors.)
  • Helpers for working with tensors.
  • Work-in-progress mapData construct.
  • Small change to Delta. The only place where Delta is used (that I know of) is agent models, and I don't think this change will have any impact there.

null-a and others added 30 commits (March 21, 2016 onward), including:

* Adds wppl method getRelativeAddress.
* Rejuvenation (MH only) can now be used when generating initial traces for tutorial training.
* This code supported pretty printing optimization method instances.
* The core primitive for registering parameters is now `registerParams`. This is more flexible than its predecessor in that it tracks arrays of parameters per name/address, and can optionally call a hook once parameters have been lifted.
@ngoodman (Contributor) commented:

This is exciting!

Are there any breaking changes in this PR, or is it just additions? (It seems like maybe the signatures of some distributions have changed in a way that will break existing code?)

@ngoodman (Contributor) commented:

A suggestion: can we change the name of the naive VI (mean-field, optimize-then-sample) call to Infer({method: 'variational'}, model)?

@null-a (Member, Author) commented Jun 13, 2016:

Are there any breaking changes in this PR, or is it just additions?

The only breaking changes (I hope) are to those distributions that now take a tensor rather than an array or nested array, i.e. Dirichlet and MultivariateGaussian.

Attempting to create either of those using arrays throws a useful error message.

A suggestion: can we change the name of the naive VI (mean-field, optimize-then-sample) call to Infer({method: 'variational'}, model)?

As implemented, Infer({method: 'optimize'}, model) doesn't necessarily optimize the ELBO, which is why I didn't call it variational. For example, something like Infer({method: 'optimize', estimator: {EUBO: {traces: traces}}}, model) would also work.

Also, just to be clear, this only does mean-field when no guide is specified.

We can of course add the thing you suggested, but we'll have to do some extra work to massage the parameters into the format expected by Optimize.

This massaging makes the interfaces a little less consistent (which in turn makes them harder to use and more fiddly to document), which is another reason I went the way I did:

Infer({method: 'optimize', samples: 1000, estimator: {ELBO: {samples: 10}}}, model)

// If you want to start doing something fancy, you might start by
// writing the above as something like:
Optimize(model, {estimator: {ELBO: {samples: 10}}})
SampleGuide(model, {samples: 1000})

// (The fact that Optimize and Infer both take a method parameter 
// spoils this a little. That could easily be fixed.)

versus:

Infer({method: 'variational', samples: 1000, elboSamples: 10}, model)
// this is a little trickier =>
Optimize(model, {estimator: {ELBO: {samples: 10}}})
SampleGuide(model, {samples: 1000})

So that's what I had in mind, what do you think?

@ngoodman (Contributor) commented:

That's reasonable... maybe make an additional method variational that does ELBO and uses reasonable default params? We could wait on this to see what we want when we are using / teaching this.

@null-a (Member, Author) commented Jun 14, 2016:

maybe make an additional method variational that does ELBO and uses reasonable default params?

I'm totally on board with the idea of having a slightly more convenient interface for this; I'm just not at all sure what it should be. I like the default-params idea, but I wonder whether having the option to vary the number of samples per step might turn out to be something we often need, especially while we don't have baselines.

we could wait on this to see what we want when we are using / teaching this.

Given my uncertainty, I'd be happy to see how things go; things are likely to be clearer once people start using it. If someone wants to make a call, that's also fine with me. Either way, I don't think it needs to hold this PR up.

@stuhlmueller (Member) commented:

Existing distributions updated to take tensors: Dirichlet, MultivariateGaussian, Discrete. (Discrete can still also take arrays, the others take only tensors.)

Why make this distinction? (As a user, I'd assume that either both Dirichlet and Discrete take arrays, or neither.)

function eqDim0(v, w) {
  // Useful for checking that two vectors have the same length, or that
  // the dimensions of a vector and a matrix match.
  return v.dims[0] === w.dims[0];
}

A reviewer (Member) commented on this code:

Move the functions above somewhere else (to tensor.js, or util.js)?

@null-a (Member, Author) commented Jun 15, 2016:

Why make this distinction? (As a user, I'd assume that either both Dirichlet and Discrete take arrays, or neither.)

First, I should say that it's not obvious to me what the optimal thing to do here is, and there may well be other good or better choices, but here's what I was thinking:

Having Discrete take either an array or a tensor suits its use as either a prior or a likelihood, respectively. (Roughly speaking.) In the first case arrays seem convenient and simple (there may be no other tensors in the model, and arrays everywhere is nice); in the second, we need to take a tensor when the prior is e.g. a Dirichlet.

Something similar doesn't seem to hold for the Dirichlet. The value sampled from the Dirichlet is a tensor, so specifying the parameters as a tensor doesn't seem like a big deal.

Existing distributions updated to take tensors: Dirichlet, MultivariateGaussian, Discrete. (Discrete can still also take arrays, the others take only tensors.)

Re-reading this I've noticed that it glosses over a potentially more important issue: the values sampled from Dirichlet and MultivariateGaussian etc. are now tensors rather than arrays. I'm sure you realize this, but I thought I'd point it out just in case I sent us off in the wrong direction with my emphasis on the type of the parameters.

@stuhlmueller (Member) commented Jun 15, 2016:

Re-reading this I've noticed that it glosses over a potentially more important issue: the values sampled from Dirichlet and MultivariateGaussian etc. are now tensors rather than arrays.

That's a significant backwards-incompatibility—many models that use Dirichlet may need to be adapted. Let's send an announcement to webppl-dev once this PR makes it to npm.

It's unfortunate that this change makes code a bit less readable/intuitive, going from

var xs = dirichlet([1, 1]); 
var x = xs[0];

to

var xs = dirichlet(Vector([1, 1]));
var x = T.get(xs, 0);

but I don't see a good way around this. (We could macro-rewrite all list indexing to a call like T.get and make sure that the get function also works for arrays, but that would incur some overhead for plain array accesses.)

var mapDataIndices = {};

// This has been developed as part of daipp. It's probably still
// buggy.

@stuhlmueller (Member) commented Jun 15, 2016:

Would it make sense to include mapData as part of a separate PR?

@null-a (Member, Author) replied:

I'm happy to go either way on this, so I've removed it.

@null-a (Member, Author) commented Jun 16, 2016:

That's a significant backwards-incompatibility

Indeed. What's more, I suspect that the effects of introducing tensors that are not as deeply embedded in the language as arrays will be felt for a while.

Our current situation is perhaps similar to the early days of python+numpy, so I expect it will take some effort to get to something as nice as their current setup.

Having multidimensional arrays everywhere (like e.g. Julia) is an appealing long-term solution, but whether that's achievable...

It's unfortunate that this change makes code a bit less readable/intuitive

Do you think there'll be much call for indexing tensors drawn from Dirichlets? Are the things you have in mind expressible using tensor operations (T.add, T.sumreduce, etc.)? If not, are there functions we could add that would make it possible?

but I don't see a good way around this

You'll have thought of this I'm sure, but for the record other options include:

  • We could implement both array and tensor valued versions of each distribution.
  • We could implement multivariate distributions in terms of tensors, but automatically convert from tensors to arrays, perhaps based on the type of the parameter(s).

I don't much like either of these though. If you ever want to use a gradient-based inference algorithm you'll probably be best off writing the model in terms of tensors, so I suspect the system should encourage that.

We could macro-rewrite all list indexing to a call like T.get and make sure that the get function also works for arrays, but that would incur some overhead for plain array accesses

Once we have ES2015 support everywhere we could implement this using a Proxy. That wouldn't incur overhead for arrays.

@stuhlmueller (Member) commented:

Do you think there'll be much call for indexing tensors drawn from Dirichlets? Are the things you have in mind expressible using tensor operations, T.add, T.sumreduce, etc. If not, are there functions we could add that would make it possible?

I worry that people won't know about the distinction between vectors (tensors) and arrays, and that they'll be surprised when functions written to operate on arrays break when applied to a value sampled from Dirichlet, say. Indexing was just the first instance of a difference in interface between arrays and tensors that came to mind, but there are others as well (e.g., slice vs range).

We could either attempt to make the interface similar enough that most array functions can polymorphically apply to tensors as well, or we could try to make it very obvious which things are arrays and which are tensors, and give helpful error messages when users attempt to use one in place of the other.

The current situation where integer property lookups on tensors result in undefined seems dangerous:

var xs = Vector([0, 1, 2]);
xs[1];  // => undefined
xs[1] > 0.5;  // => false

Once we have ES2015 support everywhere we could implement this using a Proxy.

Good idea!

Overall, I'm done with a shallow review of this PR. If we have a plan for the array/tensor issue above, I'm happy to move on and merge. (Given the amount of code that has been changed/added, a proper review would take at least another week or so, with some back-and-forth.)

@ngoodman (Contributor) commented:

If we have a plan for the array/tensor issue above,

It's true that this introduces some potential confusion and slightly uglier code, but I think we need proper tensors for a lot of algorithm development. In general, anything that was an array of numbers should probably now be a tensor, hence all the distribution changes.

Where it is feasible to add helpful error messages, we should; in some places it probably won't be feasible (e.g. vec[0]) and we'll have to satisfy ourselves with warnings in the docs.

I'm happy to move on and merge.

I think we should go ahead and merge, so that we can get feedback sooner rather than later. Let's warn the "public" and open an issue to think about more uniform array/tensor interfaces, proxies, and such later on.

@mhtess you should check that these changes don't break the models for your class (some updates may be needed), and let us know if additional error checks or helper functions seem warranted.
