
Variational inference #494

Merged · 253 commits · Jun 17, 2016

Conversation

@null-a (Member) commented Jun 13, 2016:

I think we're good to merge the daipp branch. (Closes #27.) Here's a summary of what's included in this PR:

  • Optimize coroutine for optimizing guide parameters using either the ELBO or EUBO estimator.
  • Guides are specified with sample(targetDist, {guide: guideDist}); see the sketch after this list.
  • Mean-field guides are used by default.
  • Helpers for creating/registering parameters: scalarParam, tensorParam. There's still param, which dispatches to the scalar or tensor version based on its first argument.
  • Guides now used to specify importance distributions for SMC. withImportanceDist has been removed. (Partially addressing Unify treatment of importance distributions and custom MH proposers #119.)
  • ForwardSample coroutine to sample from guide or target program. (Closes Have an "inference" algorithm that just returns forward samples #368.)
  • Infer({method: 'optimize'}, model) as a convenient way of doing the common thing: running some optimization and then sampling from the guide to build a marginal.
  • New saveTraces option on SMC as a way to generate traces for eubo estimator.
  • New distributions: TensorGaussian, DiagCovGaussian, LogisticNormal, MultivariateBernoulli.
  • Existing distributions updated to take tensors: Dirichlet, MultivariateGaussian, Discrete. (Discrete can still also take arrays, the others take only tensors.)
  • Helpers for working with tensors.
  • Work-in-progress mapData construct.
  • Small change to Delta. The only place where Delta is used (that I know of) is agent models, and I don't think this change will have any impact there.

null-a and others added 30 commits (March 21, 2016 onward), including:

* Adds wppl method getRelativeAddress.
* Rejuvenation (MH only) can now be used when generating initial traces for tutorial training.
* This code supported pretty printing optimization method instances.
* The core primitive for registering parameters is now `registerParams`. This is more flexible than its predecessor in that it tracks arrays of parameters per name/address, and can optionally call a hook once parameters have been lifted.
@ngoodman (Contributor) commented:

This is exciting!

Are there any breaking changes in this PR, or is it just additions? (It seems like maybe the signatures of some distributions have changed in a way that will break existing code?)

@ngoodman (Contributor) commented:

A suggestion: can we change the name of the naive VI (mean-field, optimize-then-sample) call to Infer({method: 'variational'}, model)?

@null-a (Member, Author) commented Jun 13, 2016:

Are there any breaking changes in this PR, or is it just additions?

The only breaking changes (I hope) are to those distributions that now take a tensor rather than an array or nested array, i.e. Dirichlet and MultivariateGaussian.

Attempting to create either of those using arrays throws a useful error message.

A suggestion: can we change the name of the naive VI (mean-field, optimize-then-sample) call to Infer({method: 'variational'}, model)?

As implemented, Infer({method: 'optimize'}, model) doesn't necessarily optimize the ELBO, which is why I didn't call it variational. For example, something like Infer({method: 'optimize', estimator: {EUBO: {traces: traces}}}, model) would also work.

Also, just to be clear, this only does mean-field when no guide is specified.

We can of course add the thing you suggested, but we'll have to do some extra work to massage the parameters into the format expected by Optimize.

This massaging makes the interfaces a little less consistent (which in turn makes them harder to use and more fiddly to document), which is another reason I went the way I did:

Infer({method: 'optimize', samples: 1000, estimator: {ELBO: {samples: 10}}}, model)

// If you want to start doing something fancy, you might start by
// writing the above as something like:
Optimize(model, {estimator: {ELBO: {samples: 10}}})
SampleGuide(model, {samples: 1000})

// (The fact that Optimize and Infer both take a method parameter 
// spoils this a little. That could easily be fixed.)

versus:

Infer({method: 'variational', samples: 1000, elboSamples: 10}, model)
// this is a little trickier =>
Optimize(model, {estimator: {ELBO: {samples: 10}}})
SampleGuide(model, {samples: 1000})

So that's what I had in mind, what do you think?

@ngoodman (Contributor) commented:

That's reasonable... maybe make an additional method variational that does ELBO and uses reasonable default params? We could wait on this to see what we want when we are using / teaching this.

@null-a (Member, Author) commented Jun 14, 2016:

maybe make an additional method variational that does ELBO and uses reasonable default params?

I'm totally on board with the idea of having a slightly more convenient interface for this; I'm just not at all sure what it should be. I like the default-params idea, but I wonder whether having the option to vary the number of samples per step might turn out to be something we often need, especially while we don't have baselines.

we could wait on this to see what we want when we are using / teaching this.

Given my uncertainty, I'd be happy to see how things go; things are likely to be clearer once people start using it. If someone wants to make a call, that's also fine with me. Either way, I don't think it needs to hold this PR up.

@stuhlmueller (Member) commented:

Existing distributions updated to take tensors: Dirichlet, MultivariateGaussian, Discrete. (Discrete can still also take arrays, the others take only tensors.)

Why make this distinction? (As a user, I'd assume that either both Dirichlet and Discrete take arrays, or neither.)

function eqDim0(v, w) {
  // Useful for checking that two vectors have the same length, or that
  // the dimensions of a vector and a matrix match.
  return v.dims[0] === w.dims[0];
}

A reviewer (Member) commented on this code:

Move the functions above somewhere else (to tensor.js, or util.js)?

@null-a (Member, Author) commented Jun 15, 2016:

Why make this distinction? (As a user, I'd assume that either both Dirichlet and Discrete take arrays, or neither.)

First, I should say that it's not obvious to me what the optimal thing to do here is, and there may well be other good or better choices, but here's what I was thinking:

Having Discrete take either an array or a tensor suits its use as either a prior or a likelihood, respectively. (Roughly speaking.) In the first case arrays seem convenient and simple (there may be no other tensors in the model, and arrays everywhere is nice); in the second, we need to take a tensor when the prior is e.g. a Dirichlet.

Something similar doesn't seem to hold for the Dirichlet. The value sampled from the Dirichlet is a tensor, so specifying the parameters as a tensor doesn't seem like a big deal.

Existing distributions updated to take tensors: Dirichlet, MultivariateGaussian, Discrete. (Discrete can still also take arrays, the others take only tensors.)

Re-reading this I've noticed that it glosses over a potentially more important issue: the values sampled from Dirichlet and MultivariateGaussian etc. are now tensors rather than arrays. I'm sure you realize this, but I thought I'd point it out just in case I sent us off in the wrong direction with my emphasis on the type of the parameters.

@stuhlmueller (Member) commented Jun 15, 2016:

Re-reading this I've noticed that it glosses over a potentially more important issue: the values sampled from Dirichlet and MultivariateGaussian etc. are now tensors rather than arrays.

That's a significant backwards-incompatibility—many models that use Dirichlet may need to be adapted. Let's send an announcement to webppl-dev once this PR makes it to npm.

It's unfortunate that this change makes code a bit less readable/intuitive, going from

var xs = dirichlet([1, 1]); 
var x = xs[0];

to

var xs = dirichlet(Vector([1, 1]));
var x = T.get(xs, 0);

but I don't see a good way around this. (We could macro-rewrite all list indexing to a call like T.get and make sure that the get function also works for arrays, but that would incur some overhead for plain array accesses.)

var mapDataIndices = {};

// This has been developed as part of daipp. It's probably still
// buggy.

@stuhlmueller (Member) commented Jun 15, 2016:

Would it make sense to include mapData as part of a separate PR?

@null-a (Member, Author) replied:

I'm happy to go either way on this, so I've removed it.

@null-a (Member, Author) commented Jun 16, 2016:

That's a significant backwards-incompatibility

Indeed. What's more, I suspect that the effects of introducing tensors that are not as deeply embedded in the language as arrays will be felt for a while.

Our current situation is perhaps similar to the early days of python+numpy, so I expect it will take some effort to get to something as nice as their current setup.

Having multidimensional arrays everywhere (like e.g. Julia) is an appealing long-term solution, but whether that's achievable...

It's unfortunate that this change makes code a bit less readable/intuitive

Do you think there'll be much call for indexing tensors drawn from Dirichlets? Are the things you have in mind expressible using tensor operations (T.add, T.sumreduce, etc.)? If not, are there functions we could add that would make it possible?

but I don't see a good way around this

You'll have thought of this I'm sure, but for the record other options include:

  • We could implement both array and tensor valued versions of each distribution.
  • We could implement multivariate distributions in terms of tensors, but automatically convert from tensors to arrays, perhaps based on the type of the parameter(s).

I don't much like either of these though. If you ever want to use a gradient-based inference algorithm you'll probably be best off writing the model in terms of tensors, so I suspect the system should encourage that.

We could macro-rewrite all list indexing to a call like T.get and make sure that the get function also works for arrays, but that would incur some overhead for plain array accesses

Once we have ES2015 support everywhere we could implement this using a Proxy. That wouldn't incur overhead for arrays.

@stuhlmueller (Member) commented:

Do you think there'll be much call for indexing tensors drawn from Dirichlets? Are the things you have in mind expressible using tensor operations, T.add, T.sumreduce, etc. If not, are there functions we could add that would make it possible?

I worry that people won't know about the distinction between vectors (tensors) and arrays, and that they'll be surprised when functions written to operate on arrays break when applied to a value sampled from Dirichlet, say. Indexing was just the first instance of a difference in interface between arrays and tensors that came to mind, but there are others as well (e.g., slice vs range).

We could either attempt to make the interface similar enough that most array functions can polymorphically apply to tensors as well, or we could try to make it very obvious which things are arrays and which are tensors, and give helpful error messages when users attempt to use one in place of the other.

The current situation where integer property lookups on tensors result in undefined seems dangerous:

var xs = Vector([0, 1, 2]);
xs[1];  // => undefined
xs[1] > 0.5;  // => false

Once we have ES2015 support everywhere we could implement this using a Proxy.

Good idea!

Overall, I'm done with a shallow review of this PR. If we have a plan for the array/tensor issue above, I'm happy to move on and merge. (Given the amount of code that has been changed/added, a proper review would take at least another week or so, with some back-and-forth.)

@ngoodman (Contributor) commented:

If we have a plan for the array/tensor issue above,

It's true that this introduces some potential confusion and slightly uglier code, but I think we need proper tensors for a lot of algorithm development. In general, anything that was an array of numbers should probably now be a tensor, hence all the distribution changes.

Where it is feasible to add helpful error messages, we should; in some places it probably won't be feasible (e.g. vec[0]) and we'll have to satisfy ourselves with warnings in the docs.

I'm happy to move on and merge.

I think we should go ahead and merge, so that we can get feedback sooner rather than later. Let's warn the "public" and open an issue to think about more uniform array/tensor interfaces, proxies, and such later on.

@mhtess you should check that these changes don't break the models for your class (some updates may be needed), and let us know if additional error checks or helper functions seem warranted.
