
HMC #265

Merged · 61 commits merged into probmods:dev · Jan 11, 2016
Conversation

@null-a (Member) commented Nov 9, 2015

This is a cleaned up version of my gradients branch. There are a few loose ends to tie up (see below) but I don't expect them to change things much, so now might be a good time to start reviewing my changes.

This PR includes:

  • ad.js integration
  • A revised MCMC interface which allows per-kernel options to be specified
  • A HMC kernel which can be used with MCMC and SMC

Using HMC with the new interface looks like this:

MCMC(model, { samples: 100, kernel: 'HMC' })
SMC(model, { particles: 100, rejuvSteps: 5, rejuvKernel: 'HMC' })
MCMC(model, { samples: 100, kernel: { sequence: { kernels: ['HMC', { MH: { discreteOnly: true } }] } } })

Before use, the ERP score code needs to be transformed; this is now done like so:

./scripts/transformERP

This should happen automatically after an npm install. If erp.js is missing when webppl is run, a descriptive error message is shown.

Still to do

  • Switch to the latest ad.js for tensor support. (See Merge variational inference #259.) Test Dirichlet. (See Should Dirichlet scorer check that values sum to 1? #275.) Switch to the version of isTape exported by ad.js (in aggregation.js) once available. Postponed; will tackle as a separate piece of work.
  • Avoid unnecessarily creating tapes.
  • Remove TODO notes. I've intentionally left these in for now as they might be useful while the PR is being reviewed.
  • Add more continuous and mixed continuous/discrete tests; we only have one or two right now. Suggestions for good tests welcome!
  • Add an HMC function or some other convenient way of saying kernel: { sequence: { kernels: ['HMC', { MH: { discreteOnly: true } }] } }. Thoughts?
  • Update inference docs.
  • Update MAP and entropy methods on ERP. (Not needed if we drop constraint stuff.)
  • Maybe: Add erp.js to repo.
  • Maybe: Reject proposal rather than throw error on structure change, if we think doing so is correct. Discussed in Implement HMC #81.

Speed

I ran a few quick benchmarks, and in the model I tested, the time spent doing inference increased by ~20% using MH and by ~40% using SMC after these changes. I've not spent any time trying to optimize that away. The inference tests are probably quite a bit more than 40% slower because of the increase in the time taken to compile programs.

Future work

As discussed in #81, there are several things not included here that we might like in the future:

  • Constraint handling by transformation. (Do we need to handle hard factor statements?)
  • Step size adaptation.
  • Support for nesting models.

The current test is hard to understand. It's also likely to generate
false positives as there are several unexpected values which e1 and e2
can take where the model returns true. (e.g. "isNaN({} + {})".) This
would not cause the test to fail as returning true with probability 1 is
within the histogram tolerance of most inference tests.

This new deterministic cache test would have failed prior to the recent
update which fixed serialization of ±Infinity.

This reverts the change to the stochastic test made in 875812b.
Functions which operate on reals need to be transformed for AD.
This is preferable for consistency, and because it avoids adding a non-WebPPL
function as a global variable.
@stuhlmueller (Member)

This looks good overall -- and like it was a lot of work! Thanks!

Are there any examples where the current implementation of HMC does visibly better than plain MH? Some model with multiple continuous variables that are strongly correlated under the posterior so that all need to be updated jointly? It would be good to include at least one such example in the tests.
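
For example, something along these lines (a rough sketch, not code from this PR; it assumes the usual gaussian/factor header functions and the ERP score(params, val) signature used elsewhere in this thread, and the model name and parameters are illustrative only):

    // Two standard Gaussians tied together by a factor so that the posterior
    // is a narrow ridge; single-site MH should mix slowly here, while HMC's
    // joint gradient updates should do visibly better.
    var correlatedGaussians = function() {
      var x = gaussian(0, 1);
      var y = gaussian(0, 1);
      factor(gaussianERP.score([0, 0.05], x - y));
      return x + y;
    };

    MCMC(correlatedGaussians, { samples: 1000, kernel: 'HMC' });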

In general, the main thing we should convince ourselves of is that the algorithm is correctly implemented. I'd add tests that include various sorts of dependencies between discrete and continuous erps and factors. For example:

  • Direct dependencies (a concrete sketch with specific ERPs follows this list):

    var x = sample(continuousERP)
    var y = sample(continuousERP, [x])
    
    var x = sample(discreteERP)
    var y = sample(continuousERP, [x])
    
    var x = sample(continuousERP)
    var y = sample(discreteERP, [x])
    
  • Indirect dependencies: Each of the three cases above, but with x transformed by a compound function that calls out to a primitive function:

    var f = function(x) {
      return Math.log(x + x);
    }
    
  • Single factor depending on multiple ERPs: Each of the six cases above, but including a factor with a score that depends on a function of x and y (probably need to write a different function for each combination of discrete/continuous).

  • Multiple factors depending on a single ERP

  • More complicated programs:

    var x = sample(continuousERP)
    var y = sample(continuousERP, [f(x)])
    var w = sample(discreteERP, [g(x, y)])
    factor(h(x, y, w))
    var z = sample(continuousERP, [i(x, y, w)])
    factor(j(y, z, w))
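
To make the direct-dependency cases above concrete, here's a sketch with specific ERPs filled in (the ERP choices, parameters, and names are my own illustration, not the PR's actual tests):

    // Continuous parent -> continuous child (CC)
    var cc = function() {
      var x = gaussian(0, 1);
      return gaussian(x, 1);
    };

    // Discrete parent -> continuous child (DC)
    var dc = function() {
      var x = flip(0.5);
      return gaussian(x ? 2 : -2, 1);
    };

    // Continuous parent -> discrete child (CD)
    var cd = function() {
      var x = gaussian(0, 1);
      return flip(Math.exp(-x * x)); // maps x to a probability in (0, 1]
    };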
    

Other comments:

  • The installation instructions in docs/development/install.rst now include a mention of ./scripts/transformERP, but you say it happens automatically with npm install -- are the extra install instructions unnecessary?
  • I'd rename scoreFunctions.js to something that makes it clear that it's about automatic differentiation (maybe adscorers.js). Also, to make it more like the other transforms, don't export the transform function directly, but rather as a property of an object?
  • Having a list of helpers in scoreFunctions.js seems a bit fragile. Could we have a way to mark the corresponding functions directly in erp.ad.js?
  • As you point out, untapify in src/aggregation.js should probably be integrated into untapify in ad.js. If not, the condition should be factored out as isNonTapeObject or something like that to make it clearer what's going on.
  • Is there a reference for the constraint handling mechanism you implemented, or did you make it up? It looks sensible, but it would be good to have some assurance that it's doing the right thing. At least we'd want some tests that exercise it.
  • Add a comment to erp.ad.js explaining why sum needs to be defined there even though we already have one in util.js?
  • hmckernel.js: Add a reference to the top of the file that points to the main source for the algorithm? (The Radford Neal handbook chapter?)
  • hmckernel.js: Add a comment that explains what ad.yGradientR in line 153 does?
  • What's the role that thisPositionStepCont plays? Add a comment to clarify?
  • What's the motivation for the interface change to Initialize and kernels? In particular, I'm not convinced that kernels should take runWppl as an argument. Maybe it should be part of options, or a method on the trace object?
  • Related to this interface change (but not entirely explained by it): I found src/inference/kernels.js difficult to understand, in particular the top half. I think it would have helped a little to rename MH, HMC, and sequence to MHKernel, HMCKernel, and SequenceKernel respectively.
  • In various places (hmckernel.js, initialize.js, smc.js, mhkernel.js, trace.js), ad is accessed without being explicitly imported. I assume this works because, by the time these modules are imported, ad has already been made available globally? If so, I would feel better if it were imported explicitly. If you want reference to a single instance, do this via env?
  • mhkernel.js:65 - I'd avoid initializing multiple variables in one line, or at least limit it to the case where all are uninitialized. I was confused for a moment, thinking we were destructuring the return value of this.oldTrace.findChoice.
  • In smc.js, I wonder if it would be useful to keep the particle logWeight (~ estimate of normalization constant) tapified. Isn't the gradient of this quantity one of the ingredients we need for variational inference? (If not, how is it different?)
  • Now that the MCMC/SMC interface is getting more complex, we should start documenting it.
  • Regarding a convenient way to say kernel: { sequence: { kernels: ['HMC', { MH: { discreteOnly: true } }] } }: I think your proposed HMC function would be reasonable (see the sketch after this list). We want to make it easier to use HMC+MH than pure HMC, so that users don't accidentally use the latter and then get confused about why the results for mixed continuous-discrete models aren't right.
  • Typos: hmckernel.js: structual->structural; mhkernel.js: indicies -> indices
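
A minimal sketch of what such an HMC helper could look like (hypothetical, just to make the proposal concrete; the exact form is up for discussion):

    // Convenience wrapper: HMC for continuous choices, with MH restricted to
    // discrete choices, run as a single sequence kernel.
    var HMC = function(model, options) {
      var kernel = { sequence: { kernels: ['HMC', { MH: { discreteOnly: true } }] } };
      return MCMC(model, { samples: options.samples, kernel: kernel });
    };

    // Usage: HMC(model, { samples: 100 });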

This is less brittle than the previous approach of maintaining a list in
adscorers.js.
Prior to this commit, when using sequenceKernel, only the final kernel
in the sequence was taken into account when computing the acceptance
ratio.

Query handling had to be modified for a similar reason: it needs to
happen after each kernel in the sequence, not once at the end of the
sequence.
@null-a (Member, Author) commented Nov 20, 2015

are the extra install instructions unnecessary?

You're right. I'd still like to include something about this; would you be in favor of including a section on this under workflow? Something like this.

Having a list of helpers in scoreFunctions.js seems a bit fragile.

Good point. I've changed this to transform functions with names ending in AD. There are more elegant ways, but this is nice and simple. It also makes it easier to see at a glance whether a score function only calls ADified helpers.

As you point out, untapify in src/aggregation.js should probably be integrated into untapify in ad.js.

I've started talking to Sid about this.

Is there a reference for the constraint handling mechanism you implemented

It's from Radford Neal's handbook chapter. Section 5.1, p36.
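
Roughly, the mechanism there modifies the leapfrog position update so that a variable which would step outside its support is reflected back inside and has its momentum negated. A sketch of the idea (not the PR's actual code; lower and upper stand for whatever bounds the variable's support gives):

    // Leapfrog position update for one coordinate with a bounded support
    // (Neal, "MCMC using Hamiltonian dynamics", section 5.1).
    function boundedPositionStep(x, p, stepSize, lower, upper) {
      x = x + stepSize * p;
      // Reflect off the boundaries until the position is back inside.
      while (x < lower || x > upper) {
        if (x < lower) { x = lower + (lower - x); p = -p; }
        if (x > upper) { x = upper - (x - upper); p = -p; }
      }
      return { position: x, momentum: p };
    }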

I don't think the current implementation is very general though. It only checks the constraint which comes from the support of the variable being updated, which I think can come apart when constraints depend on other variables.

It's conceivable that simply rejecting proposals that wind up in areas of zero probability is correct, but I don't know that it is.

In #81 the suggestion was to transform constrained variables into unbounded spaces which sounds reasonable. Assuming we tackle this as a separate piece of work, the immediate decision is whether to stick with what we have, take it out, or improve it. Of those, the first two probably make most sense.

Add a comment to erp.ad.js explaining why sum needs to be defined there even though we already have one in util.js?

This is now sumAD. Is that enough of a hint, or do you still want a comment?

hmckernel.js: Add a comment that explains what ad.yGradientR in line 153 does?

What do you have in mind? I'm happy to do it, I'm just not sure what jumps out to you about this compared to any other part of the algorithm. Does // Compute gradient of score w.r.t. continuous variables cover it?

What's the role that thisPositionStepCont plays? Add a comment to clarify?

This is indeed a little awkward. I've added a comment to explain.

What's the motivation for the interface change to Initialize and kernels? In particular, I'm not convinced that kernels should take runWppl as an argument. Maybe it should be part of options, or a method on the trace object?

The idea is that it's a convenient way to give HMC access to wpplFn and the appropriate store and address needed to run the program from the beginning. Initialize already took these as three separate parameters, it now just takes a single runWppl argument instead. This has the nice side-effect of allowing us to clone the store (before running the program) in just one place.
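
Concretely, I'm assuming runWppl amounts to something like the following closure (a sketch, not the exact code in this PR; it assumes the usual wpplFn(store, continuation, address) calling convention and an underscore-style clone):

    // Close over wpplFn, the initial store and address once; each call re-runs
    // the program from the beginning against a fresh copy of the store.
    var runWppl = function(k) {
      return wpplFn(_.clone(store), k, address);
    };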

I did consider including it on Trace but decided against it because:

  1. You have to include it in the logic for copying/slicing traces
  2. You have to pass it along to each new Trace() in HMC
  3. It doesn't work for Initialize as it doesn't take a Trace

It could go in options, but the problem is that it doesn't play so well with the current approach of gradually building up the kernel by partially applying arguments to the main kernel function. It's not impossible of course (we can do it by wrapping the original function with a function that merges option objects appropriately) but I'm not sure it's worth it in this case. Another way you could go is to question the current approach of partially applying arguments -- I'm happy to talk about that if you like.

I think it would have helped a little to rename MH, HMC, and sequence to MHKernel, HMCKernel, and SequenceKernel respectively.

Done. Also added a comment to parseOptions which might help.

In various places (hmckernel.js, initialize.js, smc.js, mhkernel.js, trace.js), ad is accessed without being explicitly imported.

Indeed, this works for the reason you mention. The reason I didn't do the obvious thing and require('ad.js') in each file is that doing so requires specifying which mode ad.js should use each time require is called. I was happier calling methods on a global ad variable (which has to be there for ad.js anyway) than repeatedly having to say "reverse mode please" everywhere.

I'm sure this could be cleaner, but does accessing it via env.ad buy us much since we'll still have a top-level ad hanging around?

In smc.js, I wonder if it would be useful to keep the particle logWeight (~ estimate of normalization constant) tapified. Isn't the gradient of this quantity one of the ingredients we need for variational inference?

Not sure. Won't we need to change lots of things for VI? Can we worry about it when we need it?

Now that the MCMC/SMC interface is getting more complex, we should start documenting it.

Sure. I've made a start on this here, see what you think. I'll consider this a separate piece of work. i.e. #131.

I think your proposed HMC function would be reasonable.

I've thought about this, and if people already use MCMC routinely then I don't think adding an HMC method is enough. Perhaps we should have the common case of HMC + MH be available using MCMC(model, { kernel: 'HMC' }) and come up with another name for the "HMC only" kernel. Alternatively we could have something like MCMC(model, { kernel: 'HMC+MH' }) be shorthand for the thing you usually want.

Everything else you mention under "Other comments" has been done.

@null-a (Member, Author) commented Nov 23, 2015

I think I've now added most of the tests you suggested. I hope the following makes sense:

| Var Types | Direct Dependence | Indirect Dependence | Direct + Factor | Indirect + Factor |
| --- | --- | --- | --- | --- |
| CC | bivariateGaussian | indirectDependency | bivariateGaussianFactor* | gaussianMeanVar |
| DC | mixed2 | mixed4 | mixed2Factor | mixed4Factor |
| CD | mixed1 | mixed3 | mixed1Factor | mixed3Factor |

e.g. DC = var x = discrete(); var y = continuous(x);

*Note -- in bivariateGaussianFactor the factor depends only on x, not x and y.

The "Multiple factors depending on a single ERP" case might be covered by gaussianMean?

As far as tests go, I think that just leaves your "More complicated programs" suggestion.

@stuhlmueller (Member)

Another reason it seems important to avoid always creating tapes is so that it remains possible to do inference with MH over models which pass numbers to external JS libraries.

Good point!

I have a working implementation of a more local method of deciding whether to create tapes here. The basic idea is to figure out whether tapes are needed by looking at which kernel we're using in the MCMC and SMC methods. What do you think of this approach?

This looks like a much better idea. I didn't realize all tapify calls are happening within the MCMC/SMC functions such that this works.

I wonder if this.ad should be this.adRequired everywhere. That would make it clearer that it's a Boolean flag.

Do you have thoughts on how to get rid of the overhead from AD-ified scorers? Maybe the ERP transform could keep around the original scorers, and dispatch calls to score functions of the right type (AD-ified or not) based on whether the current coroutine has an adRequired property that evaluates to true?

@null-a (Member, Author) commented Dec 15, 2015

this.ad should be this.adRequired everywhere

Definitely.

Do you have thoughts on how to get rid of the overhead from AD-ified scorers?

I had a half-baked idea that we might be able to have an adRequired flag on Trace, and that we might be able to use that to delegate to the appropriate score function. Your way seems much better though, so I'll take a look at that.

@null-a (Member, Author) commented Dec 18, 2015

Maybe the ERP transform could keep around the original scorers, and dispatch calls to score functions of the right type

Just to clarify, are you suggesting the score methods each become something like:

function score(params, x) {
    // Use the AD-ified scorer only when the current coroutine needs gradients.
    return env.coroutine.adRequired ? adScore(params, x) : originalScore(params, x);
}

(I think that would work.)

@stuhlmueller (Member)

Yes, that's what I was thinking.

@null-a (Member, Author) commented Dec 18, 2015

Yes, that's what I was thinking.

Great, thanks. I have a version working here. I've not yet checked how much of the performance drop this recovers. I'll do that next.

@null-a (Member, Author) commented Jan 5, 2016

I've not yet checked how much of the performance drop this recovers. I'll do that next.

The following shows the fractional slowdown incurred by some of the models in examples. "HMC" refers to this PR in its current state, and "HMC both scorers" refers to HMC with the additional change we've discussed in the last few comments (i.e. maintaining two score functions for each ERP).

| Model | HMC | HMC both scorers |
| --- | --- | --- |
| linearRegression | 0.27 | 0.1 |
| lda (20k) | 0.18 | 0.09 |
| logisticRegression | 0.19 | 0.1 |
| hmmIncremental (n=10, full enum) | 0.01 | 0.02 |
| ldaCollapsed (10k) | 0.81 | 0.74 |

(The reason ldaCollapsed is so much slower is that the normalize helper it uses is now written in webppl rather than JS. So there'll be some extra CPS overhead, and perhaps it's also affected by #174. I'll ignore this here.)

For the first 3 models listed it looks like this change helps a bit, although things weren't that much slower to begin with. Seeing these results, it's not 100% clear to me that it's worth the extra effort of keeping both scorers around. My main concern is the fact that the code to transform the scorers is now pretty complicated and rather tightly coupled to the current structure of erp.ad.js, which could make future changes to erp.ad.js painful.

If we do think it's worth keeping both scorers around, perhaps I could revisit Sid/Daniel's original approach of splitting the scorers out into a separate file. (With the hope that transforming that file will be simpler, and perhaps delegation to the correct scorer can be added in the ERP constructor or something.)

It's also a drag having to thread env through all the chains of requires which end with require('erp'), though that's not such a big deal.

Any thoughts?

@stuhlmueller (Member)

Thanks for running these experiments! I agree that the additional complexity together with the relatively small improvement in speed doesn't make it worth keeping both scorers around, but it's good that we checked.

Suppose we take the performance hit and decide to merge the PR essentially as is. Then what else is left to do? Based on the original comment, it looks like we want to switch to the latest ad.js for tensor support. What needs to happen there? Let's aim to merge the PR this week, and create issues for anything non-critical that remains.

@null-a (Member, Author) commented Jan 6, 2016

Updating to a newer version of ad.js is the only thing I have in mind to do. The main reason for doing so would be to gain the ability to take derivatives w.r.t. tensor valued variables. (This hack avoids creating tapes for Dirichlet/mv Gaussian at present.)

I'm not sure how far away ad.js is from being ready for this; I'd need to check. Elsewhere I've played with integrating Daniel's implementation of ad, so that might be an option if getting this working soon is important. It's not a trivial change though, since (as far as I can tell) it also requires reimplementing some basic matrix/vector operations we currently get from numeric.js. (Some of which I've done, though it would be a stretch to get it finished this week.) There was some talk of having both these ad libraries share a common interface, an idea which might be relevant here.

On a different note, can I just check, are you OK with the change (described here) to ERP.prototype.support? (It's used for constraint handling in HMC.)

@stuhlmueller (Member)

OK, keep me updated on the ad.js upgrade status.

If it looks like this is going to take a while, would it make sense to keep the PR as is, but detect when HMC is used with multivariate distributions and throw an error that says that it's not supported yet?

On a different note, can I just check, are you OK with the change (described here) to ERP.prototype.support? (It's used for constraint handling in HMC.)

Fine with me.

@null-a (Member, Author) commented Jan 11, 2016

I do think this will take a little while (at least a week, probably longer), so if you're happy to do so I think we should merge this. I've made a few final tweaks (including throwing an error if multivariate ERPs are used with HMC, as you suggested) so I think it's ready.

If we do merge it, I'll open a new issue for updating ad.js and also include the following on it:

  • Move the version of untapify in aggregation.js into the ad package.
  • Don't hardcode the path to the ad.js macros file.

@stuhlmueller (Member)

Sounds good. I'm going to merge this now—thanks for making this happen! :-)

stuhlmueller added a commit that referenced this pull request Jan 11, 2016
stuhlmueller merged commit 49bf7d0 into probmods:dev on Jan 11, 2016
null-a deleted the hmc branch on January 19, 2016