HMC #265
Conversation
The current test is hard to understand. It's also likely to generate false positives, as there are several unexpected values which e1 and e2 can take where the model returns true. (e.g. "isNaN({} + {})".) This would not cause the test to fail, as returning true with probability 1 is within the histogram tolerance of most inference tests. This new deterministic cache test would have failed prior to the recent update which fixed serialization of ±Infinity. This reverts the change to the stochastic test made in 875812b.
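For context on why ±Infinity serialization can trip up a cache keyed on serialized values, here's a small illustration. (The tagged-string scheme below is hypothetical, not the fix used in the PR.)

```javascript
// JSON silently turns ±Infinity into null, so distinct values can
// collide under a cache keyed on JSON.stringify.
console.log(JSON.stringify([Infinity, -Infinity])); // prints [null,null]

// A hypothetical workaround: encode infinities as tagged strings.
function serializeKey(x) {
  return JSON.stringify(x, function(k, v) {
    if (v === Infinity) { return '__+Inf__'; }
    if (v === -Infinity) { return '__-Inf__'; }
    return v;
  });
}
console.log(serializeKey([Infinity, -Infinity])); // prints ["__+Inf__","__-Inf__"]
```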
Functions which operate on reals need to be transformed for AD.
This is preferable as it avoids adding a non-WebPPL function as a global variable and for consistency.
This looks good overall -- and like it was a lot of work! Thanks! Are there any examples where the current implementation of HMC does visibly better than plain MH? Some model with multiple continuous variables that are strongly correlated under the posterior so that all need to be updated jointly? It would be good to include at least one such example in the tests. In general, the main thing we should convince ourselves of is that the algorithm is correctly implemented. I'd add tests that include various sorts of dependencies between discrete and continuous erps and factors. For example:
Other comments:
See #265 for discussion.
This is less brittle than the previous approach of maintaining a list in adscorers.js.
Prior to this commit, when using sequenceKernel, only the final kernel in the sequence was taken into account when computing the acceptance ratio. Query handling had to be modified for a similar reason: it needs to happen after each kernel in the sequence, not once at the end of the sequence.
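To make the point concrete, here's a hedged sketch (not webppl's actual internals; the names are illustrative) of a sequence kernel that runs each sub-kernel in turn, with the query recorded after each sub-kernel rather than once at the end:

```javascript
// Each 'kernel' here is a stand-in: a function from trace to trace that
// is assumed to compute its own acceptance ratio internally.
function makeSequenceKernel(kernels, recordQuery) {
  return function(trace) {
    return kernels.reduce(function(t, kernel) {
      var next = kernel(t);
      recordQuery(next); // record after EVERY kernel in the sequence
      return next;
    }, trace);
  };
}

// Usage with trivial stand-in kernels:
var log = [];
var seq = makeSequenceKernel(
    [function(t) { return t + 1; }, function(t) { return t * 2; }],
    function(t) { log.push(t); });
console.log(seq(1)); // prints 4
console.log(log);    // prints [ 2, 4 ]
```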
You're right. I'd still like to include something about this, would you be in favor of including a section on this under workflow? Something like this.
Good point. I've changed this to transform functions with names ending in
I've started talking to Sid about this.
It's from Radford Neal's handbook chapter. Section 5.1, p36. I don't think the current implementation is very general though. It only checks the constraint which comes from the support of the variable which is being updated, which I think can come apart when constraints depend on other variables. It's conceivable that simply rejecting proposals that wind up in areas of zero probability is correct, but I don't know that it is. In #81 the suggestion was to transform constrained variables into unbounded spaces, which sounds reasonable. Assuming we tackle this as a separate piece of work, the immediate decision is whether to stick with what we have, take it out, or improve it. Of those, the first two probably make most sense.
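The constraint check described above can be sketched as follows. (This is a hedged illustration, not the PR's code; the per-variable interval `support` and the function names are hypothetical.)

```javascript
// After a leapfrog step, a proposal that falls outside the updated
// variable's support is simply rejected and the current value kept.
function inSupport(x, support) {
  return x >= support.lower && x <= support.upper;
}

function proposeOrReject(current, proposal, support) {
  return inSupport(proposal, support) ? proposal : current;
}

console.log(proposeOrReject(0.5, 1.5, { lower: 0, upper: 1 })); // prints 0.5 (rejected)
console.log(proposeOrReject(0.5, 0.7, { lower: 0, upper: 1 })); // prints 0.7 (accepted)
```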
This is now
What do you have in mind? I'm happy to do it, I'm just not sure what jumps out to you about this compared to any other part of the algorithm. Does
This is indeed a little awkward. I've added a comment to explain.
The idea is that it's a convenient way to give I did consider including it on
It could go in
Done. Also added a comment to
Indeed, this works for the reason you mention. The reason I didn't do the obvious thing and

I'm sure this could be cleaner, but does accessing it via
Not sure. Won't we need to change lots of things for VI? Can we worry about it when we need it?
Sure. I've made a start on this here, see what you think. I'll consider this a separate piece of work. i.e. #131.
I've thought about this and if people already use

Everything else you mention under "Other comments" has been done.
I think I've now added most of the tests you suggested. I hope the following makes sense:
e.g. DC =

*Note -- in bivariateGaussianFactor the

The "Multiple factors depending on a single ERP" case might be covered by gaussianMean? As far as tests go, I think that just leaves your "More complicated programs" suggestion.
Good point!
This looks like a much better idea. I didn't realize all

I wonder if

Do you have thoughts on how to get rid of the overhead from AD-ified scorers? Maybe the ERP transform could keep around the original scorers, and dispatch calls to score functions of the right type (AD-ified or not) based on whether the current coroutine has an
Definitely.
I had a half-baked idea that we might be able to have an
Just to clarify, are you suggesting the following?

    function score(params, x) {
      return env.coroutine.adRequired ? adScore(params, x) : originalScore(params, x);
    }

(I think that would work.)
Yes, that's what I was thinking.
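The dispatch idea above can be sketched in full as follows. (`env`, `adRequired`, and the scorer bodies are illustrative stand-ins, not webppl's actual internals.)

```javascript
// The coroutine advertises whether it needs AD; score() dispatches to
// the AD-ified or plain scorer accordingly.
var env = { coroutine: { adRequired: false } };

function originalScore(params, x) {
  // plain Gaussian log-density up to a constant, for illustration
  return -0.5 * Math.pow((x - params[0]) / params[1], 2);
}

function adScore(params, x) {
  // stand-in for the AD-transformed scorer (would build tapes in practice)
  return originalScore(params, x);
}

function score(params, x) {
  return env.coroutine.adRequired ? adScore(params, x) : originalScore(params, x);
}

console.log(score([0, 1], 1)); // prints -0.5
```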
Great, thanks. I have a version working here. I've not yet checked how much of the performance drop this recovers. I'll do that next.
The following shows the fractional slowdown incurred by some of the models in

(The reason ldaCollapsed is so much slower is that the

For the first 3 models listed it looks like this change helps a bit, although things weren't that much slower to begin with. Seeing these results, it's not 100% clear to me that it's worth the extra effort of keeping both scorers around. My main concern is the fact that the code to transform the scorers is now pretty complicated and rather tightly coupled to the current structure of

If we do think it's worth keeping both scorers around, perhaps I could revisit Sid/Daniel's original approach of splitting the scorers out into a separate file. (With the hope that transforming that file will be simpler, and perhaps delegation to the correct scorer can be added in the

It's also a drag having to thread

Any thoughts?
Thanks for running these experiments! I agree that the additional complexity together with the relatively small improvement in speed doesn't make it worth keeping both scorers around, but it's good that we checked. Suppose we take the performance hit and decide to merge the PR essentially as is. Then what else is left to do? Based on the original comment, it looks like we want to switch to the latest ad.js for tensor support. What needs to happen there? Let's aim to merge the PR this week, and let's create issues for anything (non-critical) that remains.
Updating to a newer version of ad.js is the only thing I have in mind to do. The main reason for doing so would be to gain the ability to take derivatives w.r.t. tensor-valued variables. (This hack avoids creating tapes for Dirichlet/mv Gaussian at present.) I'm not sure how far ad.js is from being ready for this; I'd need to check.

Elsewhere I've played with integrating Daniel's implementation of ad, so that might be an option if getting this working soon is important. It's not a trivial change though, since (as far as I can tell) it also requires reimplementing some basic matrix/vector operations we currently get from numeric.js. (Some of which I've done, though it would be a stretch to get it finished this week.) There was some talk of having both these ad libraries share a common interface, an idea which might be relevant here.

On a different note, can I just check: are you OK with the change (described here) to
OK, keep me updated on the ad.js situation. If it looks like this is going to take a while, would it make sense to keep the PR as is, but detect when HMC is used with multivariate distributions and throw an error that says that it's not supported yet?
Fine with me.
The advantage of this approach is that the friendly error will now be seen when trying to run tests while erp.js is missing.
I do think this will take a little while (at least a week, probably longer), so if you're happy to do so, I think we should merge this. I've made a few final tweaks (including throwing an error if multivariate ERPs are used with HMC, as you suggested) so I think it's ready. If we do merge it, I'll open a new issue for updating ad.js and also include the following on it:
Sounds good. I'm going to merge this now -- thanks for making this happen! :-)
This is a cleaned up version of my gradients branch. There are a few loose ends to tie up (see below) but I don't expect them to change things much, so now might be a good time to start reviewing my changes.
This PR includes:

- `MCMC`
- interface which allows per-kernel options to be specified
- `MCMC` and `SMC`

Using HMC with the new interface looks like this:
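The snippet that originally followed didn't survive extraction. As a hedged sketch, here is what an options object for the new per-kernel interface might look like, modeled on the kernel syntax quoted later in this thread; `samples`, `steps`, and `stepSize` are assumed option names, not confirmed by the source.

```javascript
// Hypothetical per-kernel options for the new MCMC interface.
var options = {
  samples: 1000,
  kernel: { HMC: { steps: 10, stepSize: 0.1 } }
};

// In a webppl program this would be passed along the lines of:
//   MCMC(model, options);
console.log(options.kernel.HMC.steps); // prints 10
```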
Before use, the ERP score code needs to be transformed; this is now done like so:
This should happen automatically after an `npm install`. If `erp.js` is missing when `webppl` is run, a descriptive error message is shown.

Still to do
- Switch to the latest ad.js for tensor support. (See Merge variational inference #259.)
- Test Dirichlet. (See Should Dirichlet scorer check that values sum to 1? #275.)
- Switch to version of `isTape` exported by `ad.js` (in `aggregation.js`) once available. Postponed, will tackle as a separate piece of work.
- `TODO` notes. I've intentionally left these in for now as they might be useful while the PR is being reviewed.
- `HMC` function or some other convenient way of saying `kernel: { sequence: { kernels: ['HMC', { MH: { discreteOnly: true } }] } }`. Thoughts?
- `MAP` and `entropy` methods on `ERP`. (Not needed if we drop constraint stuff.)
- Maybe: Add `erp.js` to repo.

Speed
I ran a few quick benchmarks, and in the model I tested the time spent doing inference increased by ~20% using `MH` and by ~40% using `SMC` after these changes. I've not spent any time trying to optimize that away. The inference tests are probably quite a bit more than 40% slower because of the increase in the time taken to compile programs.

Future work
As discussed in #81, there are several things not included here that we might like in the future:
- `factor` statements?)