marg_MAP option to use LBFGS hessians #48

Draft: wants to merge 2 commits into base: master

kimmywu (Collaborator) commented Jan 21, 2021

This PR adds an option for marg_MAP to use LBFGS Hessian updates at each step, and to terminate the MAP estimate either after nstep steps or once ϕtol is reached.
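
The stopping rule described above can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `iterate_until` and `step` are hypothetical names, and the plain RMS norm stands in for the low-passed, Cϕ-whitened norm that the PR uses to compute diffϕ.

```julia
using LinearAlgebra

# Hypothetical helper: repeat `step` until either `nsteps` iterations have run
# or the RMS change between successive iterates falls below `ϕtol`.
function iterate_until(step, ϕ0; nsteps=10, ϕtol=nothing)
    ϕ = ϕ0
    for i in 1:nsteps
        lastϕ = ϕ
        ϕ = step(ϕ)
        # RMS change per element (assumption: a plain norm, not the PR's whitened one)
        diffϕ = norm(ϕ - lastϕ) / sqrt(length(ϕ))
        if ϕtol !== nothing && diffϕ < ϕtol
            return ϕ, i          # stopped early on tolerance
        end
    end
    return ϕ, nsteps             # stopped on the step limit
end

# Toy usage: halving the iterate converges long before the step limit is hit
ϕ, n = iterate_until(x -> x ./ 2, [1.0, 1.0]; nsteps=100, ϕtol=1e-3)
```

Leaving `ϕtol` as `nothing` recovers the fixed-step behavior, so the tolerance check stays opt-in.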

@@ -311,6 +311,14 @@ function MAP_marg(
diffϕ=sum(unbatch(norm(LowPass(1000) * (sqrt(ds.Cϕ) \ (ϕ - lastϕ))) / sqrt(2length(ϕ))))
end

push!(history, select((;g,ϕ,lastHg=Map(lastHg),diffϕ), history_keys))
kimmywu (Collaborator, Author) commented Jan 21, 2021

@marius311: For some unknown reason, converting to Map with Map(lastHg) is needed for the output lastHg to be scaled correctly when hess_method="lbfgs-hessian". Otherwise, the value passed to history is orders of magnitude off (it looks like a scale factor). It has the correct amplitude when applied to ϕ in the code.
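
One way to check whether a discrepancy like this really is a pure scale factor is to fit a single scalar between the two versions of the field and look at what is left over. The sketch below is a hypothetical diagnostic on plain arrays; `fit_scale`, `a`, and `b` are illustrative names rather than anything in the package, and it assumes both inputs have already been converted to the same basis.

```julia
using LinearAlgebra

# Hypothetical diagnostic: least-squares scalar s such that a ≈ s*b,
# plus the relative residual left after removing that scalar.
function fit_scale(a, b)
    s = dot(b, a) / dot(b, b)
    resid = norm(a - s * b) / norm(a)
    return s, resid
end

b = [1.0, -2.0, 3.0]
a = 1e4 .* b                 # same field, off by a constant factor
s, resid = fit_scale(a, b)   # a residual near zero indicates a pure scale factor
```

A residual that is not close to zero would suggest the two outputs differ by more than normalization.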

Owner

Hmm I don't really remember tbh but glancing at

ϕ, = @⌛ optimize(
    objective,
    Map(ϕ),
    OptimKit.LBFGS(
        lbfgs_rank;
        maxiter = nsteps,
        verbosity = verbosity[1],
        linesearch = OptimKit.HagerZhangLineSearch(verbosity=verbosity[2], maxiter=5)
    );
    finalize!,
    inner = (_,ξ1,ξ2)->sum(unbatch(dot(ξ1,ξ2))),
    precondition = (_,η)->Map(Hϕ⁻¹*η),
)
looks like I also have some Maps. I think in theory you could get rid of them by defining more of the things OptimKit needs, like retract, inner (that one you already are defining), scale!, add!, and transport!, as mentioned in their readme. My guess is that performance-wise the extra Map conversions don't really matter, so it's probably fine.
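
As a rough illustration of the hooks listed above, here is what they might look like for a plain `Vector`. The signatures follow my reading of the OptimKit readme and should be checked against the installed version. The `my_`-prefixed names are hypothetical, and field types would need basis-aware versions of each function.

```julia
using LinearAlgebra

# Flat (Euclidean) versions of the OptimKit hooks, for plain vectors only.
my_retract(x, η, α) = (x + α * η, η)      # new point, and the tangent at that point
my_inner(x, ξ1, ξ2) = dot(ξ1, ξ2)         # inner product of two tangents at x
my_scale!(η, β) = rmul!(η, β)             # η ← β η, in place
my_add!(η, ξ, β) = axpy!(β, ξ, η)         # η ← η + β ξ, in place
my_transport!(ξ, x, η, α, x′) = ξ         # flat space: tangents need no transport

# Assumed usage (not run here): pass them as keyword arguments, e.g.
# optimize(fg, x0, alg; retract = my_retract, inner = my_inner,
#          scale! = my_scale!, add! = my_add!, transport! = my_transport!)

x = [1.0, 2.0]
η = [0.5, 0.5]
x′, ξ = my_retract(x, η, 2.0)             # step of length 2 along η
```

With hooks like these defined directly on the field type, the optimizer would never need a specific basis, which is what would remove the explicit Map conversions.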

kimmywu (Collaborator, Author)

I did see that you pass Map-converted field variables into the optimization in places in MAP_joint. I tested both, and in my case the results are the same with or without passing a Map-converted field, so I figured I'd just go without to reduce the back-and-forth.

Agreed that it doesn't slow down the code. But it does mean storing a larger vector (Map vs. Fourier), and the result is not the same type (Fourier) as the rest of the returned keys. So I wanted to see whether you already know of similar peculiar behavior, or whether this is a corner case that I've run into.
