Testing the benchmark with ADNLPModels 0.8.3 #17

Closed · frapac opened this issue Jul 19, 2024 · 11 comments

frapac (Collaborator) commented Jul 19, 2024

ADNLPModels 0.8.3 uses a different coloring algorithm that exploits the symmetry of the Hessian. It may improve performance in OptimalControl.jl.

xref control-toolbox/CTDirect.jl#111 (comment)

0Yassine0 (Collaborator) commented Jul 19, 2024

@ocots @frapac .
I'm working on that, at least to see its impact on OptimalControl.
However, I had some difficulty with the package updates, including the new 0.9.4 version of OC: the MadNLP package doesn't allow CTDirect to update from v0.9.3 to the latest version, which only affects OptimalControl during compilation.

I'm impressed with the results so far. When testing the Goddard rocket problem, the total time decreased from ~870 ms to ~80 ms!!
I might have made some mistakes collecting data in this notebook (without even using MKL yet), but it already shows promising results.

frapac (Collaborator, Author) commented Jul 19, 2024

@0Yassine0 Don't hesitate to remove MadNLP from your environment, as we are not using it currently.

Can you test the new release of OptimalControl with ADNLPModels 0.8.3 on moonlander? The current solution time with OC is

Total seconds in IPOPT (w/o function evaluations)    =      1.389
Total seconds in NLP function evaluations            =    207.693

It would be nice if the new release significantly improved the time spent in NLP function evaluations for this instance.

0Yassine0 (Collaborator) commented Jul 19, 2024

Yes, that's exactly what I did. I only use it when comparing with Ipopt and KNITRO.
I pushed the new results with MKL and the new releases:

Total seconds in IPOPT (w/o function evaluations)    =      1.399
Total seconds in NLP function evaluations            =      2.223

The result speaks for itself.

frapac (Collaborator, Author) commented Jul 19, 2024

Indeed. Good job @0Yassine0 !
Closing this issue now.

frapac closed this as completed Jul 19, 2024

ocots (Member) commented Jul 19, 2024

@jbcaillau @PierreMartinon @joseph-gergaud Incredible!

0Yassine0 (Collaborator)

@ibtissammim it may help you.

amontoison

What should be compared between v0.8.2 and v0.8.3 is the number of colors.
You can get it with adnlp.backend.hessian_backend.ncolors.
Fewer colors mean fewer directional derivatives and less AD work.
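For example, a minimal sketch of how to read that field (the toy objective below is purely illustrative, and the exact field path, taken from the comment above, may differ between ADNLPModels versions):

```julia
using ADNLPModels

# Toy unconstrained problem, purely for illustration (not from this thread).
f(x) = sum(100 * (x[i+1] - x[i]^2)^2 + (1 - x[i])^2 for i = 1:length(x)-1)
adnlp = ADNLPModel(f, zeros(10))

# Number of colors used for the sparse Hessian, i.e. the number of
# directional derivatives needed per Hessian evaluation; compare this
# between ADNLPModels v0.8.2 and v0.8.3. The field path is assumed from
# the comment above; internals may change between versions.
@show adnlp.backend.hessian_backend.ncolors
```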

jbcaillau (Member)

😱 @0Yassine0 @frapac Impressive! Guys, I would like further comments on this (closed) issue. @amontoison?

amontoison commented Jul 20, 2024

The additional comment I can give concerns how variables are connected to each other in the Lagrangian of your problem. It's problem-dependent, but your problems could share quite a common pattern in the objective/constraints, depending on the subset of optimal control problems and your discretization scheme.

An example of how it impacts AD is:

f(x) = x[1] * sum(x[i] for i = 1:n)   # objective: x[1] couples with every variable
c(x) = x                              # constraints

The Hessian of the Lagrangian will have a dense row (the first one), and thus we can't combine any columns of the Hessian with a column coloring (they would all overlap on the first row of a potential compressed column). Therefore, we need to perform n directional derivatives to obtain the coefficients H[1, i].

The difference now is that we exploit the symmetry: the first row is also the first column of the Hessian, so we only need one directional derivative to compute them all.
The concept is simple, but the implementation is harder. In spirit, you don't want to implement a variant of METIS in Julia to compute a fill-in-reducing permutation when you develop AD packages, optimization solvers, or even linear solvers.
Implementing a symmetric coloring (star coloring) is analogous.
It's quite a different topic, related to graphs. I'm glad that Guillaume worked on graph theory. 🙂

The second issue is: for a computed coefficient H[i, j], was it obtained as the i-th component of column j (in the compressed column of one color) or, by symmetry, as the j-th component of column i (in another compressed column)? You want to correctly build the sparse Hessian after you have computed all nonzeros with AD (the decompression step), and that's not easy to implement efficiently.
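To make the color counts concrete, here is a rough sketch using SparseMatrixColorings.jl (mentioned further down in this thread) on the Hessian sparsity pattern of the example above; the API calls (ColoringProblem, GreedyColoringAlgorithm, coloring, ncolors) are assumed from recent releases of that package, so treat this as illustrative rather than exact:

```julia
using SparseArrays, SparseMatrixColorings

# Hessian sparsity pattern of f(x) = x[1] * sum(x): dense first row and
# first column (including the (1, 1) entry), zeros elsewhere.
n = 10
S = spzeros(Bool, n, n)
S[1, :] .= true
S[:, 1] .= true

# Column coloring without symmetry: every column has a nonzero in row 1,
# so no two columns can share a color -> n colors, i.e. n directional derivatives.
res_col = coloring(S, ColoringProblem(; structure=:nonsymmetric, partition=:column),
                   GreedyColoringAlgorithm())
@show ncolors(res_col)   # expected: n

# Symmetric (star) coloring: column 1 gets its own color and all other
# columns share a second one -> 2 directional derivatives.
res_sym = coloring(S, ColoringProblem(; structure=:symmetric, partition=:column),
                   GreedyColoringAlgorithm())
@show ncolors(res_sym)   # expected: 2
```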

jbcaillau (Member)

Thanks @amontoison. Got it that symmetry significantly reduces the number of directional derivatives to compute. Question: the sparsity structure is always known in advance and fixed¹; to what extent is it helpful (= computationally relevant) / better to pass this structure to the AD / optimiser, vs. using graph coloring as indicated above?

Footnotes

  1. Up to a few changes depending on whether the final time is fixed or not, etc.


amontoison commented Jul 20, 2024

@jbcaillau

If you have the sparsity structure, you can avoid the sparsity detection, which can be quite expensive even if we use SparseConnectivityTracer.jl instead of Symbolics.jl.

The idea is to provide J directly here and H here.

Note that you still need to do the coloring afterward. The coloring is used to determine the seeds (combinations of columns of the identity matrix) that we should use for the directional derivatives to get all nonzeros of the sparse Jacobian / Hessian.

@tmigot started a tutorial to explain how to do that last year:
JuliaSmoothOptimizers/ADNLPModels.jl#209

Quick summary of what we do in ADNLPModels.jl (see the sketch after this list):

  • Determine the sparsity pattern of the Jacobian / Hessian of any Julia function (SparseConnectivityTracer.jl).
  • Compute a column coloring for Jacobians or a symmetric coloring for Hessians to determine the seeds that we need for the directional derivatives and the number of colors (== number of directional derivatives). We rely on SparseMatrixColorings.jl for that.
  • Preallocate workspace for the directional derivatives and perform forward / reverse passes to compute compressed columns (ForwardDiff.jl / ReverseDiff.jl).
  • Recover the Jacobian / Hessian in COO / CSC format from the compressed columns (decompression step).
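For reference, a rough sketch of the first two steps on a small function, assuming the documented entry points of SparseConnectivityTracer.jl (TracerSparsityDetector with the ADTypes sparsity-detection interface) and SparseMatrixColorings.jl; this is not the actual ADNLPModels.jl code:

```julia
using ADTypes, SparseConnectivityTracer, SparseMatrixColorings

# Step 1: detect the sparsity pattern of the Hessian of a Julia function.
f(x) = x[1] * sum(x)
x0 = rand(10)
H_pattern = ADTypes.hessian_sparsity(f, x0, TracerSparsityDetector())

# Step 2: symmetric (star) coloring to build the seeds for the directional
# derivatives; ncolors is the number of Hessian-vector products needed per
# sparse Hessian evaluation.
result = coloring(H_pattern, ColoringProblem(; structure=:symmetric, partition=:column),
                  GreedyColoringAlgorithm())
@show ncolors(result)
@show column_colors(result)   # which seed (color) each column belongs to
```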
