Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmarks with BundleAdjustmentModels.jl #134

Open
amontoison opened this issue Jun 22, 2024 · 19 comments
Open

Benchmarks with BundleAdjustmentModels.jl #134

amontoison opened this issue Jun 22, 2024 · 19 comments

Comments

@amontoison
Copy link

amontoison commented Jun 22, 2024

@adrhill @gdalle
I tried to run some benchmarks on the NLS problems using BundleAdjustmentModels.jl, but I'm encountering issues with the sparsity detection of any Hessian. It takes an extremely long time. This could be a valuable set of tests for you because the problems are quite large. For the largest problems, my computer also crashes during the sparsity pattern detection of some Jacobian.

Note that I replaced norm(r) with sqrt(dot(r, r)) in the code (see this PR) to use version 5.x of SparseConnectivityTracer.jl. Guillaume, the Jacobian/Hessian of these problems could also be useful for SparseMatrixColorings.jl.

using BundleAdjustmentModels
using ADNLPModels
using NLPModels
using Printf

debug = false
check_jacobian = false
df = problems_df()
# df = df[ ( df.nequ .≤ 600000 ) .& ( df.nvar .≤ 100000 ), :]
nproblems = size(df)[1]

@printf("| %21s | %7s | %7s | %22s | %15s |\n", "problem", "nequ", "nvar", "backend", "coord_residual")
for i = 1:nproblems
  name_pb = df[i, :name]

  # Smaller problem of the collection
  debug && (i == 1) && (name_pb = "problem-49-7776-pre")
  debug && (i  1) && continue

  ## BundleAdjustementModels
  nls = BundleAdjustmentModel(name_pb)
  meta_nls = nls_meta(nls)
  Fx = similar(nls.meta.x0, meta_nls.nequ)
  rows = Vector{Int}(undef, meta_nls.nnzj)
  cols = Vector{Int}(undef, meta_nls.nnzj)
  vals = similar(nls.meta.x0, meta_nls.nnzj)
  x = rand(nls.meta.nvar)  # nls.meta.x0

  # warm-start
  residual(nls, nls.meta.x0)
  residual!(nls, nls.meta.x0, Fx)
  jac_structure_residual!(nls, rows, cols)
  jac_coord_residual!(nls, x, vals)

  # benchmarks
  start_timer = time()
  jac_coord_residual!(nls, x, vals)
  end_timer = time()
  timer_coord_residual = end_timer - start_timer
  @printf("| %21s | %7s | %7s | %22s | %6.5f seconds |\n", name_pb, nls.nls_meta.nequ, nls.meta.nvar, "BundleAdjustmentModels", timer_coord_residual)

  ## ADNLPModels
  function F!(Fx, x)
    residual!(nls, x, Fx)
  end
  nls2 = ADNLSModel!(F!, nls.meta.x0, meta_nls.nequ, nls.meta.lvar, nls.meta.uvar, jacobian_residual_backend = ADNLPModels.SparseADJacobian,
                                                                                   jacobian_backend = ADNLPModels.EmptyADbackend,
                                                                                   hessian_backend = ADNLPModels.EmptyADbackend,
                                                                                   hessian_residual_backend = ADNLPModels.EmptyADbackend)
  meta_nls2 = nls_meta(nls2)
  rows2 = Vector{Int}(undef, meta_nls2.nnzj)
  cols2 = Vector{Int}(undef, meta_nls2.nnzj)
  vals2 = similar(nls2.meta.x0, meta_nls2.nnzj)

  # Warm-start
  jac_structure_residual!(nls2, rows2, cols2)
  jac_coord_residual!(nls2, x, vals2)

  # benchmarks
  start_timer = time()
  jac_coord_residual!(nls2, x, vals2)
  end_timer = time()
  timer2_coord_residual = end_timer - start_timer
  @printf("| %21s | %7s | %7s | %22s | %6.5f seconds |\n", name_pb, nls2.nls_meta.nequ, nls2.meta.nvar, "ADNLPModels", timer2_coord_residual)

  if check_jacobian
    println(meta_nls.nnzj)
    println(meta_nls2.nnzj)
    J_nls = sparse(rows, cols, vals)
    J_nls2 = sparse(rows2, cols2, vals2)
    norm(J_nls - J_nls2) |> println
    norm(J_nls - J_nls2, Inf) |> println
  end
end

Note that you need hessian_residual_backend = ADNLPModels.SparseADHessian to test the Hessian.

@adrhill
Copy link
Owner

adrhill commented Jun 22, 2024

I'll investigate your benchmarks as soon as possible, but I just want to quickly comment on the following:

For the largest problems, my computer crashes during the sparsity pattern detection of the Jacobian.

We've encountered some compile time problems in SCT v0.5.X which we've since fixed on the main branch (more specifically in #120 (comment) and in #119). We're probably going to tag v0.6.0 as soon as we merge #131.

@adrhill
Copy link
Owner

adrhill commented Jun 22, 2024

This could be a valuable set of tests for you because the problems are quite large.

Absolutely! We're actively looking for more large-scale problems to test and benchmark on!

@amontoison
Copy link
Author

amontoison commented Jun 22, 2024

I will open a different discussion / issue but it should be also quite easy to benchmark the OPF problems of rosetta-opf
lanl-ansi/rosetta-opf#65
The problems are quite large too and you can also verify the sparsity pattern with JuMP.

@gdalle
Copy link
Collaborator

gdalle commented Jun 22, 2024

We've encountered some compile time problems in SCT v0.5.X which we've since fixed on the main branch

Note that this was mainly due to the britgas problem formulation being a literal compiler nightmare, with 4 thousand lines of unrolled for loops. I think equally large problems with a more realistic formulation (using actual for loops or vectorized code) would be much less of an issue. That's why I started JuliaSmoothOptimizers/OptimizationProblems.jl#332 but I shudder at the idea of doing this for every single problem.

@adrhill
Copy link
Owner

adrhill commented Sep 3, 2024

Does this issue still persist? I tried to run the code above but got a MethodError on line 10:

julia> nproblems = size(df)[1]
ERROR: MethodError: no method matching size(::JLD2.ReconstructedMutable{:DataFrame, (:columns, :colindex), Tuple{Any, DataFrames.Index}})

@adrhill
Copy link
Owner

adrhill commented Sep 6, 2024

Bump @amontoison

@amontoison
Copy link
Author

amontoison commented Sep 6, 2024

@adrhill I will have a look asap.

Off-topic: I was talking with Michel Schenen this morning about this package and he has two comments:

Btw, https://adrianhill.de/SparseConnectivityTracer.jl/stable/#Jacobian
For large matrices this "the Jacobian can be
obtained by computing a single forward-pass" might not be true.
Because of the memory requirement you might have to do it in batches.

There's also interesting work by Paul and Aupy: https://link.springer.com/article/10.1007/s101070100281

Paul always wanted to do a Julia implementation of Bayesian probing...

It would be awesome to have.
Forward them that paper.
I'm curious whether they are aware of it.
For really large Js this would be awesome as you get the sparsity in log(n).

@adrhill
Copy link
Owner

adrhill commented Sep 6, 2024

Because of the memory requirement you might have to do it in batches.

This is actually a really fun aspect of this package: SCT has very small memory requirements since we only need to keep track of index sets of non-zero values. Our default type is BitSet, which only requires a single bit per index. Our How it works guide will tell you more!

Thanks for the reference by the way, that looks intriguing!
EDIT: did the authors get mixed up? The linked paper is by Griewank and Mitev.

@adrhill
Copy link
Owner

adrhill commented Sep 6, 2024

And if you (against all odds) manage to run into a memory limit, we have an undocumented shared memory mode in which all tracers share a single set. ;)

@michel2323
Copy link

michel2323 commented Sep 6, 2024

Thanks for the reference by the way, that looks intriguing! EDIT: did the authors get mixed up? The linked paper is by Griewank and Mitev.

Yes, the authors did get mixed up. I see, you use index sets. That requires allocations (speed), right? It would be nice to have a method for using the underlying AD tools without index sets. For large problems like PDEs, I think index sets are prohibitive. Also, index sets won't work on GPUs. Why would one not just run on the CPU for the sparsity pattern? Well, in some cases, one might use a totally different algorithm on the GPU. The datastructures might be quite different.

Edit: But I see how this is totally fine for JuMP problems.

@adrhill
Copy link
Owner

adrhill commented Sep 6, 2024

It would be nice to have a method for using the underlying AD tools without index sets.

This tooling indeed already exists. Our tracer types are just thin wrappers around what we call AbstractPatterns. These patterns can be any representation of a sparse binary vector (or matrix) you want. The default is the aforementioned IndexSetGradientPattern. Do you have a data type in mind you would like to see implemented?

As for GPUs, we have some ideas but decided to get a publication for this package out of the door first (#67). I don't see why GPU support would be impossible if we introduce a suitable statically sized pattern type.

@gdalle
Copy link
Collaborator

gdalle commented Sep 6, 2024

I think this random probing might be a feature of DifferentiationInterface.jl instead. I already have a DenseSparsityDetector, I could add a RandomSparsityDetector too which would be backend-agnostic.

To clarify, SparseConnectivityTracer.jl does not run actual AD, it uses a different brand of abstract program interpretation which only tracks linear and nonlinear dependencies in scalar quantities. It seems extremely stupid and very allocation-intensive, but it's already much faster than what we had before so why not 🤷

@michel2323
Copy link

As for GPUs, we have some ideas but decided to get a publication for this package out of the door first (#67). I don't see why GPU support would be impossible if we introduce a suitable statically sized pattern type.

Good idea!

@adrhill
Copy link
Owner

adrhill commented Sep 6, 2024

To clarify, SparseConnectivityTracer.jl does not run actual AD

SCT can be thought of as a standalone binary forward-mode AD backend.

it uses a different brand of abstract program interpretation which only tracks linear and nonlinear dependencies in scalar quantities. It seems extremely stupid and very allocation-intensive

SCT is essentially a binary version of ForwardDiff.jl. It is no more “stupid” than ForwardDiff and certainly less allocation-intensive.

@gdalle
Copy link
Collaborator

gdalle commented Sep 6, 2024

It's much more allocation-intensive. Scalars in ForwardDiff are Dual numbers and they are stack-allocated. Ours are sets and they are heap-allocated.

@adrhill
Copy link
Owner

adrhill commented Sep 6, 2024

What we could (should?) add to this package is a mechanism for "chunked" index sets, similar to ForwardDiff. This could greatly reduce the memory usage of SCT for large problems.

@gdalle
Copy link
Collaborator

gdalle commented Sep 6, 2024

Essentially we need a set type that is a statically-sized version of BitSet, so that it can be stored in a fixed number of 64-bit integers and stack-allocated.

And then we can either use one of those if it can fit enough inputs, or run the function repeatedly while grouping the inputs in chunks of a multiple of 64. Now I get it

@adrhill
Copy link
Owner

adrhill commented Sep 6, 2024

Who would have thought a simple issue bump could turn out to be so inspiring! :)
I think a chunked mode with statically sized patterns might also be the missing puzzle piece for GPU compatibility. I've opened #194 to track our new ideas and continue our discussion.

Let's return to the BundleAdjustmentModels benchmarks here.

@amontoison
Copy link
Author

Does this issue still persist? I tried to run the code above but got a MethodError on line 10:

julia> nproblems = size(df)[1]
ERROR: MethodError: no method matching size(::JLD2.ReconstructedMutable{:DataFrame, (:columns, :colindex), Tuple{Any, DataFrames.Index}})

@adrhill You need an older version of DataFrames.jl:
https://github.com/JuliaSmoothOptimizers/BundleAdjustmentModels.jl/blob/main/Project.toml#L16

I should upgrade it in BundleAdjustementModels.jl one day...
It should solve your issue 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants