PDVec (#183)
* Move `fciqmc_step!` into Interfaces (?)

* Rework initiator dvec

* Move operator-vector function to DictVectors

* Add PDVecs and friends

* some fixes and reworking

* Fix MPI communication

* Fix mpi_recv_any signature

* Add tests

* Fix bugs with dot and initiators, add MPI tests

* Remove annihilation tracking workaround

* Benchmark PDVec instead

* Some cleaning in Hamiltonians, make PDVec the default.

* Fix error caused by moving things around

* Doc updates

* Move hacks to pdworkingmemory

* temp

* Fix up after rebase

* Change which project mpi_runtests is run from

* instantiate MPI deps before test

* Another attempt to fix MPI CI

* Attempt to fix MPI CI again

* using -> import

* Add debugging info to job

* Fix benchmark, do MPI first

* MPI tests without MPI

* Fix project paths

* Fix install of MPI deps

* Fix deps

* Fix Pkg stuff

* Fix deprecations in RMPI

* Restore ordering, fixing errors.

* Fix three-argument dotting issues

* Benchmark with threads=auto

* Attempt to add coverage for MPI

* Fix more depwarns in RMPI

* Fix even more depwarns; fix benchmark

* Fix depwarn

* Fix MPI probing

* Split testing into phases

* Fix yml

* Re-rework actions

* Change MPI coverage tracking

* Remove dead code

* Removed too much dead code

* Remove foreach calls that modify LHS.

* Reordering, more docs

* Fix the doctest

* Fix tests

* Fix bug in AllOverlaps

* Change example to regular code listing

* Minor fix in tests, bugfix

* skip some tests

* Improve coverage and fix bugs

* Remove folds from testset

* Add distributed dot tests, remove dead code, reenable threads in testing

* fix yml

* Fix randomness in dot tests

* Docstring fixup

* Alleviate type instability

* Make example numerically stable

* Refix type instability

* Fix doctests

* enable colors, don't test PDVec docstring

* Update src/DictVectors/communicators.jl

Co-authored-by: christofbradly <[email protected]>

* Update src/strategies_and_params/replicastrategy.jl

Co-authored-by: christofbradly <[email protected]>

* Change default vector choice, include it in docs.

* Rename functions

* Fix up docs

* Keep executor in similar, fix `length` for working memory

* Use PDVec in MPI example

* Add an extension for using Rimu with KrylovKit (#187)

* Add an extension for using Rimu with KrylovKit

* Add the actual extension

* Loosen the version requirement

* Add KrylovKit to [extras]

* Fix broken dot

* Use starting address that does not produce a vector that is orthogonal to the groundstate

* Remove method that provides starting vector.

* apply bitstring fix

* `PDVec` with tuples (#188)

* Replace vectors with tuples, remove executors

* add fix

* doc tweak

* Remove the option of setting the number of segments

* Fix bugs and tests

* single-segment optimization

* fix depwarn

* remove rogue num_segments

* improve test coverage

* fix mpi test

* fix some dots

* Merge branch 'feature/pdvec' of github.com:joachimbrand/Rimu.jl into feature/pdvec

* Fix equality for `DVec`

* Fix constructor and tests

* Make bangbang functions work properly

* checkpoint

* checkpoint

* set version for release

* Fixes after merge

* Update Project.toml version

* Fix KrylovKit extension

* fix MPI tests

* Fix mul with deterministic vectors

* `add!` -> `dict_add!`

* docstring for `dict_add!`, remove `propagators.jl`

* Remove mentions of `OperatorMulPropagator`

* attempt at fixing type instability

* Introduce FrozenPDVec for faster projected energy

* Small fixes

* Fix diagonal operator spawning

* fix some type instabilities

* Rework benchmarks - trial run

* Update Project.toml

* Update tune file

* re-update benchmarks

* Remove type stability enforcing

* Add see also to all DVec types

* CI setup

* Remove unused parameter

* Fix compression bug

* Fix the benchmark setup

* Fix docs issues

* Fix compression issues

* Doc improvements

* Fix bug in tests, docstring

* Docstring fix

* Remove unused dependency, clean up example script

* Review comments

* Add MPI docs, review comments

* Apply suggestions from code review

Co-authored-by: Joachim Brand <[email protected]>
Co-authored-by: christofbradly <[email protected]>

* Update MPI doc

* add `PDVecKeys/Vals/Pairs` docstrings

* filter(!), out-of-place map, fix potential bug in map!

* Fix doc links

* Fix new code

* Another bugfix

* One more

* Fix bug in filter!

* Fix wrong method call

* Remove type from `f` in `filter`

* Fix method errors

* Update docs/src/mpi.md

Co-authored-by: christofbradly <[email protected]>

* bump version

* bump version

---------

Co-authored-by: christofbradly <[email protected]>
Co-authored-by: Joachim Brand <[email protected]>
Co-authored-by: Joachim Brand <[email protected]>
4 people authored Nov 10, 2023
1 parent b836c95 commit 2825da8
Showing 32 changed files with 2,243 additions and 206 deletions.
10 changes: 9 additions & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "Rimu"
uuid = "c53c40cc-bd84-11e9-2cf4-a9fde2b9386e"
authors = ["Joachim Brand <[email protected]>"]
version = "0.9.1-dev"
version = "0.10.0"

[deps]
Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
@@ -39,6 +39,12 @@ TerminalLoggers = "5d786b92-1e48-4d6f-9151-6b4477ca9bed"
TupleTools = "9d95972d-f1c8-5527-a6e0-b4b365fa01f6"
VectorInterface = "409d34a3-91d5-4945-b6ec-7529ddf182d8"

[weakdeps]
KrylovKit = "0b1a1467-8014-51b9-945f-bf0ae24f4b77"

[extensions]
KrylovKitExt = "KrylovKit"

[compat]
Arrow = "1.5, 2"
BSON = "0.3"
@@ -50,6 +56,7 @@ Distributions = "0.25"
FFTW = "1"
Folds = "0.2"
HypergeometricFunctions = "0.3"
KrylovKit = "0.6"
MPI = "0.20"
MacroTools = "0.5"
Measurements = "2"
@@ -72,6 +79,7 @@ VectorInterface = "0.2, 0.3, 0.4"
julia = "1.7"

[extras]
KrylovKit = "0b1a1467-8014-51b9-945f-bf0ae24f4b77"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
18 changes: 9 additions & 9 deletions benchmark/benchmarks.jl
@@ -9,16 +9,16 @@ const SUITE = @benchmarkset "Rimu" begin
M = 16
addr = FermiFS2C(M, 1 => 1, 2 => 1, 1 => -1, 2 => -1)
ham = HubbardRealSpace(addr; geometry=PeriodicBoundaries(4, 4))
dv = DVec(addr => 1.0)
eigsolve(ham, dv, 1, :SR; issymmetric=true, tol=1e-9)
dv = PDVec(addr => 1.0)
eigsolve(ham, dv, 1, :SR; tol=1e-9)
end seconds=30

@case "Bose-Hubbard in momentum space" begin
M = N = 10
addr = BoseFS(M, M ÷ 2 => N)
ham = HubbardMom1D(addr; u=6.0)
dv = DVec(addr => 1.0)
eigsolve(ham, dv, 1, :SR; issymmetric=true, tol=1e-9)
dv = PDVec(addr => 1.0)
eigsolve(ham, dv, 1, :SR; tol=1e-9)
end seconds=40
end

@@ -28,7 +28,7 @@ const SUITE = @benchmarkset "Rimu" begin
M = 20
addr = BoseFS(M, M÷2 => 10)
ham = HubbardMom1D(addr; u=6.0)
dv1 = DVec(addr => 1.0)
dv1 = PDVec(addr => 1.0)
dv2 = zerovector(dv1)
mul!(dv2, ham, dv1)
mul!(dv1, ham, dv2)
@@ -42,7 +42,7 @@ const SUITE = @benchmarkset "Rimu" begin
M = 30
addr = FermiFS2C(M, M÷2-1 => 1, M => 1, M÷2 => -1, M÷2+1 => -1)
ham = Transcorrelated1D(addr)
dv1 = DVec(addr => 1.0)
dv1 = PDVec(addr => 1.0)
dv2 = zerovector(dv1)
mul!(dv2, ham, dv1)
mul!(dv1, ham, dv2)
@@ -56,7 +56,7 @@ const SUITE = @benchmarkset "Rimu" begin
@case "(10, 20) Mom space with projected energy and initiator" begin
addr = BoseFS(20, 10 => 10)
ham = HubbardMom1D(addr, u=1.0)
dv = InitiatorDVec(addr => 1.0; style=IsDynamicSemistochastic())
dv = PDVec(addr => 1.0; style=IsDynamicSemistochastic(), initiator=true)
post_step = ProjectedEnergy(ham, dv)
s_strat = DoubleLogUpdate(targetwalkers=40_000)

@@ -66,7 +66,7 @@ const SUITE = @benchmarkset "Rimu" begin
@case "(4+1, 11) 2C Mom space with G2Correlators" begin
addr = BoseFS2C(ntuple(i -> ifelse(i == 5, 4, 0), 11), ntuple(==(5), 11))
ham = BoseHubbardMom1D2C(addr, v=0.1)
dv = DVec(addr => 1.0f0; style=IsDynamicSemistochastic{Float32}())
dv = PDVec(addr => 1.0f0; style=IsDynamicSemistochastic{Float32}())
s_strat = DoubleLogUpdate(targetwalkers=10_000)
replica = AllOverlaps(2, ntuple(i -> G2Correlator(i - 1), 7))

@@ -76,7 +76,7 @@ const SUITE = @benchmarkset "Rimu" begin
@case "(50, 50) Real space" begin
addr = near_uniform(BoseFS{50,50})
ham = HubbardReal1D(addr, u=6.0)
dv = DVec(addr => 1.0; style=IsDynamicSemistochastic())
dv = PDVec(addr => 1.0; style=IsDynamicSemistochastic())
s_strat = DoubleLogUpdate(targetwalkers=50_000)

lomc!(ham, dv; s_strat, dτ=1e-4, laststep=1000)
1 change: 1 addition & 0 deletions docs/make.jl
@@ -50,6 +50,7 @@ makedocs(;
"Examples" => EXAMPLES_PAIRS[sortperm(EXAMPLES_NUMS)],
"User documentation" => [
"StatsTools" => "statstools.md",
"Using MPI" => "mpi.md",
],
"Developer documentation" => [
"Interfaces" => "interfaces.md",
19 changes: 19 additions & 0 deletions docs/src/dictvectors.md
@@ -10,6 +10,7 @@ AbstractDVec
```@docs
DVec
InitiatorDVec
PDVec
```

## Interface functions
@@ -52,6 +53,8 @@ NormProjector
Norm2Projector
UniformProjector
Norm1ProjectorPPop
Rimu.DictVectors.FrozenDVec
Rimu.DictVectors.FrozenPDVec
```

## Initiator rules
@@ -70,6 +73,22 @@ Rimu.DictVectors.NonInitiator
Rimu.DictVectors.NonInitiatorValue
```

## `PDVec` internals

### Working memory

```@autodocs
Modules = [DictVectors]
Pages = ["pdworkingmemory.jl"]
```

### Communicators

```@autodocs
Modules = [DictVectors]
Pages = ["communicators.jl"]
```

## Index
```@index
Pages = ["dictvectors.md"]
11 changes: 6 additions & 5 deletions docs/src/index.md
@@ -25,7 +25,7 @@ The code supports parallelisation with MPI (harnessing [`MPI.jl`](https://github

**Concept:** Joachim Brand and Elke Pahl.

**Contributors:** Joachim Brand, Elke Pahl, Mingrui Yang, Matija Cufar, Chris Bradly.
**Contributors:** Joachim Brand, Elke Pahl, Mingrui Yang, Matija Čufar, Chris Bradly.

Discussions, help, and additional contributions are acknowledged by Ali Alavi,
Didier Adrien, Chris Scott (NeSI), Alexander Pletzer (NeSI).
@@ -110,10 +110,11 @@ on the same hardware makes use of 4 cores and the main part completes in 1.04
seconds, a speedup factor of 2.6. This seems reasonable, given that extra work
needs to be done for communicating between different processes.

Using MPI parallelism with `Rimu` is easy. Enabling MPI for use in [`lomc!()`](@ref)
is done by wrapping the primary data structures as [`MPIData`](@ref Main.Rimu.RMPI.MPIData). As a consequence, data will
be stored in a distributed fashion among the MPI ranks and only communicated between ranks when
necessary. The relevant functionality is provided by the module `Rimu.RMPI`.
Using MPI parallelism with `Rimu` is easy. MPI support in [`lomc!()`](@ref) is enabled
automatically if [`PDVec`](@ref) is used to store a vector. In that case, data will
be stored in a distributed fashion among the MPI ranks and only communicated between ranks
when necessary. Additional MPI-related functionality is provided by the module [`RMPI`](@ref
Rimu.RMPI).

## References
The code implements the FCIQMC algorithm originally described in
92 changes: 92 additions & 0 deletions docs/src/mpi.md
@@ -0,0 +1,92 @@
# Working with MPI

If you are using [`PDVec`](@ref Main.DictVectors.PDVec)s to store your vectors, working with
MPI should be fairly straightforward. Generally, [`PDVec`](@ref Main.DictVectors.PDVec) will
work with MPI automatically, as long as MPI is set up correctly and a few common pitfalls
are avoided.

Rimu includes an unexported module [`RMPI`](@ref Main.Rimu.RMPI), which must be imported to access
additional MPI-related functionality.

## Configuring MPI

When running on a cluster, ensure that MPI.jl is using the system binary. See [the MPI.jl
documentation](https://juliaparallel.org/MPI.jl/latest/configuration/) for more information.

It is always a good idea to start your script with a quick test that ensures MPI is set up correctly. One way to do this is to open the script with

```julia
using Rimu.RMPI
mpi_allprintln("hello")
```

which should print something like

```
[ rank 0: hello
[ rank 1: hello
[ rank 2: hello
[ rank 3: hello
```

If it prints `rank 0` several times, the code will run, but ranks will not communicate.

## Using Slurm

When using [`PDVec`](@ref Main.DictVectors.PDVec), the recommended setup is to use threads to parallelise the
computation within a node, and to only use MPI for inter-node communication. In a slurm
script, this is done as follows:

```bash
...
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=4 # replace 4 with the desired number of nodes
#SBATCH --cpus-per-task=28 # replace 28 with the number of cores available in a node
#SBATCH --hint=nomultithread # don't use hyperthreading
...

srun julia --project -tauto script.jl
```

On some clusters, additional settings must be used with `srun`, for example the CTCP cluster
requires the following.

```bash
srun --mpi=pmi2 julia --project -tauto script.jl
```

## Common pitfalls with reducing functions

### Using `@mpi_root`

Take care not to use reducing functions (such as `length`, `sum`, `norm`, ...) inside
[`@mpi_root`](@ref Main.Rimu.RMPI.@mpi_root) blocks. Doing so initiates the distributed
reduction on a single rank only, which causes the code to go out of sync and freeze. As an
example, to report the current length of a vector, calculate the length before the
[`@mpi_root`](@ref Main.Rimu.RMPI.@mpi_root) block:

```julia
len = length(pdvec)
@mpi_root println("vector length is $len")
```

## Threaded operations and reductions

When using functions that take anonymous functions, such as `map(!)`, `sum`, or `mapreduce`, it is important that the anonymous functions passed to them do not perform any MPI-reducing operations (`length`, `norm`, `sum`, etc.). These anonymous functions are executed on multiple threads, and initiating MPI communication from multiple threads may cause issues.

As an example, suppose we want to scale a vector by its length using `map!`. The correct way to write this code is:

```julia
len = length(pdvec)
map!(values(pdvec)) do x
x / len
end
```

Similar to the previous example, `len` is calculated first and not within the body of
`map!`. In this specific case, an even better option is to use the `scale!` function from
[VectorInterface.jl](https://github.com/Jutho/VectorInterface.jl):

```julia
scale!(pdvec, 1 / length(pdvec))
```
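Putting the pieces of this page together, a minimal end-to-end MPI script might look like the following sketch. The Hamiltonian, starting address, and parameters are illustrative choices, not prescriptions:

```julia
using Rimu
using Rimu.RMPI  # unexported module; provides mpi_allprintln, @mpi_root, mpi_size

# Sanity check: every rank should print its own line.
mpi_allprintln("hello")

# A PDVec is distributed across the MPI ranks automatically.
address = BoseFS(10, 5 => 10)
hamiltonian = HubbardMom1D(address; u=6.0)
dvec = PDVec(address => 1.0; style=IsDynamicSemistochastic())

df, state = lomc!(hamiltonian, dvec; laststep=1000)

# Compute reductions on all ranks, before the root-only block.
len = length(dvec)
@mpi_root println("final vector length is $len")
```

Launch such a script with `mpirun` or `srun`, e.g. `mpirun -np 4 julia --project -tauto script.jl`.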
49 changes: 49 additions & 0 deletions ext/KrylovKitExt.jl
@@ -0,0 +1,49 @@
module KrylovKitExt

using Rimu
using KrylovKit
using LinearAlgebra

const U = Union{Symbol,EigSorter}

"""
OperatorMultiplier
A struct that holds the working memory for repeatedly multiplying vectors with an operator.
"""
struct OperatorMultiplier{H,W<:PDWorkingMemory}
hamiltonian::H
working_memory::W
end
function OperatorMultiplier(hamiltonian, vector::PDVec)
return OperatorMultiplier(hamiltonian, PDWorkingMemory(vector; style=IsDeterministic()))
end

function (o::OperatorMultiplier)(v)
result = zerovector(v)
return mul!(result, o.hamiltonian, v, o.working_memory)
end

function KrylovKit.eigsolve(
ham::AbstractHamiltonian, vec::PDVec, howmany::Int=1, which::U=:LR; kwargs...
)
# Change the type of `vec` to float, if needed.
v = scale!!(vec, 1.0)
prop = OperatorMultiplier(ham, v)
return eigsolve(
prop, v, howmany, which;
ishermitian=ishermitian(ham), issymmetric=issymmetric(ham), kwargs...
)
end

# This method only exists to detect whether a Hamiltonian is Hermitian or not.
function KrylovKit.eigsolve(
ham::AbstractHamiltonian, vec::AbstractDVec, howmany::Int=1, which::U=:LR; kwargs...
)
return @invoke eigsolve(
ham::Any, vec::Any, howmany, which;
ishermitian=ishermitian(ham), issymmetric=issymmetric(ham), kwargs...
)
end

end
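For orientation, the extension defined above could be exercised roughly as in the following sketch. The model setup mirrors the benchmark file earlier in this diff; on Julia 1.9+, loading Rimu and KrylovKit together activates the extension:

```julia
using Rimu
using KrylovKit  # loading both packages activates KrylovKitExt

M = N = 10
addr = BoseFS(M, M ÷ 2 => N)
ham = HubbardMom1D(addr; u=6.0)
dv = PDVec(addr => 1.0)

# Dispatches to the PDVec method defined in the extension, which reuses
# a PDWorkingMemory across repeated applications of the Hamiltonian.
vals, vecs, info = eigsolve(ham, dv, 1, :SR; tol=1e-9)
```

The design point here is that `OperatorMultiplier` allocates the working memory once, rather than on every matrix–vector product inside the Krylov iteration.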
22 changes: 11 additions & 11 deletions scripts/BHM-example-mpi.jl
@@ -1,8 +1,8 @@
# # Example 2: Rimu with MPI

# In this example, we will demonstrate using Rimu with MPI.

# A runnable script for this example is located
# [here](https://github.com/joachimbrand/Rimu.jl/blob/develop/scripts/BHM-example-mpi.jl).
# Run it with `mpirun julia BHM-example-mpi.jl`.

Expand All @@ -16,17 +16,17 @@ using Rimu.RMPI

# First, we define the Hamiltonian. We want to start from an address with zero momentum.

address = BoseFS((0, 0, 0, 0, 10, 0, 0, 0, 0, 0))
address = BoseFS(10, 5 => 10)

# We will set the interaction strength `u` to `6`. The hopping strength `t` defaults to `1.0`.

hamiltonian = HubbardMom1D(address; u=6.0)

# Next, we construct the starting vector. Wrap a vector in `MPIData` to make it MPI
# distributed. We set the vector's style to [`IsDynamicSemistochastic`](@ref), which
# improves statistics and reduces the sign problem.
# Next, we construct the starting vector. We use a `PDVec`, which is automatically MPI
# distributed if MPI is available. We set the vector's stochastic style to
# [`IsDynamicSemistochastic`](@ref), which improves statistics and reduces the sign problem.

dvec = MPIData(DVec(address => 1.0; style=IsDynamicSemistochastic()))
dvec = PDVec(address => 1.0; style=IsDynamicSemistochastic())

# We set a reporting strategy. We will use [`ReportToFile`](@ref), which writes the reports
# directly to a file. This is useful for reducing memory use in long-running jobs, as we
Expand All @@ -43,7 +43,7 @@ s_strat = DoubleLogUpdate(targetwalkers=10_000)
post_step = ProjectedEnergy(hamiltonian, dvec)

# The `@mpi_root` macro performs an action on the root rank only, which is useful for printing.

@mpi_root println("Running FCIQMC with ", mpi_size(), " rank(s).")

# Finally, we can run the computation.
Expand All @@ -54,6 +54,6 @@ using Test #hide
@test isfile("result.arrow") #hide
dfr = load_df("result.arrow") #hide
qmcdata = last(dfr, 5000) #hide
(qmcShift,qmcShiftErr) = mean_and_se(qmcdata.shift) #hide
@test qmcShift ≈ -6.5 atol=0.5 #hide
rm("result.arrow", force=true) #hide
qmc_shift, _ = mean_and_se(qmcdata.shift) #hide
@test qmc_shift ≈ -6.5 atol=0.5 #hide
rm("result.arrow", force=true) #hide

2 comments on commit 2825da8

@joachimbrand (Collaborator)

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/95162

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.10.0 -m "<description of version>" 2825da848bdd3c088af561d892322a6deb86bf41
git push origin v0.10.0
