-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RBKI #61
RBKI #61
Conversation
…h get_line function locally.
…bug, fixed norm bug.
@rileyjmurray I've added all the major changes to RBKI that I wanted to. Please give me comments on this PR when you have time. Also, LMK if we will be updating RandBLAS submodule in this PR. I will hold off on working on the GPU PR until we complete this. P.S. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the significant stuff looks good. I still left lots of comments though. I also have some recurring comments that I didn't bother mentioning for each individual file (see below).
benchmark documentation
I'd like for all benchmark scripts to have a comment towards the top that explains what it tests. Right now only a few files have this. Just a couple of sentences should suffice. I want to be able to easily infer what the differences are between things like RBKI_runtime_benchmark.cc
and RBKI_speed_comparisons.cc
. (I know you can explain it to be here, or in an email, or on a call, but I want it written in the files themselves.)
benchmarks on macOS
Several benchmarks have a pragma like #if !defined(__APPLE__)
inside main
which determines whether or not the executable does something nontrivial. This difference in behavior across platforms should be easier to identify at a glance. I suggest you use a pattern like the following:
#ifdef(__APPLE__)
#include <iostream>
int main(int argc, char **argc) {
std::cout << "This benchmark cannot run on Apple machines." << std::endl;
return 1;
}
#else
// all of your actual code ....
#endif
You can take care of this now, or you can open a new issue with this as a TODO and handle it in another PR.
template parameters in benchmarks
Most benchmark scripts include a function definition like
template <typename T, typename RNG>
static void
test_speed(int64_t m, int64_t n, int64_t runs, RandBLAS::RNGState<RNG> const_state)
which you invoke with lines like
test_speed<double, r123::Philox4x32>(std::pow(2, 10), std::pow(2, 5), 10, state);
Specifying r123::Philox4x32
as a template parameter shouldn't actually be needed. The compiler should be able to deduce it from the type of state
. If you run into issues with compilers automatically deducing types then I'd prefer you change the function definition to have a default type, along the following lines:
template <typename T, typename RNG = r123::Philox4x32>
static void
test_speed(int64_t m, int64_t n, int64_t runs, RandBLAS::RNGState<RNG> const_state)
I'd like for you to make these changes throughout all benchmarks. You can take care of this now, or you can open a new issue with this as a TODO and handle it in another PR.
I applied all the explicitly-requested fixed & added small comments to each benchmark. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm marking as approved, but I do have two tiny comments that it would be nice if you addressed.
Merge at your discretion.
Looks like a bunch of QB & SVD tests are failing on Git (they are all perfectly fine on Elephant). Investigating. |
There was a memory leak in the condition number estimation utility function, as I forgot to free the memory allocated there. |
This PR adds a new driver algorithm - RBKI (Randomized Block Krylov Iteration) (formal name subject to change).
RBKI algorithm is a method for finding truncated SVD based on block Krylov iterations.
RBKI is applicable for matrices with a slowly-decaying spectrum; such cases are canonically considered ''hard`` for a standard randomized SVD.
This algorithm is a version of Algorithm A.1 from https://arxiv.org/pdf/2306.12418.pdf
The main difference is in the fact that an economy SVD is performed only once at the very end
of the algorithm run and that the termination criteria are not based on singular vector residual evaluation.
Instead, the scheme terminates if:
1. ||R||_F > sqrt(1 - eps^2) ||A||_F, which ensures that we've exhausted all vectors and doing more
iterations would bring no benefit or that ||A - hat(A)||_F < eps * ||A||_F.
2. Stop if the bottom right entry of R or S is numerically close to zero (up to the square root of machine eps).
The main cost of this algorithm comes from large GEMMs with the input matrix A.
The algorithm optionally times all its subcomponents through a user-defined 'timing' parameter.
Some things about the algorithm to keep in mind:
1. The need to iteratively allocate space for Krylov subspace buffers (Y_od, X_ev), as well as for the matrices that are to
be decomposed via SVD at the end of the alg (R, S). Additional allocation is done via
realloc()
, followed up bymemset()
call to ensure that the newly allocated space has been zeroed out.
2. Matrix R needs to be kept in a transposed format at all times, as the final decomposition via SVD is to be done on R'.
3. Reorthogonalization steps are a must. These require additional (n + b_sz) by b_sz and n by b_sz computational buffers.
4. 'Q-factors' from QR factorization at every iteration form the Krylov subspace -> we need explicit versions of those. Hence,
using
ungqr()
is required.5. Even though the algorithm accepts a "tolerance" parameter, we can control its termination using the "max_iters"
parameter.
In addition to the RBKI, the following is also introduced in this PR:
std::vector
to pointers in the/misc/rl_gen.hh
and/misc/rl_util.hh
process_input_mat()
functionality in/misc/rl_gen.hh
- allows to read a given input matrix into RandLAPACK (without knowing its size in advance) from a file.gesdd()
) to be compared against RBKI (direct SVD is WAY slower than RBKI). The user is required to provide a matrix file to be read, set min and max numbers of large gemms (Krylov iterations) that the algorithm is allowed to perform min and max block sizes that RBKI is to use; furthermore, the user is to provide a 'custom rank' parameter (number of singular vectors to approximate by RBKI). The benchmark outputs the basic data of a given run, as well as the RBKI runtime and singular vector residual error, which is computed as "sqrt(||AV - SU||^2_F + ||A'U - VS||^2_F / sqrt(custom_rank)" (for "custom rank" singular vectors and values).