Simultaneous query of multiple nodes #55
Comments
NumPy has built-in support for using multiple processor cores when performing the matrix multiplications involved in DWPC. SciPy, however, does not natively support multiprocessing for sparse matrix multiplication. Following a suggestion by @dhimmel, I used a multi-core approach with sparse matrices to re-run the 752 hetmech-compatible Rephetio metapaths. This cut the total computation time even further, to just under 20 minutes. The individual metapath computation times did not change significantly, and the maximum computation time for a single metapath was 31 seconds (see below). Following yet another suggestion by @dhimmel, I used the [...]. I also attempted to use the [...].
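To make the multi-core idea concrete, here is a minimal sketch of farming out one chain of sparse matrix products per worker process. It is not the hetmech implementation: `chain_product`, `compute_all`, and the plain left-to-right product are hypothetical stand-ins, and real DWPC also involves degree weighting and the exclusion of paths with repeated nodes.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import reduce

import numpy as np
import scipy.sparse as sp


def chain_product(matrices):
    """Multiply a sequence of sparse matrices left to right.

    Simplified stand-in for a per-metapath computation (an assumption,
    not the DWPC algorithm itself).
    """
    return reduce(lambda left, right: left @ right, matrices)


def compute_all(chains, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Compute one chain product per metapath, one task per worker.

    `chains` maps a metapath name to its list of sparse matrices.
    Pass executor_cls=ThreadPoolExecutor to debug without spawning
    processes.
    """
    names = list(chains)
    with executor_cls(max_workers=max_workers) as pool:
        results = pool.map(chain_product, [chains[name] for name in names])
        return dict(zip(names, results))
```

With `ProcessPoolExecutor`, the call to `compute_all` should sit under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork worker processes. Parallelising across metapaths, as here, speeds up the total run but leaves each individual metapath's time unchanged, which matches the observation above.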
Nice work! I still think we can bring down the runtime on the longest-running instances by using optimal matrix chain ordering.
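For reference, the textbook dynamic program for matrix chain ordering can be sketched as follows. It uses the dense cost model (the number of scalar multiplications), which is only a rough proxy for sparse matrices where the cost depends on the number of nonzeros; the function names are illustrative and not part of hetmech.

```python
def matrix_chain_order(dims):
    """Return (minimum cost, split table) for a chain of matrices.

    Matrix i has shape dims[i] x dims[i + 1]. Cost is counted as dense
    scalar multiplications (an assumption that ignores sparsity).
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):           # split between k and k + 1
                c = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k
    return cost[0][n - 1], split


def parenthesize(split, i, j):
    """Render the optimal multiplication order as a string."""
    if i == j:
        return f"A{i}"
    k = split[i][j]
    return f"({parenthesize(split, i, k)} {parenthesize(split, k + 1, j)})"


best_cost, split = matrix_chain_order([10, 30, 5, 60])
print(best_cost, parenthesize(split, 0, 2))  # 4500 ((A0 A1) A2)
```

In this small example the left-to-right order costs 4500 multiplications, while the alternative `(A0 (A1 A2))` costs 27000, which illustrates why ordering matters most for long metapaths with very differently sized node types.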
We can probably set Z-DWPC in the first row (where all DWPCs are zero) to 0. The Inf situations will be harder to deal with because they are potentially of interest, but it will be difficult to rank them.
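One way this rule could look in code is sketched below. It assumes Z-DWPC is computed as (DWPC minus the mean DWPC over permuted networks) divided by the standard deviation over those permutations; that definition, the function name `z_dwpc`, and its argument names are assumptions for illustration only.

```python
import numpy as np


def z_dwpc(dwpc, perm_mean, perm_sd):
    """Z-score DWPC values against permutation statistics.

    Assumed definition: z = (dwpc - perm_mean) / perm_sd.
    Entries where both the difference and the standard deviation are
    zero are set to 0; entries with zero standard deviation but a
    nonzero difference are left as +/-inf.
    """
    dwpc = np.asarray(dwpc, dtype=float)
    perm_mean = np.asarray(perm_mean, dtype=float)
    perm_sd = np.asarray(perm_sd, dtype=float)
    diff = dwpc - perm_mean
    # Suppress warnings for 0/0 and x/0; those cases are handled below.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = diff / perm_sd
    z[(perm_sd == 0) & (diff == 0)] = 0.0
    return z
```

The infinite values are deliberately kept rather than clipped, since they may be of interest even though they cannot be ranked against each other.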
Add functionality in hetmech to query a set of nodes in order to return a ranked list of connection predictions along with the corresponding metapaths.
For example, if a set of genes were queried, we would want output of the form:
Predictions
In order to reduce computation time, it may be useful to cache DWPC matrices so that a set of query nodes (given as a vector) can be queried almost instantly (order ~ 100 microseconds).
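A minimal sketch of such a query against one cached matrix is shown below. It assumes rows are source nodes and columns are target nodes, and that a target's score is the sum of DWPC values from the queried nodes; both the scoring rule and the name `rank_targets` are hypothetical choices for illustration.

```python
import numpy as np
import scipy.sparse as sp


def rank_targets(dwpc, query_idx, top_k=5):
    """Rank target nodes for a set of query (source) nodes.

    `dwpc` is a sparse source-by-target DWPC matrix and `query_idx`
    holds the row indices of the queried nodes. Summing over the
    query rows is an assumed scoring rule.
    """
    query = np.zeros(dwpc.shape[0])
    query[query_idx] = 1.0                      # indicator vector for the query set
    scores = np.asarray(dwpc.T @ query).ravel() # one score per target node
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [(int(i), float(scores[i])) for i in order if scores[i] > 0]


dwpc = sp.csr_matrix(np.array([[0.0, 1.0, 0.0],
                               [2.0, 0.0, 3.0],
                               [0.0, 0.0, 0.0]]))
print(rank_targets(dwpc, [0, 1]))  # [(2, 3.0), (0, 2.0), (1, 1.0)]
```

Because the query reduces to a single sparse matrix-vector product on a precomputed matrix, the per-query cost is small compared with recomputing DWPC.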
It should be noted that after work done in #54 and #43 to add sparse matrix functionality, the computation time for DWPC over all 752 compatible Rephetio metapaths has been reduced from a total of 6.5 hours to 48 minutes! In fact, the longest computation time for DWPC over a single metapath is now around 35 seconds, while the average time is about 3.9 seconds (see below).
Caching the DWPC matrices for the 752 Rephetio metapaths using scipy.io.savemat saves 752 sparse matrices as a .mat file. The file size for all these matrices is 461 MB. The histogram below shows the distribution of DWPC times over the 752 metapaths.
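As a rough illustration of this caching step, the sketch below writes a dictionary of sparse matrices to a single .mat file and reads it back. The helper names, the key `CbGaD`, and the use of compression are assumptions, not details taken from the run described above.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse as sp


def save_dwpc_cache(path, matrices):
    """Write a {metapath name: sparse matrix} dict to one .mat file.

    Keys must be valid MATLAB variable names (letters, digits and
    underscores, not starting with a digit or underscore).
    """
    scipy.io.savemat(path, matrices, do_compression=True)


def load_dwpc_cache(path):
    """Read the cache back, dropping loadmat's metadata entries."""
    loaded = scipy.io.loadmat(path)
    return {k: v for k, v in loaded.items() if not k.startswith("__")}


# Hypothetical example with a single small matrix.
example = {"CbGaD": sp.csc_matrix(np.array([[0.0, 1.5], [0.0, 0.0]]))}
with tempfile.TemporaryDirectory() as tmp:
    cache_path = os.path.join(tmp, "dwpc_cache.mat")
    save_dwpc_cache(cache_path, example)
    restored = load_dwpc_cache(cache_path)
```

Sparse matrices come back from `loadmat` in a sparse format, so the cached matrices can be queried without first converting them to dense arrays.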