
Optimization of CUDA kernel for multiple GPUs #342

Open
cyborgshead opened this issue Jun 25, 2019 · 1 comment
cyborgshead commented Jun 25, 2019

We have network limits linked to the speed of processing (the rank window) and to the size of the graph (the onboard memory of the GPU or GPUs).

Mainnet will launch with a fairly large rank calculation window (>=100 blocks) and a small amount of network bandwidth, which will give the community time to upgrade the kernel and give validators time to upgrade their hardware.

Right now, preparing the data before sending it to the GPU takes a long time, and we only have a single-GPU CUDA implementation of the PageRank algorithm.

My proposal is to start with a stand-alone optimized kernel and to redefine the data structures while researching performance and implementing a multi-GPU kernel. Then refactor the structures in cyberd and migrate to the new kernel.

References: #229

Note:

  1. Single host, single GPU
    <-----We are here----->
  2. Single host, multiple GPUs (x16 PCI Express)
  3. Multiple hosts, multiple GPUs
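
For stage 1, the single-GPU kernel computes damped PageRank iterations over the link graph. As a point of reference, here is a minimal CPU sketch of one such iteration over a CSR-encoded graph; the names and layout are illustrative only, not cyberd's actual structures:

```cpp
#include <vector>
#include <cstddef>

// One damped PageRank iteration over a CSR-encoded link graph.
// row_ptr[v]..row_ptr[v+1] indexes the inbound links of vertex v in col_idx.
// This is a CPU reference sketch; the CUDA kernel parallelizes the outer loop.
std::vector<double> pagerank_step(const std::vector<std::size_t>& row_ptr,
                                  const std::vector<std::size_t>& col_idx,
                                  const std::vector<std::size_t>& out_degree,
                                  const std::vector<double>& rank,
                                  double damping = 0.85) {
    const std::size_t n = rank.size();
    std::vector<double> next(n, (1.0 - damping) / n);  // teleport term
    for (std::size_t v = 0; v < n; ++v) {
        double sum = 0.0;
        for (std::size_t i = row_ptr[v]; i < row_ptr[v + 1]; ++i) {
            std::size_t u = col_idx[i];        // u links to v
            sum += rank[u] / out_degree[u];    // u's rank split over its out-links
        }
        next[v] += damping * sum;
    }
    return next;
}
```

The point of fixing a reference like this before refactoring is that any redefined data structure (for one GPU or many) can be validated against the same per-iteration output.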
serejandmyself commented Oct 25, 2019

I rewrote your task slightly, as I understand it. It would be good if you could correct me if I got it wrong.

In any case, I also added some possibly useful links to research and articles:

Current situation:

  • Rank is calculated at a rate of 100 ms
  • Data is uploaded in 3 seconds
  • This means a maximum of ~3 million CIDs per calculation
  • A 2D array is used for the GPU calculation

Problem:

  • This is a network limit due to processing speed
  • Preparing the data before sending it to the GPU takes too much time, and there is only a single-GPU CUDA implementation of the PageRank algorithm

Task:

  • Implement a CUDA kernel that can calculate rank on a single host but across multiple GPUs, or
  • Even more desirable: multiple hosts, multiple GPUs

Desired outcome:

  • To shard the knowledge graph across a number of cards on one machine (or a number of cards on multiple machines)
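
One natural starting point for sharding is a contiguous partition of the vertex set, with each GPU owning one slice of the rank vector and exchanging remote ranks between iterations. A host-side sketch of the split arithmetic (purely illustrative; device allocation and transfer omitted):

```cpp
#include <vector>
#include <cstddef>
#include <utility>

// Even, contiguous partition of n vertices across g devices; the first
// n % g shards get one extra vertex. Each GPU would own one [begin, end)
// slice of the rank vector. Illustrative host-side arithmetic only.
std::vector<std::pair<std::size_t, std::size_t>>
shard_vertices(std::size_t n, std::size_t g) {
    std::vector<std::pair<std::size_t, std::size_t>> shards;
    std::size_t base = n / g, extra = n % g, begin = 0;
    for (std::size_t d = 0; d < g; ++d) {
        std::size_t end = begin + base + (d < extra ? 1 : 0);
        shards.emplace_back(begin, end);
        begin = end;
    }
    return shards;
}
```

A contiguous split keeps each shard's CSR rows and rank slice adjacent in memory, which simplifies the per-device upload; balancing by edge count rather than vertex count would be a likely refinement for a skewed knowledge graph.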

Steps to solve:

  • Research the current possibilities of CUDA (current and near future in order to avoid obsolete implementations)
  • Understand the efficiency of CUDA kernels
  • Understand how the implementation fits into the core
  • Describe an in-detail process of implementation (what will it break / what will have to be fixed)
  • Integrate into cyberd

What might be needed (?):

  • An algorithm developer
  • A new kernel that will calculate the Merkle tree for all rank values
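
Such a kernel would fold the rank values into a single Merkle root. A host-side sketch of the tree reduction, with a placeholder 64-bit hash combine standing in for whatever cryptographic hash (e.g. SHA-256) the real implementation would use:

```cpp
#include <vector>
#include <cstdint>
#include <functional>

// Placeholder pairwise hash combine; a real kernel would use a
// cryptographic hash over the serialized child nodes.
std::uint64_t combine(std::uint64_t l, std::uint64_t r) {
    return std::hash<std::uint64_t>{}(
        l ^ (r + 0x9e3779b97f4a7c15ULL + (l << 6) + (l >> 2)));
}

// Merkle root over a vector of leaf hashes: pair left-to-right level by
// level, promoting an odd last node unchanged, until one node remains.
std::uint64_t merkle_root(std::vector<std::uint64_t> level) {
    if (level.empty()) return 0;
    while (level.size() > 1) {
        std::vector<std::uint64_t> next;
        for (std::size_t i = 0; i + 1 < level.size(); i += 2)
            next.push_back(combine(level[i], level[i + 1]));
        if (level.size() % 2 == 1) next.push_back(level.back());
        level.swap(next);
    }
    return level[0];
}
```

Each tree level is embarrassingly parallel (every pair hashes independently), which is what makes this reduction a good fit for a GPU kernel alongside the rank computation itself.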

Some articles and research that might be useful (not all may be):

@cyborgshead cyborgshead unpinned this issue Nov 26, 2019