Here you can find a version of the `dgemm` example from the ELPA tutorial that runs on a single GPU.
- Build the code with the provided `build.sh` script. Run it with several different matrix sizes: how does the performance compare to the CPU version?
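Comparing runs at different sizes is easier if each timing is converted to a rate. A `dgemm` that multiplies an M×K matrix by a K×N matrix performs about 2·M·N·K floating-point operations (one multiply and one add per inner-product term), so GFLOP/s = 2·M·N·K / (time in seconds · 10⁹). The sketch below is a minimal illustration, assuming square matrices and using only the Python standard library. The function names (`dgemm_flops`, `gflops`, `naive_dgemm`, `benchmark`) are hypothetical and not part of the tutorial code. The naive triple loop is far slower than an optimized BLAS or the GPU version, so treat it only as a demonstration of how the metric is computed.

```python
import time


def dgemm_flops(m, n, k):
    """Approximate FLOP count for C = alpha*A*B + beta*C with A (m x k) and B (k x n).

    Each of the m*n output entries needs k multiplies and k adds: 2*m*n*k.
    The O(m*n) alpha/beta scaling is conventionally left out.
    """
    return 2 * m * n * k


def gflops(m, n, k, seconds):
    """Convert a measured dgemm time into GFLOP/s."""
    return dgemm_flops(m, n, k) / seconds / 1e9


def naive_dgemm(alpha, a, b, beta, c):
    """Hypothetical reference: C <- alpha*A*B + beta*C on row-major lists, updated in place."""
    m, k, n = len(a), len(b), len(b[0])
    for i in range(m):
        row_a = a[i]
        row_c = c[i]
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += row_a[p] * b[p][j]
            row_c[j] = alpha * acc + beta * row_c[j]
    return c


def benchmark(sizes):
    """Time one square N x N product per size and return {N: GFLOP/s}."""
    results = {}
    for n in sizes:
        a = [[1.0] * n for _ in range(n)]
        b = [[1.0] * n for _ in range(n)]
        c = [[0.0] * n for _ in range(n)]
        start = time.perf_counter()
        naive_dgemm(1.0, a, b, 0.0, c)
        elapsed = time.perf_counter() - start
        results[n] = gflops(n, n, n, elapsed)
    return results


if __name__ == "__main__":
    for n, rate in benchmark([32, 64, 128]).items():
        print(f"N={n:4d}: {rate:.3f} GFLOP/s")
```

Applying the same 2·N³ formula to the timings reported by the CPU and GPU builds puts both on one scale. When timing the GPU run, it is worth noting whether host-to-device and device-to-host copies are included, since for small matrices they can dominate the total time.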