As of December 2020, GPGPU-Sim supports NVIDIA tensor cores only up to the first generation (Volta). This distribution contains a version of GPGPU-Sim with the Turing WMMA API enabled, together with its benchmark results. Each directory inside the Benchmark directory holds the hardware benchmark results and the results from the revised GPGPU-Sim.
This study proposes a model of the Tensor Core microarchitecture in the Turing architecture. Because NVIDIA does not disclose the internals of the tensor core, they have to be profiled through microbenchmarking. Previous studies have also dissected NVIDIA GPUs, but they did not cover the experimental features of the Turing architecture, i.e. the INT4 (4-bit integer) operation mode and the B1 (1-bit binary) operation mode. This study analyzes all of these features.
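As a rough illustration of what the two experimental modes compute, the following host-side C++ sketch gives a scalar reference for a single output element in each mode. It assumes the semantics documented for the CUDA WMMA API on Turing: B1 accumulates the population count of the XOR of two 128-bit operands, and INT4 accumulates the products of 32 packed 4-bit values. The function names and the nibble order are illustrative assumptions, not code taken from this repository.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One B1 dot product over K = 128 bits, stored as four 32-bit words.
// Assumption: XOR followed by population count, added to the accumulator.
inline int b1_dot_xor_popc(const std::array<std::uint32_t, 4>& a,
                           const std::array<std::uint32_t, 4>& b,
                           int acc = 0) {
    for (std::size_t w = 0; w < a.size(); ++w) {
        acc += static_cast<int>(std::bitset<32>(a[w] ^ b[w]).count());
    }
    return acc;
}

// One unsigned INT4 dot product over K = 32 elements, stored as four
// 32-bit words holding eight 4-bit values each.
// Assumption: both operands use the same nibble order (lowest nibble first).
inline int u4_dot(const std::array<std::uint32_t, 4>& a,
                  const std::array<std::uint32_t, 4>& b,
                  int acc = 0) {
    for (std::size_t w = 0; w < a.size(); ++w) {
        for (int n = 0; n < 8; ++n) {
            const int av = static_cast<int>((a[w] >> (4 * n)) & 0xFu);
            const int bv = static_cast<int>((b[w] >> (4 * n)) & 0xFu);
            acc += av * bv;
        }
    }
    return acc;
}

// Usage example with hand-computed values:
// 32 differing bits give 32; 32 pairs of (1, 2) give 32 * 1 * 2 = 64.
inline bool reference_examples_hold() {
    const std::array<std::uint32_t, 4> a_bits = {0xFFFFFFFFu, 0u, 0u, 0u};
    const std::array<std::uint32_t, 4> b_bits = {0u, 0u, 0u, 0u};
    const std::array<std::uint32_t, 4> a_u4 = {0x11111111u, 0x11111111u,
                                               0x11111111u, 0x11111111u};
    const std::array<std::uint32_t, 4> b_u4 = {0x22222222u, 0x22222222u,
                                               0x22222222u, 0x22222222u};
    return b1_dot_xor_popc(a_bits, b_bits) == 32 && u4_dot(a_u4, b_u4) == 64;
}
```

A reference like this can be useful for checking the values produced by a hardware or simulator run, independently of the timing results.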
gpgpu-sim
- GPGPU-Sim enabled Turing WMMA API
Benchmark
- b1(1-bit)
- u4(unsigned 4-bit)
- u8(unsigned 8-bit)
- fp16(floating point 16-bit)
- mixed(mixed precision)
Paper
- Thesis paper
- GPGPU-Sim 4.0 (refer to https://github.com/gpgpu-sim/gpgpu-sim_distribution)
- CUDA 10 or higher
- NVIDIA graphics card with sm_75 or higher (after Volta arch.)
- Go to the directory you want to benchmark.
- Set the matrix size in test.cu inside the hard directory, then run: $ make
- See the results in the log file.
- Build GPGPU-Sim (check that its CUDA version is 10 or higher).
- Set the matrix size in test.cu inside the sim directory, then run: $ make
- See the result shown by the simulator.
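
Both procedures start by setting the matrix size in test.cu. The sketch below is a hypothetical host-side helper, not part of this repository, that checks whether a chosen size is a whole number of WMMA tiles. It assumes the tile shapes documented for the CUDA WMMA API (8x8x128 for b1, 8x8x32 for u4, and 16x16x16 for the other modes) and that the benchmarks expect sizes that are multiples of those shapes; the precision names simply mirror the Benchmark directory names.

```cpp
#include <cassert>
#include <string>

// Dimensions of one WMMA tile (M x N x K).
struct TileShape {
    int m;
    int n;
    int k;
};

// Assumed tile shape for each benchmark directory name.
// Anything other than "b1" or "u4" falls back to 16x16x16 (u8, fp16, mixed).
inline TileShape tile_shape_for(const std::string& precision) {
    if (precision == "b1") return {8, 8, 128};
    if (precision == "u4") return {8, 8, 32};
    return {16, 16, 16};
}

// Returns true when every matrix dimension is a multiple of the tile shape.
inline bool size_fits_tiles(const std::string& precision, int m, int n, int k) {
    assert(m > 0 && n > 0 && k > 0);
    const TileShape t = tile_shape_for(precision);
    return m % t.m == 0 && n % t.n == 0 && k % t.k == 0;
}
```

For example, under these assumptions `size_fits_tiles("b1", 16, 16, 64)` is false because K = 64 is not a multiple of 128, while the same size is accepted for "u4".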