In https://github.com/b0nes164/GPUSorting/blob/main/GPUSortingCUDA/Sort/OneSweep.cu#L49
Replies: 1 comment
We use 2 histograms to reduce atomic conflicts while counting digits during histogramming. Twice the histograms means roughly half the conflicts, and most contemporary GPUs have ample shared memory, so the extra copy does not cut into occupancy. Why not more than 2 histograms, then? Going past 2 requires a more complicated reduction across the histograms, which may slow down the kernel. So 2 is a sweet spot between conflict avoidance and low histogram reduction cost.
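To make the trade-off concrete, here is a minimal sketch of the idea, not the code from OneSweep.cu. It assumes 8-bit digits, a block of 128 threads, and a single digit position; the names `DigitHistogram`, `NUM_HISTS`, and `BLOCK_DIM` are hypothetical and chosen for illustration. Each half of the block counts into its own shared-memory copy, and the two copies are summed once at the end.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr uint32_t RADIX = 256;            // 8-bit digits (assumption)
constexpr uint32_t RADIX_MASK = RADIX - 1;
constexpr uint32_t NUM_HISTS = 2;          // histogram copies per block
constexpr uint32_t BLOCK_DIM = 128;        // threads per block (assumption)

// Hypothetical kernel: must be launched with blockDim.x == BLOCK_DIM.
__global__ void DigitHistogram(const uint32_t* keys, uint32_t* globalHist,
                               uint32_t size, uint32_t shift)
{
    __shared__ uint32_t s_hist[NUM_HISTS * RADIX];

    // Clear both copies.
    for (uint32_t i = threadIdx.x; i < NUM_HISTS * RADIX; i += blockDim.x)
        s_hist[i] = 0;
    __syncthreads();

    // Threads 0..63 count into copy 0, threads 64..127 into copy 1,
    // so each copy sees atomics from only half of the block.
    uint32_t* myHist = &s_hist[(threadIdx.x / (BLOCK_DIM / NUM_HISTS)) * RADIX];
    for (uint32_t i = blockIdx.x * blockDim.x + threadIdx.x; i < size;
         i += gridDim.x * blockDim.x)
        atomicAdd(&myHist[(keys[i] >> shift) & RADIX_MASK], 1u);
    __syncthreads();

    // With 2 copies the reduction is a single add per digit.
    for (uint32_t d = threadIdx.x; d < RADIX; d += blockDim.x)
    {
        const uint32_t count = s_hist[d] + s_hist[RADIX + d];
        if (count)
            atomicAdd(&globalHist[d], count);
    }
}

int main()
{
    const uint32_t size = 1u << 20;
    const uint32_t shift = 8;

    // Pseudo-random keys from a simple LCG, plus a CPU reference histogram.
    std::vector<uint32_t> keys(size);
    std::vector<uint32_t> expected(RADIX, 0);
    uint32_t x = 12345u;
    for (auto& k : keys)
    {
        x = x * 1664525u + 1013904223u;
        k = x;
        ++expected[(k >> shift) & RADIX_MASK];
    }

    uint32_t* d_keys = nullptr;
    uint32_t* d_hist = nullptr;
    cudaMalloc(&d_keys, size * sizeof(uint32_t));
    cudaMalloc(&d_hist, RADIX * sizeof(uint32_t));
    cudaMemcpy(d_keys, keys.data(), size * sizeof(uint32_t), cudaMemcpyHostToDevice);
    cudaMemset(d_hist, 0, RADIX * sizeof(uint32_t));

    DigitHistogram<<<64, BLOCK_DIM>>>(d_keys, d_hist, size, shift);

    // The blocking copy also waits for the kernel to finish.
    std::vector<uint32_t> result(RADIX);
    cudaMemcpy(result.data(), d_hist, RADIX * sizeof(uint32_t), cudaMemcpyDeviceToHost);

    printf("%s\n", result == expected ? "histograms match" : "histograms differ");

    cudaFree(d_keys);
    cudaFree(d_hist);
    return 0;
}
```

In this sketch the two copies cost 2 KB of shared memory per block, and the final step stays a single add per digit. Raising `NUM_HISTS` would turn that add into a loop or tree over the copies, which is the extra reduction cost mentioned above.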