This PR limits the number of GPUs available to BIDMat.
It solves the following issue. In a shared environment (grid, cloud, etc.), the grid admin may grant a user a number of GPUs for a BIDMat job without giving the physical IDs of the available GPUs. The user needs to find them first, then limit BIDMat to those GPUs.
For example, say a node has 8 GPUs (No. 0 to No. 7), of which No. 0, No. 1, and No. 3 are already in use and not available to new jobs. Two GPUs are granted to a new BIDMat job, but the program has to search for the available GPUs and limit itself to those. In this case, No. 2 and No. 4 are a pair available for this job.
Setting CUDA_VISIBLE_DEVICES is not an option since the available GPUs are not known a priori.
In this PR, we store a (physical device ID <--> logical device ID) map and convert between the two indices in setGPU and getGPU. In the example above:
physical device No. 2 <--> logical device No. 0
physical device No. 4 <--> logical device No. 1
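
The mapping can be illustrated with a small sketch. This is not the code in this PR (BIDMat itself is written in Scala); it is a language-neutral illustration in Python, and the names `GpuMap`, `pick_free_gpus`, `to_physical`, and `to_logical` are hypothetical. It also assumes the set of busy devices is already known, whereas the real program has to probe for it.

```python
class GpuMap:
    """Hypothetical helper: two-way map between logical and physical GPU IDs."""

    def __init__(self, physical_ids):
        # Logical ID i refers to the i-th granted physical device.
        self.logical_to_physical = list(physical_ids)
        self.physical_to_logical = {
            phys: logical for logical, phys in enumerate(self.logical_to_physical)
        }

    def to_physical(self, logical_id):
        # Conversion a setGPU-style call would do before touching the device.
        if not 0 <= logical_id < len(self.logical_to_physical):
            raise ValueError(f"logical GPU {logical_id} is not available to this job")
        return self.logical_to_physical[logical_id]

    def to_logical(self, physical_id):
        # Conversion a getGPU-style call would do before returning to the user.
        if physical_id not in self.physical_to_logical:
            raise ValueError(f"physical GPU {physical_id} is not owned by this job")
        return self.physical_to_logical[physical_id]


def pick_free_gpus(all_ids, busy_ids, count):
    """Return the first `count` device IDs that are not busy (assumed known here)."""
    free = [dev for dev in all_ids if dev not in busy_ids]
    if len(free) < count:
        raise RuntimeError(f"only {len(free)} free GPUs, {count} requested")
    return free[:count]


# The example from the description: 8 GPUs, No. 0, 1, 3 busy, 2 granted.
granted = pick_free_gpus(range(8), {0, 1, 3}, 2)   # [2, 4]
gpu_map = GpuMap(granted)
physical = gpu_map.to_physical(1)                  # 4
logical = gpu_map.to_logical(2)                    # 0
```

With this layout user code only ever sees logical IDs 0 and 1, and the translation to physical IDs 2 and 4 happens at the boundary where the device is actually selected.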