
Use adaptive CUDA launch config to fully utilize GPU devices #111

Closed
kloudkl opened this issue Feb 15, 2014 · 2 comments

kloudkl commented Feb 15, 2014

@sguada published the first profiling results in #81. Starting from there, we can do a more in-depth analysis of key factors such as occupancy to improve device utilization. Occupancy is defined as the ratio of active warps to the maximum number of warps supported on a multiprocessor. Both the CUDA Visual Profiler and the CUDA Occupancy Calculator report this metric.
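For instance, if each multiprocessor supports at most 64 resident warps and a kernel's register usage limits it to 32 active warps per SM, that kernel runs at 32/64 = 50% occupancy (illustrative numbers).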

The best practices guide gives some general principles for execution configuration optimizations to manage resource utilization effectively. Jared Hoberock, an NVIDIA researcher and co-creator of the CUDA template library Thrust, put them into practice with adaptive CUDA launch configurations whose only essential dependency is cuda_runtime_api.h, so adopting them would not introduce any new dependency into Caffe.
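As a rough sketch of what an adaptive configuration could look like (the kernel and helper names here are illustrative, not Caffe's actual helpers), the grid can be sized from the device's multiprocessor count using only runtime-API calls, with a grid-stride loop so a capped grid still covers every element:

```cpp
#include <cuda_runtime_api.h>

// Hypothetical grid-stride kernel: correct for any grid size.
__global__ void scale_kernel(const int n, const float alpha, float* x) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += blockDim.x * gridDim.x) {
    x[i] *= alpha;
  }
}

// Size the grid from the current device instead of a hard-coded constant:
// enough blocks to cover n elements, capped at a few blocks per SM.
inline int AdaptiveNumBlocks(const int n, const int threads_per_block) {
  int device = 0;
  cudaDeviceProp prop;
  cudaGetDevice(&device);
  cudaGetDeviceProperties(&prop, device);
  const int needed = (n + threads_per_block - 1) / threads_per_block;
  const int cap = prop.multiProcessorCount * 8;  // rough heuristic
  return needed < cap ? needed : cap;
}

// Example launch:
//   scale_kernel<<<AdaptiveNumBlocks(n, 256), 256>>>(n, 2.0f, x);
```

Capping the grid at a handful of blocks per SM keeps launch overhead bounded on small devices while still giving each multiprocessor several blocks to hide latency on large ones.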

kloudkl commented Aug 28, 2014

CUDA Pro Tip: Occupancy API Simplifies Launch Configuration
http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-occupancy-api-simplifies-launch-configuration/
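A minimal sketch of that API in use, assuming a simple grid-stride kernel (hypothetical names; cudaOccupancyMaxPotentialBlockSize is available from CUDA 6.5): it picks a block size that maximizes occupancy for the given kernel without hard-coding device-specific constants.

```cpp
#include <cuda_runtime.h>

// Grid-stride kernel used to illustrate the occupancy API.
__global__ void axpy_kernel(const int n, const float a, const float* x, float* y) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += blockDim.x * gridDim.x) {
    y[i] += a * x[i];
  }
}

void launch_axpy(const int n, const float a, const float* x, float* y) {
  int min_grid_size = 0;  // smallest grid that can reach full occupancy
  int block_size = 0;     // block size that maximizes occupancy for this kernel
  cudaOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size,
                                     axpy_kernel, 0, 0);
  // Round the grid up so every element gets at least one thread.
  const int grid_size = (n + block_size - 1) / block_size;
  axpy_kernel<<<grid_size, block_size>>>(n, a, x, y);
}
```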

shelhamer commented

Closing as this is not a significant bottleneck at this point.
