Performance Issue on Windows with multiple GPUs #443

Open
ndming opened this issue Sep 27, 2024 · 4 comments
ndming commented Sep 27, 2024

I built pbrt with the following setup:
MSVC 143 (VS BuildTools 17.9)
CUDA 12.1
OptiX 7.7.0

I have 2 RTX 2080 SUPERs. Rendering a simple scene on the CPU took 9.8s, while rendering the same scene on the GPU with either --gpu-device 0 or --gpu-device 1 took 35.5s, roughly 3.6x slower.

I noticed that when idle, nvidia-smi reports around 2% utilization on both GPUs, but once I start rendering on either GPU, nvidia-smi reports nearly 90% utilization on the chosen one.
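For anyone who wants to log the same utilization numbers while a render is running, here is a minimal sketch that polls nvidia-smi and parses its CSV output. The `--query-gpu` and `--format` flags are standard nvidia-smi options; the choice of fields, the helper names, and the sample values are assumptions made for illustration, not measurements from this machine.

```python
import subprocess

# Standard nvidia-smi query flags; the selected fields are just one reasonable choice.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

def parse_gpu_csv(text):
    """Parse 'index, util, mem' lines into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        gpus.append({"index": int(index), "util_pct": int(util), "mem_mib": int(mem)})
    return gpus

def query_gpus():
    """Run nvidia-smi once and return the parsed rows (needs an NVIDIA driver installed)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)

# Made-up sample in the same shape as the real output, so the parser can be tried offline.
SAMPLE = "0, 89, 1024\n1, 2, 310\n"
busy = [g["index"] for g in parse_gpu_csv(SAMPLE) if g["util_pct"] > 50]
print(busy)
```

Calling `query_gpus()` in a loop during a render would show whether only the selected device is busy, which is what the nvidia-smi readings above suggest.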

Please let me know if I should change build settings to run pbrt faster on GPU. Should I upgrade to recent versions of CUDA or OptiX?

Here are some stats from the GPU rendering:

Wavefront Kernel Profile:
  Generate camera rays                                128 launches    974.10 ms /   2.7% (avg  7.610, min  7.135, max   9.175)
  Generate ray samples - HaltonSampler                768 launches    599.40 ms /   1.6% (avg  0.780, min  0.111, max   2.818)
  Trace closest hit rays                              768 launches  30105.62 ms /  82.7% (avg 39.200, min  0.535, max 184.803)
  Handle escaped rays                                 768 launches    663.86 ms /   1.8% (avg  0.864, min  0.075, max   4.469)
  Handle emitters hit by indirect rays                768 launches     63.63 ms /   0.2% (avg  0.083, min  0.058, max   0.145)
  DielectricMaterial + BxDF eval (Basic tex)          640 launches    943.32 ms /   2.6% (avg  1.474, min  0.104, max   4.884)
  DiffuseMaterial + BxDF eval (Basic tex)             640 launches     55.94 ms /   0.2% (avg  0.087, min  0.058, max   0.375)
  DiffuseMaterial + BxDF eval (Universal tex)         640 launches   1692.83 ms /   4.6% (avg  2.645, min  0.163, max  12.804)
  Trace shadow rays                                   640 launches    420.43 ms /   1.2% (avg  0.657, min  0.107, max   2.505)
  Update film                                         128 launches    856.00 ms /   2.4% (avg  6.688, min  5.003, max  10.738)
  Other                                              2304 launches     44.81 ms /   0.1% (avg  0.019)

Total rendering time:  36419.94 ms

Wavefront integrator statistics:
    Camera rays                                                  20480000
    Indirect rays, depth 1                                       16417569
    Indirect rays, depth 2                                        4231887
    Indirect rays, depth 3                                        1880224
    Indirect rays, depth 4                                         194875
    Indirect rays, depth 5                                          87770
    Shadow rays, depth 0                                          6416378
    Shadow rays, depth 1                                            87296
    Shadow rays, depth 2                                          1153977
    Shadow rays, depth 3                                            33566
    Shadow rays, depth 4                                            32123

Statistics:
  Geometry
    Spheres                                                             1
    Buffer cache hits                                    0 /            3 (0.00%)
    Bilinear patches per mesh                            1 /            1 (1.00x)
  Memory
    Acceleration structures                                          3.12 kB
    Bilinear patches                                                 0.06 kB
    Film pixels                                                      8.54 MiB
    Light BVH                                                        0.06 kB
    Wavefront integrator pixel state                               278.49 MiB
    Unreported / unused                                            306.13 MiB
  Scene
    Lights                                                              2
    Materials                                                           3
    Textures                                                            3
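To compare profiles between builds, a small parser can rank the kernels by total time. This is only a sketch: the regular expression is an assumption based on the layout of the rows printed above, and only three of those rows are copied in to keep it short.

```python
import re

# Three rows copied verbatim from the kernel profile above.
PROFILE = """\
  Generate camera rays                                128 launches    974.10 ms /   2.7% (avg  7.610, min  7.135, max   9.175)
  Trace closest hit rays                              768 launches  30105.62 ms /  82.7% (avg 39.200, min  0.535, max 184.803)
  Trace shadow rays                                   640 launches    420.43 ms /   1.2% (avg  0.657, min  0.107, max   2.505)
"""

# Captures: kernel name, launch count, total milliseconds, share of total time.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<launches>\d+) launches\s+"
    r"(?P<ms>[\d.]+) ms /\s+(?P<pct>[\d.]+)%"
)

def parse_profile(text):
    """Return (name, launches, total_ms, percent) tuples, slowest kernel first."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m["name"], int(m["launches"]), float(m["ms"]), float(m["pct"])))
    return sorted(rows, key=lambda r: r[2], reverse=True)

for name, launches, total_ms, pct in parse_profile(PROFILE):
    print(f"{name:<28} {total_ms:>10.2f} ms  {pct:>5.1f}%  {total_ms / launches:>7.3f} ms/launch")
```

On the numbers above, "Trace closest hit rays" comes out on top with 82.7% of the total time, so the slowdown is concentrated in ray tracing rather than in the material or film kernels.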

ndming commented Sep 27, 2024

Rebuilding pbrt with CUDA 12.4 and OptiX 8.0.0 solved the problem for me. It's worth noting that a build with CUDA 12.1 and OptiX 7.7 running on a GTX 1050 doesn't show this slowdown.

ndming closed this as completed Sep 27, 2024
ndming reopened this Sep 27, 2024

ndming commented Sep 27, 2024

The above fixed the performance issue; however, the rendered image is now completely black. Any idea how I can fix this?


NicNel commented Sep 27, 2024

@ndming, try using build tools v14.39.
The original report is #428, and the fix is described in #429.


ndming commented Sep 27, 2024

It seems that on Windows systems with more than one GPU, there's no way to fix the performance slowdown. I was able to successfully build pbrt with both CUDA versions after downgrading MSVC, but GPU rendering is still slower than CPU rendering, apparently because my system has 2 RTX 2080s.
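One experiment that might help confirm or rule out the multi-GPU explanation is to hide one card from CUDA before launching pbrt. The sketch below only builds the command and environment. `CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable and `--gpu` / `--gpu-device` are pbrt options; the helper name, the executable path, and the scene file name are hypothetical, and whether this changes the timings has not been tested in this thread.

```python
import os

def single_gpu_launch(physical_gpu, scene, pbrt_exe="pbrt"):
    """Build a pbrt command and environment that expose only one GPU to CUDA.

    pbrt_exe and scene are placeholders; adjust them to the local build.
    """
    env = dict(os.environ)
    # Only the listed physical device is visible to the CUDA runtime.
    env["CUDA_VISIBLE_DEVICES"] = str(physical_gpu)
    # Visible devices are renumbered from 0, so the single remaining GPU is device 0.
    cmd = [pbrt_exe, "--gpu", "--gpu-device", "0", scene]
    return cmd, env

cmd, env = single_gpu_launch(1, "scene.pbrt")
print(cmd, env["CUDA_VISIBLE_DEVICES"])
# To actually run it: subprocess.run(cmd, env=env, check=True)
```

If the render time with a single visible GPU matches the earlier 35.5s, the slowdown is probably not caused by the second card being present.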
