UNet Shallow host read-back performance is slow #12837
Comments
Readback performance from hugepage to host buffer is ~2 GB/s. @pgkeller FYI, this is the FD readback performance issue. The UNet folks are also pursuing a separate change, #12705, to improve readback by reducing the amount of data that needs to be read. We can gauge how urgent improving FD readback is by estimating whether reading the smaller amount of data at 2 GB/s is enough to reduce the host bottleneck.
Currently the output is 21,626,880 B, which takes ~10-11 ms to read at 2 GB/s. If we were able to remove all padding, we would only need to read 675,840 B, which would take ~0.3 ms at 2 GB/s. I think that should be sufficient to remove the host bottleneck, so improving readback from hugepage may not be as high a priority.
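For reference, the estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the helper name `readback_ms` is made up for illustration, and it assumes 1 GB = 1e9 bytes and a constant 2 GB/s with no per-command overhead.

```python
# Back-of-the-envelope readback estimate (illustrative only).
# Assumptions: 1 GB = 1e9 bytes, constant bandwidth, no per-command overhead.

def readback_ms(num_bytes: int, bandwidth_gb_per_s: float = 2.0) -> float:
    """Return the time in milliseconds to copy num_bytes at the given bandwidth."""
    seconds = num_bytes / (bandwidth_gb_per_s * 1e9)
    return seconds * 1e3

padded_bytes = 21_626_880    # current output size, including padding
unpadded_bytes = 675_840     # output size if all padding were removed

padded_ms = readback_ms(padded_bytes)
unpadded_ms = readback_ms(unpadded_bytes)

print(f"padded:   {padded_ms:.2f} ms")
print(f"unpadded: {unpadded_ms:.2f} ms")
print(f"padding factor: {padded_bytes // unpadded_bytes}x")
```

With these numbers the padded output is exactly 32x larger than the unpadded one, so removing the padding cuts the estimated read time by the same factor.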
@tt-aho I agree, this is lower priority than #12896 and #12705. Once the padding is removed, we can re-assess the overhead and maybe address this. I'll make this P1 for now.
What's the status on this? Do we have more optimization to do here?
@pgkeller I think we can close this. Nigel's recent changes show R/W speeds good enough to meet our 2000 fps goal. See here: #12961 (comment)
Summary
Based on what we are seeing in the Tracy profile, UNet is spending a long time in the host read-back, which happens when we call synchronize device. The read-back memcpy could be improved: it seems we are doing reads/writes on each core, which is not ideal since we're generating one command per core.