Trained YOLT stopped working?! #31
Comments
I experienced a similar problem. I checked lines 9-14 in /simrdwn/yolt/Makefile; my GPU did not match any of them, so I added one. I also changed the CUDA and TensorFlow versions in /simrdwn/docker/Dockerfile and reinstalled SIMRDWN.
@ghghgh777 I ran into a similar problem and your answer is very helpful! Can you share more details on how you modified /simrdwn/docker/Dockerfile? Thanks!
@wendyzzzw Sorry for the late reply. I only verified my answer against commit b275a35, so it may not work for the current commit. See https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/. If your GPU corresponds to SM62, SM70, or SM75, you need to add a line like "-gencode arch=compute_62,code=[sm_62,compute_62]" after lines 9-13 in /simrdwn/yolt2/Makefile and /simrdwn/yolt3/Makefile. As for /simrdwn/docker/Dockerfile, the code has since been updated to use CUDA 9.0; I think CUDA 10 would be required if your GPU is SM75. If you need CUDA 10, you can change line 2 and lines 25-26. The current version of SIMRDWN uses tensorflow-gpu 1.13.1, so I think that would be OK. After that, I reinstalled SIMRDWN starting from step "0-3. Build docker file".
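For reference, the extra architecture line described above would land in the `ARCH` block of the Makefile roughly like this. This is a sketch only: the existing `-gencode` entries below are typical of darknet-style Makefiles and may differ from the exact lines 9-13 in your checkout of /simrdwn/yolt2/Makefile.

```makefile
# Sketch of the ARCH block in simrdwn/yolt2/Makefile (and yolt3/Makefile).
# The pre-existing entries here are illustrative; check your own file.
ARCH= -gencode arch=compute_30,code=sm_30 \
      -gencode arch=compute_35,code=sm_35 \
      -gencode arch=compute_50,code=[sm_50,compute_50] \
      -gencode arch=compute_52,code=[sm_52,compute_52] \
      -gencode arch=compute_62,code=[sm_62,compute_62]  # added: SM62 GPUs
```

For SM70 or SM75 hardware you would add the analogous `compute_70`/`compute_75` entry instead, then rebuild from the docker step so the CUDA code is recompiled for the new architecture.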
As noted by @ghghgh777, this appears to be a GPU architecture issue, and it has been observed in YOLO as well: pjreddie/darknet#486. I'm still digging into it, but there may be a compatibility issue with weights trained on older versions of CUDA. As painful as it sounds, retraining the model on the new hardware/drivers worked around the issue for me.
I'm scratching my head with this one. Back in March I successfully trained YOLT on the COWC data and got good test results on a separate data set.
Coming back a month later, I've tried to re-run the same config and can't reproduce those results! Probabilities are very low (<0.01). The only thing that changed was a graphics card swap, upgrading to a Titan. Could this make a difference?
I was wondering if this was in any way related to #26.