Trained yolt stopped working?! #31

Open
admarrs opened this issue Apr 8, 2019 · 5 comments

Comments


admarrs commented Apr 8, 2019

I'm scratching my head with this one. Back in March I had successfully trained Yolt using the COWC data and got some good test results on a separate data set.

Coming back a month later, I've tried to re-run the same config and can't get the same results! Probabilities are very low (< 0.01). The only thing that changed was swapping out the graphics card to upgrade to a Titan. Could this make a difference?

I was wondering if this is in any way related to #26.


admarrs commented Apr 8, 2019

Some additional info: running on the COWC test data I get the same low-probability result, but if I set the threshold to 0.01 I see the following. All the "detections" seem to be in rows at the bottom of each 544-pixel slice.

[screenshot: detection boxes clustered in rows along the bottom of each 544-pixel slice]


h10public commented Apr 12, 2019

I experienced a similar problem. I checked lines 9-14 in /simrdwn/yolt/Makefile; my GPU did not match any of them, so I added one. I also changed the CUDA and TensorFlow versions in /simrdwn/docker/Dockerfile and reinstalled SIMRDWN.
You can find more information at https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/.

@wendyzzzw

@ghghgh777 I ran into a similar problem and your answer is very helpful! Can you share more details on how you modified the /simrdwn/docker/Dockerfile? Thanks!

@h10public

@wendyzzzw Sorry for the late reply. I only checked my answer against commit b275a35, so it may not work for the current commit.

Check "https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/". If you are using GPU matched with SM62, SM70, or SM75, you need to add a line like "-gencode arch=compute_62,code=[sm_62,compute_62]" after line 9-13 in /simrdwn/yolt2/Makefile and /simrdwn/yolt3/Makefile.

For /simrdwn/docker/Dockerfile, the code has since been updated, so it now uses CUDA 9.0. I think CUDA 10 would be required if your GPU corresponds to SM75. If you need CUDA 10, you can change line 2 and lines 25-26. The current version of SIMRDWN uses tensorflow-gpu 1.13.1, so I think that would be OK.
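
If it helps, here is a minimal sketch of those two edits, assuming line 2 is the CUDA base image and lines 25-26 are where the CUDA/cuDNN packages get installed; the exact image tag and package lines depend on your commit, so treat this as illustrative rather than exact:

```dockerfile
# Line 2: swap the CUDA 9.0 base image for a CUDA 10.0 one
# (tensorflow-gpu 1.13.1 is built against CUDA 10.0 / cuDNN 7).
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04

# Lines 25-26 (approximate, assuming they install the CUDA libraries via conda):
# repoint them at CUDA 10.0 as well.
RUN conda install -y cudatoolkit=10.0 cudnn && \
    conda clean -a
```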

After that, I reinstalled SIMRDWN from "0-3. Build docker file".

@avanetten
Owner

As noted by @ghghgh777, this seems to be a GPU architecture issue, and it has been observed in YOLO as well: pjreddie/darknet#486. I'm still digging into it, but it seems there may be a compatibility issue with weights trained on older versions of CUDA. As painful as it is, retraining the model with the new hardware/drivers worked for me to get around this issue.
