Trained yolt stopped working?! #31

Open
admarrs opened this issue Apr 8, 2019 · 5 comments

Comments


admarrs commented Apr 8, 2019

I'm scratching my head with this one. Back in March I had successfully trained Yolt using the COWC data and got some good test results on a separate data set.

Coming back a month later, I've tried to re-run the same config and can't get the same results! Probabilities are very low (< 0.01). The only thing that changed was swapping out the graphics card to upgrade to a Titan. Could this make a difference?

I was wondering if this is in any way related to #26.


admarrs commented Apr 8, 2019

Some additional info: running on the COWC test data I get the same low-probability result, but if I set the threshold to 0.01 I see the following. All the "detections" seem to be in rows at the bottom of each 544-pixel slice.

[screenshot: detection boxes clustered in rows along the bottom of each 544-pixel slice]


h10public commented Apr 12, 2019

I experienced a similar problem. I checked lines 9-14 in /simrdwn/yolt/Makefile; my GPU did not match any of them, so I added one. I also changed the CUDA and TensorFlow versions in /simrdwn/docker/Dockerfile and reinstalled SIMRDWN.
You can find more information at https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/.

@wendyzzzw

@ghghgh777 I ran into a similar problem and your answer is very helpful! Can you share more details on how you modified the /simrdwn/docker/Dockerfile? Thanks!

@h10public

@wendyzzzw Sorry for the late reply. I only checked my answer against commit b275a35, so it may not work for the current commit.

Check "https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/". If you are using GPU matched with SM62, SM70, or SM75, you need to add a line like "-gencode arch=compute_62,code=[sm_62,compute_62]" after line 9-13 in /simrdwn/yolt2/Makefile and /simrdwn/yolt3/Makefile.

For /simrdwn/docker/Dockerfile, the code has since been updated, so it now uses CUDA 9.0. I think CUDA 10 would be required if your GPU corresponds to SM75. If you need CUDA 10, you can change line 2 and lines 25-26. The current version of SIMRDWN uses tensorflow-gpu 1.13.1, so I think that would be OK.
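
If it helps, here is a minimal sketch of those two edits, assuming line 2 is the CUDA base image and lines 25-26 are where the CUDA/cuDNN packages get installed; the exact image tag and package lines depend on your commit, so treat this as illustrative rather than exact:

```dockerfile
# Line 2: swap the CUDA 9.0 base image for a CUDA 10.0 one
# (tensorflow-gpu 1.13.1 is built against CUDA 10.0 / cuDNN 7).
FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04

# Lines 25-26 (approximate, assuming they install the CUDA libraries via conda):
# repoint them at CUDA 10.0 as well.
RUN conda install -y cudatoolkit=10.0 cudnn && \
    conda clean -a
```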

After that, I reinstalled SIMRDWN from "0-3. Build docker file".

@avanetten
Owner

As noted by @ghghgh777, this seems to be a GPU architecture issue, and it has been observed in YOLO as well: pjreddie/darknet#486. I'm still digging into it, but it seems there may be a compatibility issue with weights trained on older versions of CUDA. As painful as it is, retraining the model with the new hardware/drivers worked for me to get around this issue.
