broken after several epochs #23

Open
WentsingNee opened this issue Jun 9, 2023 · 0 comments

@WentsingNee

I'm a beginner in re-ID, and when I tried your code I ran into some errors.

First, I reduced the batch size to 24 because my GPU does not have enough memory. I don't think this is related to the problems below, but I'm mentioning it just in case.

    parser.add_argument('-b', '--batch-size', type=int, default=24)

Then I ran the following command:

    time sh scripts/run_strong_baseline.sh market1501 dukemtmc resnet50

Within a few minutes, I hit the first error:

... (several lines omitted) ...
        (bn3): DSBN2d(
          (BN_S): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (BN_T): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (relu): ReLU(inplace=True)
      )
    )
    (gap): AdaptiveAvgPool2d(output_size=1)
    (feat_bn): DSBN1d(
      (BN_S): BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (BN_T): BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (classifier): Linear(in_features=2048, out_features=17273, bias=False)
  )
)
XBM memory size =  29458
Extract Features: [50/539]	Time 0.286 (0.408)	Data 0.000 (0.010)	
Extract Features: [100/539]	Time 0.276 (0.342)	Data 0.000 (0.005)	
Extract Features: [150/539]	Time 0.271 (0.319)	Data 0.000 (0.004)	
Extract Features: [200/539]	Time 0.322 (0.326)	Data 0.000 (0.003)	
Extract Features: [250/539]	Time 0.278 (0.329)	Data 0.000 (0.002)	
Extract Features: [300/539]	Time 0.422 (0.324)	Data 0.000 (0.002)	
Extract Features: [350/539]	Time 0.304 (0.320)	Data 0.000 (0.002)	
Extract Features: [400/539]	Time 0.275 (0.315)	Data 0.000 (0.001)	
Extract Features: [450/539]	Time 0.278 (0.313)	Data 0.000 (0.001)	
Extract Features: [500/539]	Time 0.495 (0.314)	Data 0.000 (0.001)	
Extract Features: [50/689]	Time 0.299 (0.350)	Data 0.000 (0.012)	
Extract Features: [100/689]	Time 0.368 (0.336)	Data 0.000 (0.006)	
Extract Features: [150/689]	Time 0.381 (0.334)	Data 0.000 (0.004)	
Extract Features: [200/689]	Time 0.281 (0.326)	Data 0.000 (0.003)	
Extract Features: [250/689]	Time 0.297 (0.322)	Data 0.000 (0.002)	
Extract Features: [300/689]	Time 0.280 (0.321)	Data 0.000 (0.002)	
Extract Features: [350/689]	Time 0.278 (0.325)	Data 0.000 (0.002)	
Extract Features: [400/689]	Time 0.276 (0.325)	Data 0.000 (0.002)	
Extract Features: [450/689]	Time 0.342 (0.322)	Data 0.000 (0.001)	
Extract Features: [500/689]	Time 0.277 (0.320)	Data 0.000 (0.001)	
Extract Features: [550/689]	Time 0.277 (0.316)	Data 0.000 (0.001)	
Extract Features: [600/689]	Time 0.284 (0.314)	Data 0.000 (0.001)	
Extract Features: [650/689]	Time 0.291 (0.315)	Data 0.000 (0.001)	
==> Create pseudo labels for unlabeled target domain with DBSCAN clustering
Computing original distance...
Computing Jaccard distance...
Traceback (most recent call last):
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 321, in <module>
    main()
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 117, in main
    main_worker(args)
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 192, in main_worker
    rerank_dist = compute_jaccard_distance(target_features, k1=args.k1, k2=args.k2, use_gpu=False).numpy()
  File "/home/peter/open-source/git-cold/IDM/./idm/utils/rerank.py", line 177, in compute_jaccard_distance
    k2_rank = initial_rank[:, :k2].clone().view(-1)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

I didn't understand why this error happened, but after searching online I changed the line as follows:

    k2_rank = initial_rank[:, :k2].clone().contiguous().view(-1)
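
For anyone else who hits this: `view()` never copies data, so it only works when the requested shape is compatible with the tensor's strides, and `clone()` (which by default uses `torch.preserve_format` and keeps the input's strides when its memory is dense and non-overlapping) does not necessarily return a contiguous tensor. Below is a minimal standalone sketch of the failure and of two equivalent fixes. The `initial_rank` tensor here is a hypothetical toy stand-in made non-contiguous with a transpose; I have not checked how the real one in `rerank.py` ends up non-contiguous.

```python
import torch

# Hypothetical stand-in for `initial_rank` in idm/utils/rerank.py. The
# transpose is an assumption used only to produce a non-contiguous rank
# matrix; the real tensor may become non-contiguous in a different way.
initial_rank = torch.arange(20).reshape(4, 5).t()  # shape (5, 4)
k2 = 3

# clone() uses torch.preserve_format: for dense, non-overlapping input it
# copies the input's strides, so the clone is still non-contiguous.
sliced = initial_rank[:, :k2].clone()
print(sliced.is_contiguous())  # False

# view() never copies, so it cannot flatten these strides and raises.
try:
    sliced.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)  # True

# Both fixes give the same row-major flattening:
# contiguous() copies into row-major memory first; reshape() copies only if needed.
k2_rank = sliced.contiguous().view(-1)
k2_rank_alt = sliced.reshape(-1)
print(torch.equal(k2_rank, k2_rank_alt))  # True
```

`reshape(-1)`, as the error message suggests, is the shorter option; `contiguous().view(-1)` makes the copy explicit.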

I ran the same command again and it seemed to work. However, after several epochs I hit another error:

/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py:329: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
  warnings.warn(
Epoch: [9][50/400]	Time 0.500 (0.513) Data 0.000 (0.004) Loss 3.018 (2.973) Loss_ce 1.718 (1.737) Loss_tri 0.111 (0.088) Loss_xbm 1.190 (1.148) Prec_s 95.83% (91.92%) Prec_t 95.83% (88.92%) 
Epoch: [9][100/400]	Time 0.523 (0.511) Data 0.000 (0.002) Loss 3.356 (2.985) Loss_ce 2.063 (1.757) Loss_tri 0.095 (0.090) Loss_xbm 1.198 (1.137) Prec_s 87.50% (91.17%) Prec_t 79.17% (89.29%) 
Epoch: [9][150/400]	Time 0.515 (0.527) Data 0.001 (0.018) Loss 2.735 (2.997) Loss_ce 1.638 (1.777) Loss_tri 0.023 (0.094) Loss_xbm 1.073 (1.126) Prec_s 87.50% (90.89%) Prec_t 100.00% (88.89%) 
Epoch: [9][200/400]	Time 0.505 (0.524) Data 0.000 (0.013) Loss 2.834 (2.966) Loss_ce 1.709 (1.756) Loss_tri 0.055 (0.093) Loss_xbm 1.070 (1.117) Prec_s 95.83% (91.50%) Prec_t 91.67% (89.54%) 
Epoch: [9][250/400]	Time 0.619 (0.523) Data 0.000 (0.011) Loss 3.049 (2.977) Loss_ce 1.899 (1.774) Loss_tri 0.106 (0.094) Loss_xbm 1.044 (1.109) Prec_s 91.67% (90.90%) Prec_t 83.33% (89.07%) 
Epoch: [9][300/400]	Time 0.533 (0.530) Data 0.000 (0.017) Loss 2.566 (2.955) Loss_ce 1.538 (1.761) Loss_tri 0.015 (0.093) Loss_xbm 1.013 (1.100) Prec_s 100.00% (90.99%) Prec_t 91.67% (89.42%) 
Epoch: [9][350/400]	Time 0.513 (0.528) Data 0.000 (0.015) Loss 3.231 (2.942) Loss_ce 1.914 (1.758) Loss_tri 0.143 (0.092) Loss_xbm 1.174 (1.092) Prec_s 79.17% (90.85%) Prec_t 79.17% (89.57%) 
Epoch: [9][400/400]	Time 1.717 (0.533) Data 1.199 (0.019) Loss 2.884 (2.935) Loss_ce 1.742 (1.759) Loss_tri 0.087 (0.091) Loss_xbm 1.055 (1.086) Prec_s 87.50% (90.80%) Prec_t 95.83% (89.57%) 
Test on target:  dukemtmc
Extract Features: [50/829]	Time 0.063 (0.083)	Data 0.000 (0.021)	
Extract Features: [100/829]	Time 0.064 (0.073)	Data 0.000 (0.011)	
Extract Features: [150/829]	Time 0.066 (0.070)	Data 0.000 (0.007)	
Extract Features: [200/829]	Time 0.064 (0.068)	Data 0.000 (0.005)	
Extract Features: [250/829]	Time 0.064 (0.067)	Data 0.000 (0.004)	
Extract Features: [300/829]	Time 0.063 (0.067)	Data 0.000 (0.004)	
Extract Features: [350/829]	Time 0.064 (0.066)	Data 0.000 (0.003)	
Extract Features: [400/829]	Time 0.063 (0.066)	Data 0.000 (0.003)	
Extract Features: [450/829]	Time 0.064 (0.066)	Data 0.000 (0.002)	
Extract Features: [500/829]	Time 0.064 (0.066)	Data 0.000 (0.002)	
Extract Features: [550/829]	Time 0.063 (0.065)	Data 0.000 (0.002)	
Extract Features: [600/829]	Time 0.063 (0.065)	Data 0.000 (0.002)	
Extract Features: [650/829]	Time 0.064 (0.065)	Data 0.000 (0.002)	
Extract Features: [700/829]	Time 0.066 (0.065)	Data 0.000 (0.002)	
Extract Features: [750/829]	Time 0.064 (0.065)	Data 0.000 (0.002)	
Extract Features: [800/829]	Time 0.064 (0.065)	Data 0.000 (0.001)	
Traceback (most recent call last):
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 321, in <module>
    main()
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 117, in main
    main_worker(args)
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 229, in main_worker
    _, mAP = evaluator.evaluate(test_loader_target, dataset_target.query, dataset_target.gallery, cmc_flag=True)
  File "/home/peter/open-source/git-cold/IDM/./idm/evaluators.py", line 140, in evaluate
    results = evaluate_all(query_features, gallery_features, distmat, query=query, gallery=gallery, cmc_flag=cmc_flag)
  File "/home/peter/open-source/git-cold/IDM/./idm/evaluators.py", line 117, in evaluate_all
    results = evaluate_rank(distmat, query_ids, gallery_ids, query_cams, gallery_cams,
  File "/home/peter/open-source/git-cold/IDM/./idm/evaluation_metrics/rank.py", line 203, in evaluate_rank
    return evaluate_py(
  File "/home/peter/open-source/git-cold/IDM/./idm/evaluation_metrics/rank.py", line 164, in evaluate_py
    return eval_market1501(
  File "/home/peter/open-source/git-cold/IDM/./idm/evaluation_metrics/rank.py", line 108, in eval_market1501
    matches = (g_pids[indices] == q_pids[:, np.newaxis]).astype(np.int32)
TypeError: only integer tensors of a single element can be converted to an index
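
My unconfirmed guess about this second error: the message is raised by PyTorch when `Tensor.__index__` is called on a tensor with more than one element, which suggests that `indices` (or one of the ID arrays) is still a torch tensor when it is used to index a NumPy array, for example because `distmat` reaches `np.argsort` as a tensor. Assuming that is the cause, a workaround would be to convert everything to NumPy before ranking. The sketch below mirrors the failing line in `eval_market1501`; `to_numpy` and `match_matrix` are hypothetical helper names, the data is made up, and I haven't verified where the conversion is meant to happen (for example before `evaluate_rank` is called in `evaluators.py`).

```python
import numpy as np
import torch


def to_numpy(x):
    """Hypothetical helper: return a NumPy array whether x is a torch tensor or not."""
    if isinstance(x, torch.Tensor):
        return x.detach().cpu().numpy()
    return np.asarray(x)


def match_matrix(distmat, q_pids, g_pids):
    """Mirror of the failing line in eval_market1501, with all inputs forced to NumPy.

    matches[i, k] is 1 if the k-th closest gallery image to query i has the
    same person ID as query i, else 0.
    """
    distmat = to_numpy(distmat)
    q_pids = to_numpy(q_pids)
    g_pids = to_numpy(g_pids)
    indices = np.argsort(distmat, axis=1)  # plain ndarray of gallery indices
    return (g_pids[indices] == q_pids[:, np.newaxis]).astype(np.int32)


# Toy example: 2 queries, 4 gallery images (made-up numbers).
distmat = torch.tensor([[0.1, 0.9, 0.4, 0.7],
                        [0.8, 0.2, 0.6, 0.3]])
q_pids = np.array([1, 2])
g_pids = np.array([1, 2, 1, 3])

matches = match_matrix(distmat, q_pids, g_pids)
print(matches.tolist())  # [[1, 1, 0, 0], [1, 0, 0, 0]]
```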

Could you please take a look at the code, or point out what I'm doing wrong?
