broken after several epochs #23

WentsingNee · 2023-06-09T09:11:27Z

I'm a beginner with the re-ID problem, and when I tried your code I ran into some errors.

First, I reduced the batch size to 24 because my GPU does not have enough memory. I don't think this is related to the problems below, but I'm mentioning it in case it matters.

    parser.add_argument('-b', '--batch-size', type=int, default=24)

Then, I tried the command:

    time sh scripts/run_strong_baseline.sh market1501 dukemtmc resnet50

After just a few minutes, I got the first error:

... (several lines omitted) ...
        (bn3): DSBN2d(
          (BN_S): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (BN_T): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (relu): ReLU(inplace=True)
      )
    )
    (gap): AdaptiveAvgPool2d(output_size=1)
    (feat_bn): DSBN1d(
      (BN_S): BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (BN_T): BatchNorm1d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (classifier): Linear(in_features=2048, out_features=17273, bias=False)
  )
)
XBM memory size =  29458
Extract Features: [50/539]	Time 0.286 (0.408)	Data 0.000 (0.010)	
Extract Features: [100/539]	Time 0.276 (0.342)	Data 0.000 (0.005)	
Extract Features: [150/539]	Time 0.271 (0.319)	Data 0.000 (0.004)	
Extract Features: [200/539]	Time 0.322 (0.326)	Data 0.000 (0.003)	
Extract Features: [250/539]	Time 0.278 (0.329)	Data 0.000 (0.002)	
Extract Features: [300/539]	Time 0.422 (0.324)	Data 0.000 (0.002)	
Extract Features: [350/539]	Time 0.304 (0.320)	Data 0.000 (0.002)	
Extract Features: [400/539]	Time 0.275 (0.315)	Data 0.000 (0.001)	
Extract Features: [450/539]	Time 0.278 (0.313)	Data 0.000 (0.001)	
Extract Features: [500/539]	Time 0.495 (0.314)	Data 0.000 (0.001)	
Extract Features: [50/689]	Time 0.299 (0.350)	Data 0.000 (0.012)	
Extract Features: [100/689]	Time 0.368 (0.336)	Data 0.000 (0.006)	
Extract Features: [150/689]	Time 0.381 (0.334)	Data 0.000 (0.004)	
Extract Features: [200/689]	Time 0.281 (0.326)	Data 0.000 (0.003)	
Extract Features: [250/689]	Time 0.297 (0.322)	Data 0.000 (0.002)	
Extract Features: [300/689]	Time 0.280 (0.321)	Data 0.000 (0.002)	
Extract Features: [350/689]	Time 0.278 (0.325)	Data 0.000 (0.002)	
Extract Features: [400/689]	Time 0.276 (0.325)	Data 0.000 (0.002)	
Extract Features: [450/689]	Time 0.342 (0.322)	Data 0.000 (0.001)	
Extract Features: [500/689]	Time 0.277 (0.320)	Data 0.000 (0.001)	
Extract Features: [550/689]	Time 0.277 (0.316)	Data 0.000 (0.001)	
Extract Features: [600/689]	Time 0.284 (0.314)	Data 0.000 (0.001)	
Extract Features: [650/689]	Time 0.291 (0.315)	Data 0.000 (0.001)	
==> Create pseudo labels for unlabeled target domain with DBSCAN clustering
Computing original distance...
Computing Jaccard distance...
Traceback (most recent call last):
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 321, in <module>
    main()
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 117, in main
    main_worker(args)
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 192, in main_worker
    rerank_dist = compute_jaccard_distance(target_features, k1=args.k1, k2=args.k2, use_gpu=False).numpy()
  File "/home/peter/open-source/git-cold/IDM/./idm/utils/rerank.py", line 177, in compute_jaccard_distance
    k2_rank = initial_rank[:, :k2].clone().view(-1)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

I didn't understand why this error happened, but after googling it I modified the code as follows:

    k2_rank = initial_rank[:, :k2].clone().contiguous().view(-1)
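
For context, the error message is about tensor memory layout: `.view()` needs strides that are compatible with the requested shape, and `.contiguous()` (or `.reshape()`) makes a copy when they are not. Below is a minimal sketch of that behavior. It uses a transposed tensor as a stand-in, which is an assumption on my part; I have not checked what layout `initial_rank` actually has inside `compute_jaccard_distance`.

```python
import torch

# Hypothetical stand-in for a non-contiguous tensor: a transpose is the
# simplest way to get one. Shape (3, 2), strides (1, 3).
x = torch.arange(6).reshape(2, 3).t()


def try_view(t):
    """Return True if t.view(-1) succeeds, False if it raises RuntimeError."""
    try:
        t.view(-1)
        return True
    except RuntimeError:
        return False


view_ok_before = try_view(x)               # non-contiguous input
view_ok_after = try_view(x.contiguous())   # contiguous copy

# Two equivalent ways to flatten a non-contiguous tensor.
flat_contig = x.contiguous().view(-1)
flat_reshape = x.reshape(-1)

print("contiguous:", x.is_contiguous())
print("view works before .contiguous():", view_ok_before)
print("view works after .contiguous():", view_ok_after)
print("same result:", torch.equal(flat_contig, flat_reshape))
```

`.reshape(-1)` returns a view when the layout allows it and copies otherwise, so it should be a shorter equivalent of the `.contiguous().view(-1)` change above.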

I ran the same command again and it seemed to work. However, after several epochs, I hit another problem:

/usr/local/lib/python3.10/dist-packages/torchvision/transforms/transforms.py:329: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
  warnings.warn(
Epoch: [9][50/400]	Time 0.500 (0.513) Data 0.000 (0.004) Loss 3.018 (2.973) Loss_ce 1.718 (1.737) Loss_tri 0.111 (0.088) Loss_xbm 1.190 (1.148) Prec_s 95.83% (91.92%) Prec_t 95.83% (88.92%) 
Epoch: [9][100/400]	Time 0.523 (0.511) Data 0.000 (0.002) Loss 3.356 (2.985) Loss_ce 2.063 (1.757) Loss_tri 0.095 (0.090) Loss_xbm 1.198 (1.137) Prec_s 87.50% (91.17%) Prec_t 79.17% (89.29%) 
Epoch: [9][150/400]	Time 0.515 (0.527) Data 0.001 (0.018) Loss 2.735 (2.997) Loss_ce 1.638 (1.777) Loss_tri 0.023 (0.094) Loss_xbm 1.073 (1.126) Prec_s 87.50% (90.89%) Prec_t 100.00% (88.89%) 
Epoch: [9][200/400]	Time 0.505 (0.524) Data 0.000 (0.013) Loss 2.834 (2.966) Loss_ce 1.709 (1.756) Loss_tri 0.055 (0.093) Loss_xbm 1.070 (1.117) Prec_s 95.83% (91.50%) Prec_t 91.67% (89.54%) 
Epoch: [9][250/400]	Time 0.619 (0.523) Data 0.000 (0.011) Loss 3.049 (2.977) Loss_ce 1.899 (1.774) Loss_tri 0.106 (0.094) Loss_xbm 1.044 (1.109) Prec_s 91.67% (90.90%) Prec_t 83.33% (89.07%) 
Epoch: [9][300/400]	Time 0.533 (0.530) Data 0.000 (0.017) Loss 2.566 (2.955) Loss_ce 1.538 (1.761) Loss_tri 0.015 (0.093) Loss_xbm 1.013 (1.100) Prec_s 100.00% (90.99%) Prec_t 91.67% (89.42%) 
Epoch: [9][350/400]	Time 0.513 (0.528) Data 0.000 (0.015) Loss 3.231 (2.942) Loss_ce 1.914 (1.758) Loss_tri 0.143 (0.092) Loss_xbm 1.174 (1.092) Prec_s 79.17% (90.85%) Prec_t 79.17% (89.57%) 
Epoch: [9][400/400]	Time 1.717 (0.533) Data 1.199 (0.019) Loss 2.884 (2.935) Loss_ce 1.742 (1.759) Loss_tri 0.087 (0.091) Loss_xbm 1.055 (1.086) Prec_s 87.50% (90.80%) Prec_t 95.83% (89.57%) 
Test on target:  dukemtmc
Extract Features: [50/829]	Time 0.063 (0.083)	Data 0.000 (0.021)	
Extract Features: [100/829]	Time 0.064 (0.073)	Data 0.000 (0.011)	
Extract Features: [150/829]	Time 0.066 (0.070)	Data 0.000 (0.007)	
Extract Features: [200/829]	Time 0.064 (0.068)	Data 0.000 (0.005)	
Extract Features: [250/829]	Time 0.064 (0.067)	Data 0.000 (0.004)	
Extract Features: [300/829]	Time 0.063 (0.067)	Data 0.000 (0.004)	
Extract Features: [350/829]	Time 0.064 (0.066)	Data 0.000 (0.003)	
Extract Features: [400/829]	Time 0.063 (0.066)	Data 0.000 (0.003)	
Extract Features: [450/829]	Time 0.064 (0.066)	Data 0.000 (0.002)	
Extract Features: [500/829]	Time 0.064 (0.066)	Data 0.000 (0.002)	
Extract Features: [550/829]	Time 0.063 (0.065)	Data 0.000 (0.002)	
Extract Features: [600/829]	Time 0.063 (0.065)	Data 0.000 (0.002)	
Extract Features: [650/829]	Time 0.064 (0.065)	Data 0.000 (0.002)	
Extract Features: [700/829]	Time 0.066 (0.065)	Data 0.000 (0.002)	
Extract Features: [750/829]	Time 0.064 (0.065)	Data 0.000 (0.002)	
Extract Features: [800/829]	Time 0.064 (0.065)	Data 0.000 (0.001)	
Traceback (most recent call last):
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 321, in <module>
    main()
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 117, in main
    main_worker(args)
  File "/home/peter/open-source/git-cold/IDM/examples/train_baseline.py", line 229, in main_worker
    _, mAP = evaluator.evaluate(test_loader_target, dataset_target.query, dataset_target.gallery, cmc_flag=True)
  File "/home/peter/open-source/git-cold/IDM/./idm/evaluators.py", line 140, in evaluate
    results = evaluate_all(query_features, gallery_features, distmat, query=query, gallery=gallery, cmc_flag=cmc_flag)
  File "/home/peter/open-source/git-cold/IDM/./idm/evaluators.py", line 117, in evaluate_all
    results = evaluate_rank(distmat, query_ids, gallery_ids, query_cams, gallery_cams,
  File "/home/peter/open-source/git-cold/IDM/./idm/evaluation_metrics/rank.py", line 203, in evaluate_rank
    return evaluate_py(
  File "/home/peter/open-source/git-cold/IDM/./idm/evaluation_metrics/rank.py", line 164, in evaluate_py
    return eval_market1501(
  File "/home/peter/open-source/git-cold/IDM/./idm/evaluation_metrics/rank.py", line 108, in eval_market1501
    matches = (g_pids[indices] == q_pids[:, np.newaxis]).astype(np.int32)
TypeError: only integer tensors of a single element can be converted to an index

Could you please examine the code or point out my mistake?
