
Question about metric calculation: potential issue in current implementation #199

Open
shenshanf opened this issue Feb 14, 2025 · 2 comments

Comments

@shenshanf

Hi, I noticed that the current implementation calculates metrics (like EPE) in this way:

# In validation loop:
epe = F.l1_loss(gt_disp[mask], pred_disp[mask], reduction='mean')  # mean per iteration
val_epe += epe.item()
# After loop:
mean_epe = val_epe / valid_samples

This effectively computes: (EPE1/count1 + EPE2/count2 + ... + EPEn/countn) / n

This approach might be problematic when the number of valid pixels varies across iterations, as it gives equal weight to each iteration's mean regardless of how many valid pixels contributed to that mean.

I think it might be more accurate to use:

# In validation loop:
epe = F.l1_loss(gt_disp[mask], pred_disp[mask], reduction='sum')
val_epe += epe.item()
total_valid_pixels += mask.sum().item()
# After loop:
mean_epe = val_epe / total_valid_pixels

Which computes: (EPE1 + EPE2 + ... + EPEn) / (count1 + count2 + ... + countn)

Could you clarify which approach is mathematically correct for computing the overall metric? Thanks!
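A tiny numeric sketch (plain Python, not from the repo) makes the difference concrete. The per-iteration error sums and valid-pixel counts below are made-up illustrative numbers, chosen so the counts vary widely:

```python
# (sum of absolute errors, number of valid pixels) for three iterations.
# These are hypothetical values, not taken from any real validation run.
iters = [(10.0, 10), (100.0, 1000), (1.0, 1)]

# Method 1: average the per-iteration means ("macro" average) --
# each iteration gets equal weight regardless of its pixel count.
macro = sum(s / c for s, c in iters) / len(iters)

# Method 2: total error over total valid pixels ("micro" average) --
# each valid pixel gets equal weight.
micro = sum(s for s, _ in iters) / sum(c for _, c in iters)

print(macro)  # 0.7
print(micro)  # 111.0 / 1011 ≈ 0.1098
```

The single-pixel iteration with error 1.0 drags Method 1 up to 0.7, while Method 2 stays near the error rate of the 1000-pixel iteration.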

@shenshanf
Author

The advantages of computing the metric with the second method are:

  1. It guarantees that every valid pixel contributes equally to the final result.
  2. It avoids the bias that arises when the number of valid pixels differs greatly across iterations.

The problem with Method 1 is that it first computes a per-iteration mean and then averages those means, so an iteration with few valid pixels carries the same weight as an iteration with many. This is clearly unreasonable if the goal is a per-pixel metric.
The discrepancy matters most for data with large masked or invalid regions (e.g., depth estimation and disparity estimation tasks), where the valid-pixel count can vary widely between iterations.

@shenshanf
Author

On reflection, I think the current implementation (Method 1) might actually be correct — it depends on what we want to measure.

Method 1 calculates "average EPE per sample":

epe = F.l1_loss(gt_disp[mask], pred_disp[mask], reduction='mean')  # mean per iteration
val_epe += epe.item()
mean_epe = val_epe / valid_samples  # (EPE1/count1 + EPE2/count2 + ... + EPEn/countn) / n

Method 2 calculates "average EPE of all valid pixels":

epe = F.l1_loss(gt_disp[mask], pred_disp[mask], reduction='sum')
val_epe += epe.item()
total_valid_pixels += mask.sum().item()
mean_epe = val_epe / total_valid_pixels  # (EPE1 + EPE2 + ... + EPEn) / (count1 + count2 + ... + countn)

If our goal is to evaluate "average performance per image" (giving equal weight to each sample regardless of its number of valid pixels), then Method 1 is the correct approach. This makes sense in scenarios where we care about the model's average performance on individual samples rather than individual pixels.
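Since the two averages answer different questions, one option is to accumulate both in a single validation pass and report them side by side. Below is a hedged sketch in plain Python (not the repo's code): `abs_err_sum` and `valid_count` stand in for what `F.l1_loss(..., reduction='sum').item()` and `mask.sum().item()` would produce per sample, and `summarize` is a hypothetical helper name:

```python
def summarize(batches):
    """batches: iterable of (abs_err_sum, valid_count), one entry per sample."""
    macro_total = 0.0   # sum of per-sample means (Method 1 accumulator)
    micro_err = 0.0     # total absolute error (Method 2 accumulator)
    micro_count = 0     # total valid pixels
    n = 0               # samples with at least one valid pixel
    for abs_err_sum, valid_count in batches:
        if valid_count == 0:
            continue                            # skip fully-masked samples
        macro_total += abs_err_sum / valid_count
        micro_err += abs_err_sum
        micro_count += valid_count
        n += 1
    # (per-sample EPE, per-pixel EPE)
    return macro_total / n, micro_err / micro_count

macro, micro = summarize([(10.0, 10), (100.0, 1000), (1.0, 1)])
print(macro, micro)  # 0.7, ~0.1098
```

Reporting both makes the weighting choice explicit instead of leaving it implicit in the accumulation code.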
