
Question about metric calculation: potential issue in current implementation #199

Open
shenshanf opened this issue Feb 14, 2025 · 2 comments

Comments

@shenshanf

Hi, I noticed that the current implementation calculates metrics (like EPE) in this way:

# In validation loop:
epe = F.l1_loss(gt_disp[mask], pred_disp[mask], reduction='mean')  # mean per iteration
val_epe += epe.item()
# After loop:
mean_epe = val_epe / valid_samples

This effectively computes: (EPE1/count1 + EPE2/count2 + ... + EPEn/countn) / n

This approach might be problematic when the number of valid pixels varies across iterations, as it gives equal weight to each iteration's mean regardless of how many valid pixels contributed to that mean.

I think it might be more accurate to use:

# In validation loop:
epe = F.l1_loss(gt_disp[mask], pred_disp[mask], reduction='sum')
val_epe += epe.item()
total_valid_pixels += mask.sum().item()
# After loop:
mean_epe = val_epe / total_valid_pixels

Which computes: (EPE1 + EPE2 + ... + EPEn) / (count1 + count2 + ... + countn)

Could you clarify which approach is mathematically correct for computing the overall metric? Thanks!
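A tiny numeric sketch (plain Python, not from the repo) makes the difference concrete. The per-iteration error sums and valid-pixel counts below are made-up illustrative numbers, chosen so the counts vary widely:

```python
# (sum of absolute errors, number of valid pixels) for three iterations.
# These are hypothetical values, not taken from any real validation run.
iters = [(10.0, 10), (100.0, 1000), (1.0, 1)]

# Method 1: average the per-iteration means ("macro" average) --
# each iteration gets equal weight regardless of its pixel count.
macro = sum(s / c for s, c in iters) / len(iters)

# Method 2: total error over total valid pixels ("micro" average) --
# each valid pixel gets equal weight.
micro = sum(s for s, _ in iters) / sum(c for _, c in iters)

print(macro)  # 0.7
print(micro)  # 111.0 / 1011 ≈ 0.1098
```

The single-pixel iteration with error 1.0 drags Method 1 up to 0.7, while Method 2 stays near the error rate of the 1000-pixel iteration.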

@shenshanf
Author

The advantages of computing the metric with the second method are:

  1. It guarantees that every valid pixel contributes equally to the final result.
  2. It avoids the bias that arises when the number of valid pixels differs greatly across iterations.

The problem with Method 1 is that it first computes a per-iteration mean and then averages those means, so an iteration with few valid pixels carries the same weight as an iteration with many. This is clearly unreasonable if the goal is a per-pixel metric.
The discrepancy matters most for data with large masked or invalid regions (e.g., depth estimation and disparity estimation tasks), where the valid-pixel count can vary widely between iterations.

@shenshanf
Author

On reflection, I think the current implementation (Method 1) might actually be correct — it depends on what we want to measure.

Method 1 calculates "average EPE per sample":

epe = F.l1_loss(gt_disp[mask], pred_disp[mask], reduction='mean')  # mean per iteration
val_epe += epe.item()
mean_epe = val_epe / valid_samples  # (EPE1/count1 + EPE2/count2 + ... + EPEn/countn) / n

Method 2 calculates "average EPE of all valid pixels":

epe = F.l1_loss(gt_disp[mask], pred_disp[mask], reduction='sum')
val_epe += epe.item()
total_valid_pixels += mask.sum().item()
mean_epe = val_epe / total_valid_pixels  # (EPE1 + EPE2 + ... + EPEn) / (count1 + count2 + ... + countn)

If our goal is to evaluate "average performance per image" (giving equal weight to each sample regardless of its number of valid pixels), then Method 1 is the correct approach. This makes sense in scenarios where we care about the model's average performance on individual samples rather than individual pixels.
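Since the two averages answer different questions, one option is to accumulate both in a single validation pass and report them side by side. Below is a hedged sketch in plain Python (not the repo's code): `abs_err_sum` and `valid_count` stand in for what `F.l1_loss(..., reduction='sum').item()` and `mask.sum().item()` would produce per sample, and `summarize` is a hypothetical helper name:

```python
def summarize(batches):
    """batches: iterable of (abs_err_sum, valid_count), one entry per sample."""
    macro_total = 0.0   # sum of per-sample means (Method 1 accumulator)
    micro_err = 0.0     # total absolute error (Method 2 accumulator)
    micro_count = 0     # total valid pixels
    n = 0               # samples with at least one valid pixel
    for abs_err_sum, valid_count in batches:
        if valid_count == 0:
            continue                            # skip fully-masked samples
        macro_total += abs_err_sum / valid_count
        micro_err += abs_err_sum
        micro_count += valid_count
        n += 1
    # (per-sample EPE, per-pixel EPE)
    return macro_total / n, micro_err / micro_count

macro, micro = summarize([(10.0, 10), (100.0, 1000), (1.0, 1)])
print(macro, micro)  # 0.7, ~0.1098
```

Reporting both makes the weighting choice explicit instead of leaving it implicit in the accumulation code.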
