MSELoss #435
base: master
Conversation
good performance!
if reduction == Reduction.NONE.value:
    return func(inp, target)

M = inp.numel()
suggest making inp and target contiguous
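A minimal sketch of what that suggestion could look like at the top of the Python wrapper (the surrounding function and variable names follow the quoted snippet; the exact call site is an assumption):

```python
# Hypothetical sketch: force a dense layout before the kernel launch, so the
# flat 1-D offsets computed inside the Triton kernel address the right elements.
inp = inp.contiguous()
target = target.contiguous()
```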
src/flag_gems/ops/mse_loss.py
Outdated
M = inp.numel()
dtype = inp.dtype
if dtype is torch.bool:
    inp = inp.to(torch.int64)
What about target? Does it need to be upcast to torch.int64 as well?
I found that torch's MSE loss does not support bool and int dtypes:
RuntimeError: "mse_cuda" not implemented for 'Bool'
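For reference, a minimal reproduction of that error (assumes a CUDA device is available; the message is the one quoted above):

```python
import torch
import torch.nn.functional as F

# Bool inputs are rejected by the eager CUDA kernel:
# RuntimeError: "mse_cuda" not implemented for 'Bool'
a = torch.zeros(4, dtype=torch.bool, device="cuda")
b = torch.ones(4, dtype=torch.bool, device="cuda")
try:
    F.mse_loss(a, b)
except RuntimeError as e:
    print(e)
```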
src/flag_gems/ops/mse_loss.py
Outdated
@triton.jit
def kernel_1(inp, target, mid, M, BLOCK_SIZE: tl.constexpr, reduction: tl.constexpr):
    if tl.constexpr(inp.dtype.element_ty == tl.float16) or tl.constexpr(
        inp.dtype.element_ty == tl.bfloat16
The dtype of the input tensor is fixed as a constant when the kernel is compiled, so it's not necessary to specify it explicitly.
src/flag_gems/ops/mse_loss.py
Outdated
    ):
        cdtype = tl.float32
    else:
        cdtype = inp.dtype.element_ty
Assigning cdtype as tl.float32 unconditionally is simpler.
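A sketch of the simplification these comments point at, assuming a block-wise partial-sum kernel behind the quoted signature (the body below is illustrative, not the PR's actual code):

```python
import triton
import triton.language as tl


@triton.jit
def kernel_1(inp, target, mid, M, BLOCK_SIZE: tl.constexpr, reduction: tl.constexpr):
    # The input dtype is specialized at compile time, so no explicit
    # float16/bfloat16 branch is needed: just accumulate in float32.
    cdtype = tl.float32
    pid = tl.program_id(0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < M
    x = tl.load(inp + offsets, mask=mask, other=0.0).to(cdtype)
    y = tl.load(target + offsets, mask=mask, other=0.0).to(cdtype)
    diff = x - y
    # One partial sum of squared differences per program instance.
    tl.store(mid + pid, tl.sum(diff * diff, axis=0))
```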
src/flag_gems/ops/mse_loss.py
Outdated
    ):
        cdtype = tl.float32
    else:
        cdtype = mid.dtype.element_ty
ditto
src/flag_gems/ops/mse_loss.py
Outdated
mid_size = triton.cdiv(M, block_size)
block_mid = triton.next_power_of_2(mid_size)

mid = torch.empty((mid_size,), dtype=dtype, device=inp.device)
Initializing mid as torch.float32 might improve the precision.
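A sketch of that suggestion, reusing the variables from the quoted snippet:

```python
# Hypothetical sketch: keep the per-block partial results in float32 even when
# the inputs are float16/bfloat16, and cast back only after the final reduction.
mid = torch.empty((mid_size,), dtype=torch.float32, device=inp.device)
```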
tests/test_reduction_ops.py
Outdated
def test_accuracy_mse_loss(shape, dtype, reduction):
    dim = 1
    inp = torch.randn(shape, dtype=dtype, device="cuda", requires_grad=True)
    target = torch.randn(shape, dtype=dtype, device="cuda", requires_grad=True)
Could requires_grad be set to False?
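For a forward-only accuracy test the tensors would not need autograd tracking, e.g.:

```python
# Hypothetical sketch: requires_grad defaults to False, so it can simply be
# omitted unless the test also exercises the backward pass.
inp = torch.randn(shape, dtype=dtype, device="cuda")
target = torch.randn(shape, dtype=dtype, device="cuda")
```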
inp = torch.randn(shape, dtype=dtype, device="cuda", requires_grad=True)
target = torch.randn(shape, dtype=dtype, device="cuda", requires_grad=True)

ref_inp = to_reference(inp, True)
Why upcast=True?
Since the dtype is float and the operation involves a reduction, setting upcast=True is necessary to obtain a higher-precision reference?
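The idea can be illustrated without the test helper: compute the reference in a wider dtype so that rounding error in the reference itself does not dominate the comparison (the shapes and tolerances below are illustrative, not the suite's actual values):

```python
import torch
import torch.nn.functional as F

inp = torch.randn(1024, 1024, dtype=torch.float16, device="cuda")
target = torch.randn(1024, 1024, dtype=torch.float16, device="cuda")

# High-precision reference: upcast to float64 before the reduction.
ref = F.mse_loss(inp.to(torch.float64), target.to(torch.float64))
# Result under test, computed in the original (low-precision) dtype.
res = F.mse_loss(inp, target)

torch.testing.assert_close(res.to(torch.float64), ref, rtol=1e-3, atol=1e-3)
```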
@StrongSpoon @0x45f Hi, I don't understand why the coverage CI failed.
PR Category
Type of Change
Description
Issue
Closes: #396
Progress
Performance
On NVIDIA A10
![benchmark results](https://private-user-images.githubusercontent.com/38181615/409803972-ca9d1b7b-4a4f-4b5e-b738-43f0c621926b.png)