Pykernel - matmul common reader/writer kernels #2289

Open: vtangTT wants to merge 17 commits into main from vtangTT/pykernel_new_type

Contributor

@vtangTT vtangTT commented Feb 26, 2025

Ticket

#2254

More specifically:
#2178
#2247
#2249

Problem description

We want to be able to write the reader and writer kernels used in the matmul programming examples.

What's changed

InterleavedAddrGenFast cpp struct mapping:

  • The struct itself is exposed to Python through the TTKernel op get_interleaved_addr_gen_fast
  • A conversion pattern (using ConversionPatternRewriter) for this op lowers it as follows:
%6 = "ttkernel.get_interleaved_addr_gen_fast"(%3, %0, %4, %5) : (i1, i32, i32, !ttkernel.DataFormat) -> !ttkernel.interleaved_addr_gen_fast

-> to emitc

%14 = "emitc.variable"() <{value = #emitc.opaque<"">}> : () -> !emitc.lvalue<!emitc.opaque<"InterleavedAddrGenFast<true>">>
%15 = "emitc.member"(%14) <{member = "bank_base_address"}> : (!emitc.lvalue<!emitc.opaque<"InterleavedAddrGenFast<true>">>) -> !emitc.lvalue<i32>
%16 = "emitc.member"(%14) <{member = "page_size"}> : (!emitc.lvalue<!emitc.opaque<"InterleavedAddrGenFast<true>">>) -> !emitc.lvalue<i32>
%17 = "emitc.member"(%14) <{member = "data_format"}> : (!emitc.lvalue<!emitc.opaque<"InterleavedAddrGenFast<true>">>) -> !emitc.lvalue<!emitc.opaque<"DataFormat">>
emitc.assign %5 : i32 to %15 : <i32>
emitc.assign %12 : i32 to %16 : <i32>
emitc.assign %13 : !emitc.opaque<"DataFormat"> to %17 : <!emitc.opaque<"DataFormat">>
%18 = emitc.load %14 : <!emitc.opaque<"InterleavedAddrGenFast<true>">>

-> to cpp

InterleavedAddrGenFast<true> v15;
v15.bank_base_address = v6;
v15.page_size = v13;
v15.data_format = v14;
InterleavedAddrGenFast<true> v16 = v15;
  • added noc_async_read_tile ttkernel op
  • added noc_async_write_tile ttkernel op
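
The generated C++ shown above for InterleavedAddrGenFast can be tried out in isolation with a stand-in struct. This is only a sketch under assumptions: the real struct comes from the kernel dataflow API and is not reproduced here, so the member types, the DataFormat enumerators, and the helper name makeAddrGen are all hypothetical. Only the three member names and the declare/assign/copy sequence are taken from the PR description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in enum; the enumerators are illustrative only.
enum class DataFormat : std::uint8_t { Float16_b, Bfp8_b };

// Hypothetical stand-in for the real InterleavedAddrGenFast struct.
// Only the three members named in the PR description are modeled, and
// their integer types are an assumption.
template <bool DRAM>
struct InterleavedAddrGenFast {
  std::uint32_t bank_base_address;
  std::uint32_t page_size;
  DataFormat data_format;
};

// Mirrors the emitted sequence: declare an lvalue, assign each member,
// then copy the struct into the value that later ops consume.
inline InterleavedAddrGenFast<true> makeAddrGen(std::uint32_t bankBaseAddress,
                                                std::uint32_t pageSize,
                                                DataFormat dataFormat) {
  InterleavedAddrGenFast<true> v15;         // emitc.variable
  v15.bank_base_address = bankBaseAddress;  // emitc.member + emitc.assign
  v15.page_size = pageSize;                 // emitc.member + emitc.assign
  v15.data_format = dataFormat;             // emitc.member + emitc.assign
  InterleavedAddrGenFast<true> v16 = v15;   // emitc.load
  return v16;
}
```

Calling makeAddrGen(262400, 2048, DataFormat::Float16_b) returns a struct whose three members hold exactly those values, which is the state the later noc_async_read_tile and noc_async_write_tile calls would receive.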

Checklist

  • TTKernelToEmitC unit tests for all added ops

@vtangTT changed the title from "Vtang tt/pykernel new type" to "Pykernel - matmul common reader/writer kernels" (Feb 26, 2025)
@vtangTT force-pushed the vtangTT/pykernel_new_type branch from 79998c8 to 295da6a (February 26, 2025 21:20)
@vtangTT marked this pull request as ready for review (February 26, 2025 21:32)
NocAsyncReadTile
}];

let arguments = (ins I32:$id, TTKernel_InterleavedAddrGenFast:$s, I32:$dstLocalL1Addr);
Contributor

s is too short an identifier to be useful; can we change it to something longer and more self-documenting?

Contributor Author

changed to addrGenStruct

@@ -634,6 +652,16 @@ def TTKernel_StoreToL1Op : TTKernel_Op<"store_to_l1"> {
let arguments = (ins I32:$value, TTKernel_L1AddrPtr:$l1_ptr, I32:$offset);
}

def TTKernel_GetInterleavedAddrGenFastOp : TTKernel_Op<"get_interleaved_addr_gen_fast"> {
let summary = "GetAddrGenFastConfig";
Contributor

The summary ("GetAddrGenFastConfig") doesn't seem to match GetInterleavedAddrGenFastOp.

Contributor Author

fixed, ty!

rewriter.create<emitc::LoadOp>(op->getLoc(), opaqueStructType, varOp);

// Replace the original operation with the loaded value so it can be used.
op.replaceAllUsesWith(loadOp.getResult());
Contributor

I don't think op.replaceAllUsesWith(loadOp.getResult()); is a legal modification when using conversion patterns -- all IR modifications must be done via rewriter (see https://www.youtube.com/watch?v=xIeihq2WZOU). The rewriter.replaceOp... line should be enough -- is that not the case?

Contributor Author

Yes, I'm not sure why I left that in there. It works without, thank you!

NocAsyncWriteTile
}];

let arguments = (ins I32:$id, TTKernel_InterleavedAddrGenFast:$s, I32:$srcLocalL1Addr);
Contributor

Same comment about s.

Contributor Author

changed to addrGenStruct

@@ -556,6 +624,7 @@ class ConvertTTKernelToEmitCPass
return op.getNumArguments() == 0;
});
target.addLegalOp<func::ReturnOp>();
target.addIllegalOp<ttkernel::GetInterleavedAddrGenFastOp>();
Contributor

@vroubtsovTT vroubtsovTT Feb 27, 2025

This should not be needed here, because the next line declares the entire ttkernel dialect illegal at the end of this conversion step.

Contributor Author

Yep, this was sloppy, ty!

// CHECK: "emitc.member"(%[[VAR]]) <{member = "data_format"}>{{.*}}
// CHECK: emitc.assign {{.*}}
// CHECK: emitc.assign {{.*}}
// CHECK: emitc.assign {{.*}}
Contributor

Using {{.*}} at the end of a CHECK pattern isn't doing anything useful, because CHECK already matches a substring of a line.
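
The point about substring matching can be illustrated with a small analogy. This is not FileCheck itself: the sketch assumes that a CHECK pattern behaves like an unanchored regex search within a line, and the helper name matchesSomewhereInLine is hypothetical. Under that assumption, appending `.*` never changes whether a line matches.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Analogy for CHECK matching: the pattern succeeds if it matches anywhere
// inside the line (unanchored search), not only when it spans the whole line.
inline bool matchesSomewhereInLine(const std::string &pattern,
                                   const std::string &line) {
  const std::regex re(pattern);
  return std::regex_search(line, re);
}
```

With this helper, the patterns `emitc\.assign` and `emitc\.assign .*` accept and reject the same lines from the test above, so the trailing wildcard only adds noise.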

Contributor Author

removed ty!

// CHECK: = emitc.load %[[VAR]] : <!emitc.opaque<"InterleavedAddrGenFast<true>">>
%is_dram = arith.constant 1 : i1
%temp1 = arith.constant 262400 : i32
%temp2 = arith.constant 32 : i32
Contributor

Asking for two changes here:

  1. Since we use %temp1 and %temp2 in multiple places later and they aren't just dummy inputs, can we give these SSA vars meaningful names?
  2. Can we capture the names of these variables using %[[CAPITALIZED_SSA_VAR_NAME]] = ... CHECKs, and then verify that they are passed into emitc.call_opaque "noc_async_write_tile" in the correct arg order instead of just using {{.*}} there? Look at the other tests in this file.

Contributor Author

I've given them more meaningful names, and I now capture them and check for them in their later uses.

"ttkernel.noc_async_write_tile"(%temp2, %s, %temp1) : (i32, !ttkernel.interleaved_addr_gen_fast, i32) -> ()
// CHECK: emitc.call_opaque "noc_async_read_tile"{{.*}}
"ttkernel.noc_async_read_tile"(%temp2, %s, %temp1) : (i32, !ttkernel.interleaved_addr_gen_fast, i32) -> ()
// CHECK-NEXT: return
Contributor

This CHECK-NEXT anchor doesn't seem to be doing anything useful here.

Contributor Author

yep.. removed

auto lvalueBankBaseAddr = rewriter.create<emitc::MemberOp>(
op->getLoc(),
emitc::LValueType::get(op.getBankBaseAddress().getType()),
"bank_base_address", varOp);
Contributor

I think the correct pattern during dialect conversion is to obtain operand types/properties from the adaptor, not the op.

Contributor Author

Yes, 100% agree. I used adaptor for dataformat too, not sure why I was using op here :(

"bank_base_address", varOp);
auto lvaluePageSize = rewriter.create<emitc::MemberOp>(
op->getLoc(), emitc::LValueType::get(op.getPageSize().getType()),
"page_size", varOp);
Contributor

Same comment: shouldn't this reference the adaptor, not op?

@vtangTT force-pushed the vtangTT/pykernel_new_type branch from 78de1ce to 83c1e10 (February 28, 2025 03:25)
@vtangTT requested a review from vroubtsovTT (February 28, 2025 17:47)
3 participants