
Add asserts for each operation in RT and also add check for setting the output #1313

Open
mtopalovicTT opened this issue Nov 18, 2024 · 2 comments · May be fixed by #2338
Labels: MLIR Ops (Issues related to MLIR dialect ops and their implementations)

Comments

mtopalovicTT (Contributor) commented Nov 18, 2024

We don't validate input/output parameters in RT at all. We should add checks before each op execution that validate the input tensors against the expected inputs. We also need to do this when setting the output tensor in RT.
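A minimal sketch of what such a per-op check could look like, assuming a simplified stand-in for the tensor descriptor; the struct, its fields, and the `validateTensor` helper below are hypothetical, not the actual tt-mlir runtime API:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for both the runtime tensor and the
// compiler-generated (expected) tensor description.
struct TensorDesc {
  std::vector<std::uint32_t> shape;
  int dataType;  // enum value in a real runtime
  int layout;    // e.g. tile vs row_major
};

// Compare the actual tensor against the expected descriptor before
// dispatching the op; report which property mismatched.
inline void validateTensor(const TensorDesc &actual,
                           const TensorDesc &expected,
                           const std::string &opName) {
  if (actual.shape != expected.shape) {
    throw std::runtime_error(opName + ": input shape mismatch");
  }
  if (actual.dataType != expected.dataType) {
    throw std::runtime_error(opName + ": input data type mismatch");
  }
  if (actual.layout != expected.layout) {
    throw std::runtime_error(opName + ": input layout mismatch");
  }
}
```

The same helper could be reused when binding the output tensor, so that a mismatch is reported at the op that produced it rather than further downstream.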

@mtopalovicTT added the MLIR Ops label Nov 18, 2024
@mtopalovicTT self-assigned this Nov 18, 2024
@mtopalovicTT changed the title from "Missing assert for output tensor shape in runtime" to "Add asserts for each operation in RT and also add check for setting the output" Nov 19, 2024
mtopalovicTT (Contributor, Author) commented

@jnie-TT do you have some thoughts on this?

jnie-TT (Contributor) commented Nov 19, 2024

@mtopalovicTT We didn't do this previously because the compiler-generated tensor descriptors were immature (they're much better now, but still have some issues). Some common issues that come to mind:

  • Wrong data type. The compiler assumes that ops can implicitly typecast a tensor's data type (some ops can, but some can't), so there will sometimes be data type mismatches on output tensors.
  • Invalid layout. Previously the tile shapes were not correctly updated in the tensors when we forced tile/row_major layout, and for the decomposed ops we still lack granular layout updates.
  • Some ops auto-shard or auto-move to host (I think we saw this in conv), and we sometimes have workarounds in runtime to pre-shard. The compiler cannot capture this, so all downstream layouts will mismatch.
  • These runtime checks have a performance cost. If we add them, they should be debug asserts (see the sketch after this list), and we should probably update CI to also run the runtime in a debug build.
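A minimal sketch of the debug-assert idea, assuming a hypothetical RT_DEBUG_ASSERT macro (not an existing tt-mlir symbol) that compiles out of release builds:

```cpp
#include <cassert>

// Hypothetical guard macro: in debug builds (NDEBUG not defined) it forwards
// to assert; in release builds it expands to nothing, so ops pay no cost.
#ifndef NDEBUG
#define RT_DEBUG_ASSERT(cond, msg) assert((cond) && (msg))
#else
#define RT_DEBUG_ASSERT(cond, msg) ((void)0)
#endif

// Example use before dispatching an op (names are illustrative):
// RT_DEBUG_ASSERT(input.dataType == expected.dataType,
//                 "op input data type mismatch");
```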

Overall though, I'm definitely on board with adding such checks to the runtime. Currently there's no way to ensure alignment between the runtime and the compiler-generated descriptors.
