Dense shared memory workspaces #302
base: gpu-workspaces
Wanted to add Changwan as a reviewer, but I don't know his GitHub username.
Oh, I saw it just now. My GitHub username is "hochawa". Could you add me?
For whatever reason, it's not letting me add you as a reviewer, which is weird. I think you can review it anyway, though.
This commit adds a function `debugCompileSource` to `Tensor` that lets the `Tensor` use a kernel from a provided source file instead of generating a new one. This allows developers to add prints/assertions to TACO-generated code and debug faster. It was inspired by Amalee's PR (tensor-compiler#302); I would have found a command like this very useful for debugging generated code.
This PR adds support for dense shared memory workspaces on GPUs. There's still a little bit of work to be done for the case where multiple precomputed temporaries are used in one kernel, though none of the tests explicitly check this case. It would be nice to get feedback on the choice to add the GPUWorkspace enums, especially since I'm attaching them to Var (similar to is_ptr and is_tensor, except that the Var is assigned a GPUWorkspace enum value rather than a boolean). Another important question I have is whether the precomputed temporary (at least in the GPU case) needs the loop that initializes it to zero.
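To make the design question concrete, here is a minimal sketch of what tagging a variable with a workspace enum could look like, and how a code generator might branch on it. Everything in it is an assumption for illustration: the `Var` struct, the `GPUWorkspace` enumerators, and the `emitWorkspaceDecl` / `emitZeroInit` helpers are hypothetical stand-ins, not the definitions in this PR or in TACO's IR. The one CUDA fact it relies on is that `__shared__` arrays are not zeroed automatically, so the initialization loop matters whenever the kernel accumulates into the workspace rather than overwriting every element.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for illustration only; NOT TACO's actual IR types.
enum class GPUWorkspace { None, DenseSharedMemory };

struct Var {
  std::string name;
  bool is_ptr = false;
  bool is_tensor = false;
  // The enum rides along with the existing flags on the variable.
  GPUWorkspace gpu_workspace = GPUWorkspace::None;
};

// Emit a declaration for a dense workspace holding `size` doubles.
// A shared-memory workspace becomes a statically sized __shared__ array;
// anything else falls back to an ordinary heap allocation.
std::string emitWorkspaceDecl(const Var& v, std::size_t size) {
  assert(size > 0);
  const std::string n = std::to_string(size);
  if (v.gpu_workspace == GPUWorkspace::DenseSharedMemory) {
    return "__shared__ double " + v.name + "[" + n + "];";
  }
  return "double* " + v.name + " = (double*)malloc(sizeof(double) * " + n + ");";
}

// Emit the loop that zeroes the workspace. For shared memory, the threads
// of a block split the work and synchronize before anyone reads the array.
std::string emitZeroInit(const Var& v, std::size_t size) {
  assert(size > 0);
  const std::string n = std::to_string(size);
  if (v.gpu_workspace == GPUWorkspace::DenseSharedMemory) {
    return "for (int i = threadIdx.x; i < " + n + "; i += blockDim.x) " +
           v.name + "[i] = 0.0;\n__syncthreads();";
  }
  return "for (int i = 0; i < " + n + "; i++) " + v.name + "[i] = 0.0;";
}
```

Under these assumptions, a `Var` named `ws` tagged `GPUWorkspace::DenseSharedMemory` with size 32 would be declared as `__shared__ double ws[32];`, while an untagged `Var` keeps the heap-allocated form. Keeping the decision in a single enum on the variable means the declaration and the initialization loop cannot disagree about where the workspace lives.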
Also, this PR includes some debugging functions that I set up for myself but thought might be useful to others. After generating a kernel / taco temporary file, I wanted to be able to edit the just-generated file and then compile and run everything again with my handwritten changes.