Support DG operators efficiently #128

Open
eschnett opened this issue Jul 4, 2015 · 3 comments
@eschnett
Collaborator

eschnett commented Jul 4, 2015

Kranc currently supports discontinuous Galerkin (DG) operators, but not efficiently. I plan to add a feature that makes them efficient. The basic plan is to loop over a grid function in two layers: an outer layer that loops over elements (or "tiles" in general), and an inner layer that loops over all collocation points within an element.

OpenMP parallelization is applied only to the outermost layer. Vectorization is only relevant for the innermost layer. Derivatives etc. are applied "en bloc" to a whole element.
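
As an illustration of the two-layer structure described above, here is a minimal 1D sketch in C. It is not the sample attached to this issue and not Kranc's generated code. The function name `dg_derivative`, the flat row-major layout of the derivative matrix `D`, and the assumption that each element stores its `npts` collocation points contiguously are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: apply a dense per-element derivative matrix D
   (npts x npts, row-major) to a 1D grid function u, element by element.
   Each element is assumed to hold npts contiguous collocation points. */
void dg_derivative(ptrdiff_t nelems, int npts,
                   const double *restrict D,
                   const double *restrict u,
                   double *restrict du)
{
    assert(nelems >= 0 && npts >= 1);

    /* Outer layer: loop over elements. OpenMP is applied only here. */
    #pragma omp parallel for
    for (ptrdiff_t e = 0; e < nelems; ++e) {
        const double *ue = u + e * npts;   /* input values of element e  */
        double *due = du + e * npts;       /* output values of element e */

        /* Inner layer: the derivative is applied "en bloc" to all
           collocation points of this element as a small matrix-vector
           product. */
        for (int i = 0; i < npts; ++i) {
            double sum = 0.0;
            /* Short fixed-length contraction: a candidate for SIMD. */
            #pragma omp simd reduction(+:sum)
            for (int j = 0; j < npts; ++j)
                sum += D[i * npts + j] * ue[j];
            due[i] = sum;
        }
    }
}
```

Without OpenMP enabled at compile time the pragmas are ignored and the loops run serially, so the sketch behaves the same either way.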

This should also generalize to other numerical methods such as finite differencing (FD): replace the term "element" by "tile", and "collocation point" by "grid point". The tile size can then be chosen freely and is not restricted to the element size, as it is for DG. This optimization has proven beneficial in Chemora for FD on GPUs, so I assume it would also improve performance on CPUs.
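
The same structure for finite differencing might look like the following 1D sketch, where the tile size is a free parameter and the last tile may be shorter than the others. The function name `fd_derivative_tiled` and the choice of a second-order centered stencil on interior points are assumptions made for illustration; they do not come from Kranc or Chemora.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: second-order centered first derivative on the
   interior points 1 .. n-2 of a 1D grid, traversed tile by tile.
   Boundary points du[0] and du[n-1] are left untouched. */
void fd_derivative_tiled(ptrdiff_t n, ptrdiff_t tile, double dx,
                         const double *restrict u,
                         double *restrict du)
{
    assert(n >= 3 && tile >= 1 && dx > 0.0);

    const ptrdiff_t first = 1;      /* first interior point        */
    const ptrdiff_t last = n - 1;   /* one past the last interior  */
    const ptrdiff_t ntiles = (last - first + tile - 1) / tile;

    /* Outer layer: loop over tiles, parallelized with OpenMP. */
    #pragma omp parallel for
    for (ptrdiff_t t = 0; t < ntiles; ++t) {
        const ptrdiff_t lo = first + t * tile;
        const ptrdiff_t hi = (lo + tile < last) ? lo + tile : last;

        /* Inner layer: loop over the grid points of one tile,
           the part that is relevant for vectorization. */
        #pragma omp simd
        for (ptrdiff_t i = lo; i < hi; ++i)
            du[i] = (u[i + 1] - u[i - 1]) / (2.0 * dx);
    }
}
```

Because the tile size is only a traversal parameter here, the result does not depend on it; it can be tuned for cache size or thread count without changing the numerics.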

I attach a sample of what the generated code (sans Kranc-typical boilerplate) could look like, following the structure described above.


eschnett commented Jul 4, 2015

@ianhinder
Owner

Additionally, I recently found in testing that tiling improved performance significantly on Intel MICs, possibly because it provides a larger number of small work units to distribute using OpenMP. Tiling is currently implemented in LoopControl. Why is it better to do tiling in Kranc than in LoopControl or cctk_Loop.h?

@eschnett
Collaborator Author

eschnett commented Jul 7, 2015

I want to add other features such as pre-calculating derivatives per tile instead of calculating them per grid point or per grid function. This is not easily possible otherwise. See the attached gist, which shows the structure of the generated code I want to have.
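
To make the idea of pre-calculating derivatives per tile concrete, here is a small 1D sketch. It is not the gist referred to above, only a guess at the general shape. The function name `advect_rhs_tiled`, the fixed `TILE` size, and the example right-hand side `rhs = -a * du/dx` are all hypothetical. Each tile first fills a tile-local buffer with derivatives, and a second pass over the same tile consumes them.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 8   /* hypothetical tile size, chosen for illustration */

/* Hypothetical sketch: compute rhs = -a * du/dx on interior points.
   Derivatives are pre-calculated for a whole tile into a small local
   buffer, then used pointwise in a second pass over the same tile. */
void advect_rhs_tiled(ptrdiff_t n, double dx,
                      const double *restrict a,
                      const double *restrict u,
                      double *restrict rhs)
{
    assert(n >= 3 && dx > 0.0);

    const ptrdiff_t first = 1;
    const ptrdiff_t last = n - 1;
    const ptrdiff_t ntiles = (last - first + TILE - 1) / TILE;

    #pragma omp parallel for
    for (ptrdiff_t t = 0; t < ntiles; ++t) {
        const ptrdiff_t lo = first + t * TILE;
        const ptrdiff_t hi = (lo + TILE < last) ? lo + TILE : last;

        /* Tile-local storage; declared inside the loop body, so it is
           private to whichever thread processes this tile. */
        double du[TILE];

        /* Stage 1: pre-calculate the derivative for the whole tile. */
        for (ptrdiff_t i = lo; i < hi; ++i)
            du[i - lo] = (u[i + 1] - u[i - 1]) / (2.0 * dx);

        /* Stage 2: pointwise right-hand side using the stored values. */
        for (ptrdiff_t i = lo; i < hi; ++i)
            rhs[i] = -a[i] * du[i - lo];
    }
}
```

Structuring the loop this way requires the code generator to know about tiles, since the derivative stage and the pointwise stage share tile-local storage; a generic loop macro that only hands out one grid point at a time has no natural place for that buffer.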
